FactorMatte: Redefining Video Matting for Re-Composition Tasks

1Cornell Tech   2Cornell University  

FactorMatte creates a counterfactual world where objects become "invisible".



Abstract

We propose factor matting, an alternative formulation of the video matting problem in terms of counterfactual video synthesis that is better suited for re-composition tasks. The goal of factor matting is to separate the contents of a video into independent components, each visualizing a counterfactual version of the scene where the contents of the other components have been removed. We show that factor matting maps well to a more general Bayesian framing of the matting problem that accounts for complex conditional interactions between layers. Based on this observation, we present a method for solving the factor matting problem that produces useful decompositions even for videos with complex cross-layer interactions like splashes, shadows, and reflections.

Our method is trained per-video and requires neither pre-training on external large datasets, nor knowledge about the 3D structure of the scene. We conduct extensive experiments, and show that our method not only can disentangle scenes with complex interactions, but also outperforms top methods on existing tasks such as classical video matting and background subtraction. In addition, we demonstrate the benefits of our approach on a range of downstream tasks.
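
As a rough sketch of the Bayesian framing mentioned above, written in standard matting notation: the symbols I (observed color), F (foreground), B (background), and α (opacity) are assumptions made for illustration and are not necessarily the paper's own notation. Classical Bayesian matting seeks the most probable layers given the observation and typically treats the layer priors as independent. One way to read the "more general" framing is that it keeps the joint prior unfactored, so that one layer can be conditioned on another.

```latex
\begin{align*}
% Per-pixel compositing model
I &= \alpha F + (1 - \alpha) B \\
% Maximum a posteriori estimate of the layers
(\hat{F}, \hat{B}, \hat{\alpha})
  &= \arg\max_{F, B, \alpha} \; P(I \mid F, B, \alpha)\, P(F, B, \alpha) \\
% Classical simplification: independent priors
P(F, B, \alpha) &\approx P(F)\, P(B)\, P(\alpha) \\
% Without that simplification (chain rule, one possible ordering)
P(F, B, \alpha) &= P(B)\, P(F \mid B)\, P(\alpha \mid F, B)
\end{align*}
```

The last line is only the chain rule applied to the joint prior. It shows where conditional terms such as P(F | B) enter once independence is dropped, and it is not a statement of the specific priors the method uses.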

Video

Video Layer Decomposition and Re-Composition

We reframe video matting in terms of counterfactual video synthesis for downstream re-compositing tasks, where each counterfactual video answers a question of the form “what would this component look like if we froze time and separated it from the rest of the scene?” We developed a plug-in for Adobe After Effects for faster re-composition, and used it to produce the results in the rightmost column.
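
To make the re-composition step concrete, here is a minimal per-pixel sketch of the standard "over" compositing operator applied to a stack of layers. It is not the After Effects plug-in and not the authors' code. The function names `composite_over` and `recompose` are hypothetical, and colors and alpha are assumed to be floats in [0, 1].

```python
from typing import Iterable, Tuple

RGB = Tuple[float, float, float]


def composite_over(fg_rgb: RGB, fg_alpha: float, bg_rgb: RGB) -> RGB:
    """Blend one foreground pixel over one background pixel.

    Implements alpha * F + (1 - alpha) * B channel by channel.
    """
    return tuple(
        fg_alpha * f + (1.0 - fg_alpha) * b for f, b in zip(fg_rgb, bg_rgb)
    )


def recompose(layers: Iterable[Tuple[RGB, float]], background: RGB) -> RGB:
    """Composite (rgb, alpha) layers over a background pixel.

    Layers are applied in order, so the last layer ends up on top.
    """
    out = background
    for rgb, alpha in layers:
        out = composite_over(rgb, alpha, out)
    return out


# Usage: a half-transparent red pixel over a blue background pixel.
blended = recompose([((1.0, 0.0, 0.0), 0.5)], (0.0, 0.0, 1.0))
```

Editing a layer (moving it, retiming it, or replacing the background) and then running the same blend is what re-composition amounts to at the pixel level. A real pipeline would apply this to whole frames at once.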

 

Example 1: Truck Jump

Input video
Foreground
Background
Re-Composition


Example 2: Breakdance

Input video
Foreground
Background
Re-Composition


Example 3: Kite-Surf

Input video
Foreground
Background
Re-Composition

Result comparison with OmniMatte

We compare with the most closely related previous work, Omnimatte. While Omnimatte has the merit of associating effects such as shadows and reflections with the correct layer, it has no explicit conditional priors for any layer, and thus fails on complex scenes with foreground-background interactions. It also does not produce meaningful color factorizations.

 

Example 1: Skier Landing

Input video
OmniMatte alpha
OmniMatte RGB
OmniMatte RGBa
Input mask
FactorMatte alpha
FactorMatte RGB
FactorMatte RGBa


Example 2: Hike

Input video
OmniMatte alpha
OmniMatte RGB
OmniMatte RGBa
Input mask
FactorMatte alpha
FactorMatte RGB
FactorMatte RGBa


Example 3: Kite-Surf

Input video
OmniMatte alpha
OmniMatte RGB
OmniMatte RGBa
Input mask
FactorMatte alpha
FactorMatte RGB
FactorMatte RGBa

Video Matting

While FactorMatte is designed to address videos featuring complex cross-component interactions, we find that it also excels on scenes without such interactions. One example would be the task of classical video matting.

 

Input
OmniMatte
BGM
FactorMatte

Background Subtraction

Another example is the task of background subtraction. We select clips from CDW-2014 that contain shadows and reflections that should be associated with foreground objects, and that feature significant camera jitter as well as changes in zoom and exposure.

 

Example 1: Traffic

Input video
FgSegNet-v2
OmniMatte
FactorMatte


Example 2: ZoomInZoomOut

Input video
FgSegNet-v2
OmniMatte
FactorMatte

Additional Video Editing Effects

The output of FactorMatte can also be combined with other methods for downstream applications, such as object removal and color pop. We compare the results of Flow-edge Guided Video Completion using different input masks provided by a variety of matting methods. Simple segmentation masks tend to leave correlated effects like shadows in the scene, while masks from Omnimatte lead to the removal of most interaction effects, including deformations of the background. The mask from FactorMatte contains the object and its shadow but not the cushion, and thus leads to the most plausible result with the object made invisible.
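
One way an alpha matte could be turned into a binary removal mask for an inpainting method is sketched below. This is an assumption for illustration only: the page does not say how the masks were derived, and the function name `removal_mask`, the threshold, and the dilation step are all hypothetical choices.

```python
from typing import List


def removal_mask(
    alpha: List[List[float]], threshold: float = 0.1, dilate: int = 1
) -> List[List[bool]]:
    """Binarize an alpha matte and grow the result by `dilate` pixels.

    `alpha` is a 2D list of opacities in [0, 1]. Pixels above `threshold`
    are marked for removal, then the mask is dilated with a 3x3 neighborhood
    so that soft edges of the matte are also covered.
    """
    height = len(alpha)
    width = len(alpha[0]) if height else 0
    mask = [[a > threshold for a in row] for row in alpha]

    for _ in range(dilate):
        grown = [row[:] for row in mask]
        for y in range(height):
            for x in range(width):
                if not mask[y][x]:
                    continue
                # Mark every in-bounds neighbor of a masked pixel.
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < height and 0 <= nx < width:
                            grown[ny][nx] = True
        mask = grown
    return mask
```

Because the matte covers associated effects such as the shadow, a mask built this way would cover them too, which is the property the comparison above relies on.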

 

Improving the quality of alpha mattes also enables us to shift the color or timing of components within a video more aggressively than previous methods. In the flashlight video below, we successfully change the color of the flashlight beam by adjusting the foreground color layer to be more red outside of the input foreground mask.
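
The color edit described above can be sketched per pixel as follows: shift the foreground color layer toward red, leave the alpha matte untouched, and blend over the background again. This is an illustrative sketch and not the authors' editing pipeline. The names `shift_red` and `recolor_pixel` and the `amount` parameter are hypothetical, and values are assumed to lie in [0, 1].

```python
from typing import Tuple

RGB = Tuple[float, float, float]


def shift_red(rgb: RGB, amount: float = 0.3) -> RGB:
    """Push a color toward red by raising its red channel, clamped to 1."""
    r, g, b = rgb
    return (min(1.0, r + amount), g, b)


def recolor_pixel(
    fg_rgb: RGB, fg_alpha: float, bg_rgb: RGB, amount: float = 0.3
) -> RGB:
    """Tint the foreground color layer, then blend it over the background.

    Only the foreground color changes. The alpha matte decides how much of
    the tint is visible, so regions with alpha 0 keep the background color.
    """
    tinted = shift_red(fg_rgb, amount)
    return tuple(
        fg_alpha * f + (1.0 - fg_alpha) * b for f, b in zip(tinted, bg_rgb)
    )
```

Since the tint is weighted by alpha, the edit reaches exactly as far as the matte does. A matte that extends beyond the input foreground mask to cover the beam is what lets the beam change color along with the flashlight.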

 

Video Inpainting

Input video
Seg mask
OmniMatte
FactorMatte

 

 

 

Color Pop

Input video
OmniMatte
FactorMatte