FactorMatte: Redefining Video Matting for Re-Composition Tasks

SIGGRAPH Journal 2023

Zeqi Gu¹,², Wenqi Xian¹,², Noah Snavely¹,², Abe Davis²

¹Cornell Tech ²Cornell University


FactorMatte creates a counterfactual world where objects become "invisible".

Abstract

We propose factor matting, an alternative formulation of the video matting problem in terms of counterfactual video synthesis that is better suited for re-composition tasks. The goal of factor matting is to separate the contents of video into independent components, each visualizing a counterfactual version of the scene where contents of other components have been removed. We show that factor matting maps well to a more general Bayesian framing of the matting problem that accounts for complex conditional interactions between layers. Based on this observation, we present a method for solving the factor matting problem that produces useful decompositions even for video with complex cross-layer interactions like splashes, shadows, and reflections.

Our method is trained per-video and requires neither pre-training on large external datasets nor knowledge of the 3D structure of the scene. We conduct extensive experiments and show that our method can not only disentangle scenes with complex interactions, but also outperforms top methods on existing tasks such as classical video matting and background subtraction. In addition, we demonstrate the benefits of our approach on a range of downstream tasks.

Video

Video Layer Decomposition and Re-Composition

We reframe video matting in terms of counterfactual video synthesis for downstream re-compositing tasks, where each counterfactual video answers a question of the form “what would this component look like if we froze time and separated it from the rest of the scene?” We developed a plug-in for Adobe After Effects for faster re-composition, and used it to produce the results in the rightmost column.
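To make the re-composition step concrete, here is a minimal sketch of how a decomposed foreground layer could be placed over a different background. It uses the standard "over" operator with straight (non-premultiplied) alpha; the function name `composite_over` and the toy 2x2 frame are illustrative assumptions, and this is not the After Effects plug-in or the authors' pipeline.

```python
import numpy as np

def composite_over(fg_rgb, fg_alpha, bg_rgb):
    """Standard 'over' compositing: out = alpha * fg + (1 - alpha) * bg.

    fg_rgb, bg_rgb: float arrays of shape (H, W, 3) in [0, 1].
    fg_alpha:       float array of shape (H, W) in [0, 1].
    Assumes straight (non-premultiplied) alpha.
    """
    alpha = fg_alpha[..., None]  # (H, W) -> (H, W, 1) so it broadcasts over RGB
    return alpha * fg_rgb + (1.0 - alpha) * bg_rgb

# Toy 2x2 frame (hypothetical data): a red foreground layer over a blue background.
fg_rgb = np.ones((2, 2, 3)) * np.array([1.0, 0.0, 0.0])
fg_alpha = np.array([[1.0, 0.5],
                     [0.0, 0.0]])
new_bg = np.ones((2, 2, 3)) * np.array([0.0, 0.0, 1.0])

recomposed = composite_over(fg_rgb, fg_alpha, new_bg)
```

Pixels with alpha 1 show only the foreground, pixels with alpha 0 show only the new background, and fractional alpha blends the two. For a video, the same function would be applied to each frame.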

Example 1: Truck Jump

Panels: Input video, Foreground, Background, Re-Composition

Example 2: Breakdance

Panels: Input video, Foreground, Background, Re-Composition

Example 3: Kite-Surf

Panels: Input video, Foreground, Background, Re-Composition

Result comparison with OmniMatte

We compare with the most closely related previous work, Omnimatte. While Omnimatte has the merit of associating effects such as shadows and reflections with the correct layer, it has no explicit conditional priors for any layer, and thus fails on complex scenes with foreground-background interactions. It also does not produce meaningful color factorizations.

Example 1: Skier Landing

Panels (top row): Input video, OmniMatte alpha, OmniMatte RGB, OmniMatte RGBa
Panels (bottom row): Input mask, FactorMatte alpha, FactorMatte RGB, FactorMatte RGBa

Example 2: Hike

Panels (top row): Input video, OmniMatte alpha, OmniMatte RGB, OmniMatte RGBa
Panels (bottom row): Input mask, FactorMatte alpha, FactorMatte RGB, FactorMatte RGBa

Example 3: Kite-Surf

Panels (top row): Input video, OmniMatte alpha, OmniMatte RGB, OmniMatte RGBa
Panels (bottom row): Input mask, FactorMatte alpha, FactorMatte RGB, FactorMatte RGBa

Video Matting

While FactorMatte is designed to address videos featuring complex cross-component interactions, we find that it also excels on scenes without such interactions. One example would be the task of classical video matting.

Panels: Input, OmniMatte, BGM, FactorMatte

Background Subtraction

Another example is the task of background subtraction. We select clips from CDW-2014 that contain shadows and reflections that should be associated with foreground objects, and that feature significant camera jitter as well as changes in zoom and exposure.

Example 1: Traffic

Panels: Input video, FgSegNet-v2, OmniMatte, FactorMatte

Example 2: ZoomInZoomOut

Panels: Input video, FgSegNet-v2, OmniMatte, FactorMatte

Additional Video Editing Effects

The output of FactorMatte can also be combined with other methods for downstream applications, such as object removal and color pop. We compare the results of Flow-edge Guided Video Completion using input masks provided by a variety of matting methods. Simple segmentation masks tend to leave correlated effects like shadows in the scene, while masks from Omnimatte lead to removal of most interaction effects, including deformations of the background. The mask from FactorMatte contains the object and its shadow but not the cushion, thus leading to the most plausible result with the object made invisible.
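As an illustration of how an alpha matte might be turned into a binary removal mask for a video completion method, the sketch below thresholds the matte and optionally grows the result by a few pixels. The threshold, the dilation step, and the name `removal_mask` are assumptions made for this example; the page does not describe how the masks were actually prepared.

```python
import numpy as np

def removal_mask(alpha, threshold=0.1, dilate_px=1):
    """Binarize an alpha matte and grow it with a 4-neighbour dilation.

    alpha:      float array of shape (H, W) in [0, 1].
    threshold:  hypothetical cut-off; pixels with alpha above it are removed.
    dilate_px:  number of one-pixel dilation passes (hypothetical safety margin).
    """
    mask = alpha > threshold
    for _ in range(dilate_px):
        padded = np.pad(mask, 1, mode="constant", constant_values=False)
        mask = (
            padded[1:-1, 1:-1]   # the pixel itself
            | padded[:-2, 1:-1]  # neighbour above
            | padded[2:, 1:-1]   # neighbour below
            | padded[1:-1, :-2]  # neighbour to the left
            | padded[1:-1, 2:]   # neighbour to the right
        )
    return mask

# Toy 5x5 matte (hypothetical data): an opaque object pixel and a soft shadow pixel.
alpha = np.zeros((5, 5))
alpha[2, 2] = 0.8  # object
alpha[2, 3] = 0.3  # shadow attached to the object

tight_mask = removal_mask(alpha, dilate_px=0)
grown_mask = removal_mask(alpha, dilate_px=1)
```

Because the soft shadow pixel has non-zero alpha in the matte, it ends up inside the mask, which a hard segmentation of the object alone would miss.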

Improving the quality of alpha mattes also enables us to shift the color or timing of components within a video more aggressively than previous methods. In the flashlight video below, we successfully change the color of the flashlight beam by adjusting the foreground color layer to be more red outside of the input foreground mask.
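A color shift of this kind can be sketched as an edit applied only to the foreground color layer, followed by re-compositing with the unchanged alpha and background. The blend-toward-a-tint formula, the `strength` parameter, and the function name `tint_foreground` are illustrative assumptions rather than the procedure used for the flashlight video.

```python
import numpy as np

def tint_foreground(fg_rgb, fg_alpha, bg_rgb, tint=(1.0, 0.0, 0.0), strength=0.5):
    """Shift the foreground color layer toward `tint`, then composite over the background.

    fg_rgb, bg_rgb: float arrays of shape (H, W, 3) in [0, 1].
    fg_alpha:       float array of shape (H, W) in [0, 1].
    strength:       0 keeps the original color, 1 replaces it with the tint (hypothetical knob).
    """
    tint = np.asarray(tint, dtype=float)
    shifted = (1.0 - strength) * fg_rgb + strength * tint
    alpha = fg_alpha[..., None]
    return alpha * shifted + (1.0 - alpha) * bg_rgb

# Toy 1x2 frame (hypothetical data): a white, semi-transparent beam over a black background.
fg_rgb = np.ones((1, 2, 3))
fg_alpha = np.array([[0.5, 0.0]])
bg_rgb = np.zeros((1, 2, 3))

edited = tint_foreground(fg_rgb, fg_alpha, bg_rgb)
```

Only pixels where the foreground alpha is non-zero change, so the quality of the edit depends directly on how well the matte covers soft effects such as the beam.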

Video Inpainting

Panels: Input video, Seg mask, OmniMatte, FactorMatte

Color Pop

Panels: Input video, OmniMatte, FactorMatte

BibTeX

@article{gu2023factormatte,
  title={FactorMatte: Redefining Video Matting for Re-Composition Tasks},
  author={Gu, Zeqi and Xian, Wenqi and Snavely, Noah and Davis, Abe},
  journal={ACM Transactions on Graphics (TOG)},
  volume={42},
  number={4},
  pages={1--14},
  year={2023},
  publisher={ACM New York, NY, USA}
}