Omnimatte: Associating Objects and Their Effects

Supplementary Material

Main Project Page

 

 


Video Editing Effects Using Omnimattes

We demonstrate several editing effects using our omnimattes: stroboscopy and duplication, color pop, and background replacement.

"Horsejump-low - Stroboscopy"


"Flamingo - Color Pop"

*Note: The subject and its reflection are kept in color, while the background is converted to grayscale.
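The color-pop effect above amounts to a simple alpha blend between the frame and its grayscale version, using the subject's soft omnimatte alpha (which also covers correlated effects such as the reflection). Below is a minimal NumPy sketch, not our actual implementation; the function name, array conventions, and the Rec. 601 luma weights are illustrative choices.

```python
import numpy as np

def color_pop(frame, alpha):
    """Keep the matted subject (and its soft effects, e.g. reflections)
    in color while converting the rest of the frame to grayscale.

    frame: (H, W, 3) float RGB in [0, 1]
    alpha: (H, W) float omnimatte alpha in [0, 1]
    """
    # Rec. 601 luma gives the grayscale version of the frame.
    gray = frame @ np.array([0.299, 0.587, 0.114])
    gray = np.repeat(gray[..., None], 3, axis=-1)
    # Per-pixel linear blend: alpha selects the colored subject.
    a = alpha[..., None]
    return a * frame + (1.0 - a) * gray
```

Because the matte is soft, partially transparent effects (the flamingo's reflection) retain a proportional amount of color rather than a hard cutout.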

"Breakdance-flare - Background Replacement"
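Background replacement, shown above, is standard back-to-front "over" compositing of the RGBA omnimatte layers onto a new background plate. A minimal NumPy sketch (the function name and array conventions are ours, not from our released code):

```python
import numpy as np

def composite_over(background, layers):
    """Back-to-front 'over' compositing of RGBA omnimatte layers
    onto a replacement background.

    background: (H, W, 3) float RGB in [0, 1]
    layers: list of (H, W, 4) float RGBA omnimattes, ordered
            back to front
    """
    out = background.copy()
    for layer in layers:
        rgb, a = layer[..., :3], layer[..., 3:4]
        # 'Over' operator: the layer's soft alpha blends it onto
        # everything composited so far.
        out = a * rgb + (1.0 - a) * out
    return out
```

Since each omnimatte carries the subject together with its shadow at partial alpha, the shadow darkens the new background plausibly instead of disappearing.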

 


Omnimatte Results

For each result, we show the original video, input mask(s), output omnimatte(s) (alpha and RGBA), and output background.

 

"Dogwalk"


"Blackswan"


"Soccer"


"Tennis"


"Horsejump-low"


"Elephant"


"Lucia"


"Drift-chicane"


"Flamingo"


"s-busStation"


"Breakdance"

*Note: As discussed in Section 4.2, we use a single layer to capture the motion of the people in the crowd: the static regions of the crowd are captured by the background layer, whereas the dynamic regions are captured by the crowd omnimatte layer.

"Breakdance-flare"


"Bear"


"Hike"

*Note: A failure case caused by inaccurate camera stabilization: background elements are captured in the person's omnimatte to compensate for the stabilization errors.

"Camel"

*Note: Since the camel in the back is largely static, it is mostly captured by the background layer (due to our alpha regularization term); its head movements are captured by the corresponding omnimatte.

"Car-shadow"


"Cows"


"Dance-twirl"


"Dog-agility"


"Horsejump-high"


"Judo"


"Kite-walk"


"Mallard-fly"


"Mallard-water"


"Paragliding"


"Paragliding-launch"


"Rhino"


"Rollerblade"


"Skate-park"


"Tractor-sand"


 

 


Object Removal

For object removal, we compare our method with the video completion method FGVC [1].
For each example, we show: the original video; the manual mask used by FGVC and their result; our fully automatic mask (obtained by binarizing our omnimatte) and the FGVC result using our mask; and removal using our omnimatte-only method.
*Note: Using their beta code, we were unable to exactly reproduce their results, so we run their code on their manual masks and use these results for comparison.
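The fully automatic mask is obtained by thresholding the omnimatte's soft alpha channel. The sketch below illustrates the idea; the threshold value and the dilation margin are our illustrative assumptions, not the paper's exact settings.

```python
import numpy as np

def removal_mask(alpha, threshold=0.5, dilate=2):
    """Binarize an omnimatte alpha matte into an object-removal mask.

    alpha: (H, W) float in [0, 1]. The soft matte already covers the
    object and its correlated effects (shadow, reflection), so a simple
    threshold yields a mask over all of those regions. A small dilation
    adds a safety margin for the inpainting method.
    """
    mask = alpha > threshold
    for _ in range(dilate):
        # One step of 4-neighborhood dilation via shifted copies.
        shifted = np.zeros_like(mask)
        shifted[1:, :] |= mask[:-1, :]
        shifted[:-1, :] |= mask[1:, :]
        shifted[:, 1:] |= mask[:, :-1]
        shifted[:, :-1] |= mask[:, 1:]
        mask |= shifted
    return mask
```

Unlike a manual segmentation mask, this mask automatically includes the object's shadow and reflection, which is why the inpainting step does not leave those effects behind.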

"Horsejump-low"


"Breakdance"


"Flamingo"


"Car-shadow"


"Breakdance-flare"


"Camel"


"Bear"


"Dance-twirl"


"Elephant"


"Hike"


"Horsejump-high"


"Paragliding"


"Paragliding-launch"

*Note: The omnimatte captures the shadow but not the cables, due to the difficulty of reproducing thin structures with CNNs.

"Rhino"


"Rollerblade"


"Tennis"


 

 


Comparison with Shadow Detection

We compare our results with ISD [2], a state-of-the-art shadow detection method, using the default confidence threshold of 0.5 provided by their code. A red border indicates that no shadow was detected.

"Lucia"


"Dogwalk"


"Soccer"


"Bear"


"Horsejump-low"


"Tennis"


"Camel"

 


Comparison with Background Subtraction

We compare our method to background subtraction on a subset of CDW-2014 [3], against BSPVGAN [4], a top-performing method on the CDW-2014 benchmark.
In the ground truth, the pixel-value labels are: 0 = static, 50 = hard shadow, 85 = outside region of interest, 170 = unknown motion (usually around moving objects, due to semi-transparency and motion blur), and 255 = motion.
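The label convention above can be split into evaluation masks as follows. This is a sketch under our reading of the benchmark convention (hard shadow scored as background; outside-ROI and unknown pixels excluded from scoring); the constant and function names are ours, not from the benchmark code.

```python
import numpy as np

# CDW-2014 ground-truth pixel labels.
STATIC, HARD_SHADOW, OUTSIDE_ROI, UNKNOWN, MOTION = 0, 50, 85, 170, 255

def motion_labels(gt):
    """Split a CDW-2014 ground-truth frame into evaluation masks.

    gt: (H, W) uint8 label image.
    Returns (positive, negative, valid): moving pixels, static pixels
    (hard shadow counts as background), and the set of pixels that
    should be scored at all (outside-ROI and unknown are excluded).
    """
    positive = gt == MOTION
    negative = (gt == STATIC) | (gt == HARD_SHADOW)
    valid = positive | negative
    return positive, negative, valid
```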

"b-pedestrians"


"c-traffic"


"b-highway"


"d-overpass"


"P-continuousPan"

*Note: This video is played at half-speed.

"P-twoPositionPTZCam2"


"P-zoomInZoomOut"


"s-busStation"


"s-cubicle"


"s-peopleInShade"


"s-peopleInShade3"


 


Comparison with Layered Neural Rendering

We compare our method with Lu et al. [5], which uses a human-specific input representation.

"Trampoline"


"Reflection"

 


Ablations

We ablate the flow, background warp, and brightness adjustment components of our method.

"Bear"


"Trampoline"

 


Different Initializations

Different weight initializations can produce different omnimattes. Here we show an example with three different random seeds.
For example, with seed 1, more of the person's shadow is incorrectly associated with the soccer ball.

"Soccer"

 


References

[1] Chen Gao, Ayush Saraf, Jia-Bin Huang, and Johannes Kopf. Flow-edge guided video completion. In European Conference on Computer Vision (ECCV), 2020.
[2] Tianyu Wang, Xiaowei Hu, Qiong Wang, Pheng-Ann Heng, and Chi-Wing Fu. Instance shadow detection. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
[3] Yi Wang, Pierre-Marc Jodoin, Fatih Porikli, Janusz Konrad, Yannick Benezeth, and Prakash Ishwar. CDnet 2014: An expanded change detection benchmark dataset. In CVPR Workshops, 2014.
[4] Wenbo Zheng, Kunfeng Wang, and Fei-Yue Wang. A novel background subtraction algorithm based on parallel vision and Bayesian GANs. Neurocomputing, 2020.
[5] Erika Lu, Forrester Cole, Tali Dekel, Weidi Xie, Andrew Zisserman, David Salesin, William T. Freeman, and Michael Rubinstein. Layered neural rendering for retiming people in video. arXiv preprint arXiv:2009.07833, 2020.