Splatting-based Synthesis for Video Frame Interpolation

Simon Niklaus, Ping Hu, and Jiawen Chen
arXiv:2201.10075

Frame interpolation is an essential video processing technique that adjusts the temporal resolution of an image sequence. An effective approach to frame interpolation is based on splatting, also known as forward warping. Specifically, splatting can be used to warp the input images to an arbitrary temporal location based on an optical flow estimate. A synthesis network, sometimes also referred to as a refinement network, can then be used to generate the output frame from the warped images. In doing so, it is common to warp not only the images but also various feature representations, which provide rich contextual cues to the synthesis network.
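To make the splatting step concrete, the sketch below forward-warps an image to an intermediate time t by scaling the optical flow and bilinearly distributing each source pixel onto its four nearest target pixels, then normalizing by the accumulated weights. This is a minimal, hypothetical illustration of average splatting in NumPy (the function name `splat_forward` is our own); it is not the paper's softmax splatting and omits the depth-aware weighting a real implementation would use to resolve occlusions.

```python
import numpy as np

def splat_forward(image, flow, t):
    """Forward-warp (splat) an (H, W, C) image to time t in [0, 1].

    Each source pixel is moved along its scaled flow vector and its
    color is bilinearly distributed onto the four surrounding target
    pixels; colors are then normalized by the accumulated weights
    (average splatting). Pixels that receive no contribution remain
    zero, i.e. holes are not filled here.
    """
    h, w = image.shape[:2]
    out = np.zeros(image.shape, dtype=np.float64)
    weight = np.zeros((h, w), dtype=np.float64)
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # sub-pixel target coordinates after scaling the flow by t
    tx = xs + t * flow[..., 0]
    ty = ys + t * flow[..., 1]
    x0 = np.floor(tx).astype(int)
    y0 = np.floor(ty).astype(int)
    for dx in (0, 1):
        for dy in (0, 1):
            xi, yi = x0 + dx, y0 + dy
            # bilinear weight of this target corner
            wgt = (1.0 - np.abs(tx - xi)) * (1.0 - np.abs(ty - yi))
            valid = (xi >= 0) & (xi < w) & (yi >= 0) & (yi < h)
            np.add.at(out, (yi[valid], xi[valid]),
                      image[valid] * wgt[valid, None])
            np.add.at(weight, (yi[valid], xi[valid]), wgt[valid])
    covered = weight > 1e-8
    out[covered] /= weight[covered][:, None]
    return out
```

Note the use of `np.add.at` rather than plain indexing: multiple source pixels can map to the same target pixel, and the unbuffered accumulation is what distinguishes splatting from backward warping, where each output pixel gathers from exactly one source location.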

However, while this approach has been shown to work well and, thanks to splatting, enables arbitrary-time interpolation, the involved synthesis network is prohibitively slow. In contrast, we propose to rely solely on splatting to synthesize the output without any subsequent refinement. This splatting-based synthesis is much faster than similar approaches, especially for multi-frame interpolation, while enabling new state-of-the-art results at high resolutions.
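To illustrate how an output frame can be formed from splatted images alone, the sketch below merges two frames that have already been splatted to time t (one from each input image, each with its accumulated splatting weights) using a temporally-weighted average: the input that is temporally closer to t contributes more, and pixels covered by only one side fall back to that side. This is a simplified stand-in of our own devising (the function name `merge_splats` is hypothetical), not the exact combination rule in Equation 5 of the paper.

```python
import numpy as np

def merge_splats(color0, w0, color1, w1, t, eps=1e-8):
    """Combine two splatted frames without a refinement network.

    color0, color1: (H, W, C) images splatted from the first and
        second input frame to time t.
    w0, w1: (H, W) accumulated splatting weights; zero marks holes.
    t: temporal location in [0, 1] (0 = first input, 1 = second).
    Returns the merged frame and a mask of covered pixels.
    """
    # temporal weights favor the temporally closer input frame
    a0 = (1.0 - t) * w0
    a1 = t * w1
    denom = a0 + a1
    out = a0[..., None] * color0 + a1[..., None] * color1
    covered = denom > eps
    out[covered] /= denom[covered][:, None]
    # pixels uncovered by both splats remain zero (holes)
    return out, covered
```

In practice a hole at time t is unlikely to be a hole in both warped inputs, which is why combining the two splatted frames already yields a mostly complete result.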

The demo below performs our splatting-based synthesis on the fly (Figure 2 and Equation 5 of our paper). That is, the shown results were generated using JavaScript from two input images and pre-computed optical flow estimates.