
Optimizing RIFE for denoise

[Open] DTL2020 opened this issue 2 years ago · 3 comments

We are currently starting to use RIFE for temporal video denoising (https://github.com/Asd-g/AviSynthPlus-RIFE/issues/2). But the quality of motion compensation is not ideal, so it causes either blurring (if we use a simple average blending tool) or reduced denoising power (if we use a blending engine with better protection against bad blends).

The idea of using RIFE for denoising: RIFE creates an interpolation (already a partial 2-frame denoise) of the 'before' (current-N) and 'after' (current+N) frames using a time parameter of 0.5 (interpolation to the 'current' time between -N and +N), and the result is blended (spatially only) with the current frame, using a weight of 1/3 for the current frame and 2/3 for the RIFE output frame. So if everything goes ideally, with perfect RIFE motion interpolation, we get an average of the noise in 3 frames and roughly a SQRT(3) reduction of photon shot noise in natural video. If we need more denoising, we combine many RIFE-interpolated frames (+-1, +-2 and so on). A sketch of this blend is shown below.
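A minimal sketch of the proposed 3-frame blend, assuming a hypothetical `rife_interpolate` wrapper that returns the t=0.5 interpolation of two frames (the name and signature are illustrative, not the plugin's actual API):

```python
import numpy as np

def denoise_frame(prev_frame: np.ndarray,
                  cur_frame: np.ndarray,
                  next_frame: np.ndarray,
                  rife_interpolate) -> np.ndarray:
    # RIFE output at t=0.5 already averages the noise of the two outer frames.
    interp = rife_interpolate(prev_frame, next_frame, t=0.5)  # hypothetical wrapper
    # Spatial blend: 1/3 current + 2/3 interpolated, so each of the 3 source
    # frames contributes ~1/3 and shot noise drops by about sqrt(3) ideally.
    return cur_frame.astype(np.float32) / 3.0 + interp.astype(np.float32) * (2.0 / 3.0)
```

With wider windows (+-2, +-3, ...) the same weighting generalizes so that each of the 2K+1 source frames contributes about 1/(2K+1).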

Since RIFE still does not produce ideal motion interpolation (and, with only a 2-frame basis, temporal aliasing increases as +-N grows and with fast and/or repeating motion), we have some issues, and ideas on how to make our denoising work better.

The benefit of using RIFE over existing hardware or software solutions for motion search/motion compensation is that it uses mostly the GPU for compute and for a significant part of the blending work, leaving plenty of CPU resources for MPEG encoding. RIFE may also perform better on noisy content, and it produces no block artifacts and requires no additional deblocking processing (many other motion compensation engines are based on small blocks and require additional overlapped blending to fix blockiness).

Our current 2 (2+) ideas for making RIFE better for denoising:

  1. Is it possible to use 3 input frames (current-N, current, current+N) and train RIFE to interpolate between current-N and current+N at time 0.5, matching the 'current' frame as precisely as possible (but with the restriction of not using current-frame samples to create the output interpolated samples)? This may be close to existing 'training scripts', e.g. for 2x FPS increase, where the neural network learns to interpolate between the -1 and +1 frames and the result is compared against the 0 (current, real) frame for quality. Or is it too complex a redesign of the current RIFE engine if we want to use 3 input frames? (See the first sketch after this list.)

  2. If 1. is too complex, maybe it is possible to keep the current 2-input-frame design but change the algorithm so that we can provide the 'current' and 'current+N' frames with time=0, asking RIFE to interpolate the current+N frame onto the 'current' frame (using the 'current' frame as the object-placement reference, but with the strict requirement that the output uses only current+N frame samples)? This would be a 'motion compensation' mode, where we ask for complete motion compensation of the 'source current+N' frame based on the 'reference current' frame, while not allowing samples of the reference frame in the output. As I understand it, if we currently provide a very low time parameter such as 0.00001, the current RIFE engine will use mostly the first frame's samples in the output? (See the second sketch after this list.)

  3. For a much better understanding of complex and variable-speed motion it is necessary to analyse not 2 frames but a longer sequence of frames. Are RIFE versions that support analysis of a sequence of frames planned? Or would that be much slower and use much more GPU memory?
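For idea 1, the training objective could look roughly like the following PyTorch-style sketch. This is only an assumption about how such a 3-frame setup might be wired up, with a hypothetical `model(img0, img1, timestep)` interpolation call, not the actual RIFE training code:

```python
import torch
import torch.nn.functional as F

def training_step(model, optimizer, frame_prev, frame_next, frame_cur):
    """One hypothetical step: interpolate between current-N and current+N
    at t=0.5 and supervise with the real 'current' frame. The network only
    sees frame_cur as the target, never as an input, so it cannot copy
    current-frame samples into the output."""
    optimizer.zero_grad()
    pred = model(frame_prev, frame_next, timestep=0.5)  # assumed signature
    loss = F.l1_loss(pred, frame_cur)  # simple L1 photometric loss as an example
    loss.backward()
    optimizer.step()
    return loss.item()
```

Conceptually this is close to how frame interpolation networks are already trained on frame triplets with the middle frame as ground truth; the difference here would be the larger temporal gaps (+-N) and noisy inputs.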
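For idea 2, if the goal is strict motion compensation that uses only current+N samples, one possible direction (plainly a different technique than tweaking the time parameter) is to estimate flow from the reference to the source frame and backward-warp the source. The sketch below assumes a hypothetical `flow_net(ref, src)` estimator returning per-pixel flow of shape (B, 2, H, W); it is an illustration, not RIFE's internal pipeline:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def motion_compensate(flow_net, ref_cur, src_next):
    """Backward-warp src_next onto ref_cur's grid so that the output is
    aligned with the current frame but built only from current+N samples."""
    _, _, h, w = ref_cur.shape
    flow = flow_net(ref_cur, src_next)  # hypothetical: flow from ref to src, (B,2,H,W)
    # Build a pixel grid and displace it by the flow.
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).float().to(ref_cur.device)  # (2,H,W)
    grid = grid.unsqueeze(0) + flow  # sample src at (ref position + flow)
    # Normalize to [-1, 1] for grid_sample.
    gx = 2.0 * grid[:, 0] / (w - 1) - 1.0
    gy = 2.0 * grid[:, 1] / (h - 1) - 1.0
    return F.grid_sample(src_next, torch.stack((gx, gy), dim=-1),
                         align_corners=True)
```

By contrast, simply passing a tiny timestep to the current engine would, as noted above, mostly reproduce the first input frame's samples, which defeats the purpose of keeping the reference frame's noise out of the output.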

Can we expect some help from the RIFE developers on questions 1 and 2?

DTL2020 · Feb 22 '23 11:02