AIVFI/Video-Frame-Interpolation-Rankings-and-Video-Deblurring-Rankings: Rankings include: ABME AdaFNIO ALANET AMT BiT BVFI...

Video Frame Interpolation Rankings
and Video Deblurring Rankings

I intend to add new rankings gradually, but my priority is keeping the existing ones up to date. The next 3 planned updates to the existing rankings are:

Add enhanced models: [arXiv]
Add enhanced models: [arXiv]
Add missing model: VFIFT [arXiv]

I will also gradually change the layout of the tables, so the old and new layouts will coexist for some time. In the near future I will write a little more about what inspired me to create my new repository: Monocular Depth Estimation Rankings and 2D to 3D Video Conversion Rankings

Researchers! Please train at least one of your models with a perceptual loss. I have made a special column in my rankings dedicated specifically to such models. Why models trained with a perceptual loss? The following quote summarises it best [^15]:

"the model trained using color loss 𝓛_Lap performs best in terms of PSNR and SSIM whereas the one trained using perceptual loss 𝓛_F performs best in terms of LPIPS. We further note that the 𝓛_F-trained model better recovers fine details in challenging cases, making it preferable in practice."

This is visible in the results of the two video frame interpolation models from the quote above on the Vimeo-90K triplet test set [^15]:

Model	PSNR ↑	SSIM ↑	LPIPS ↓
SoftSplat - 𝓛_Lap	36.10dB	0.970	0.021
SoftSplat - 𝓛_F	35.48dB	0.964	0.013

Sometimes even a PSNR advantage of almost 3 dB does not guarantee a better LPIPS score, as shown by the results of two different video frame interpolation methods on the Vimeo-90K septuplet test set [^27]:

Model	PSNR ↑	SSIM ↑	LPIPS ↓
VFIT-B	36.963dB	0.9649	0.0304
RIFE	34.048dB	0.9449	0.0233
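
The disagreement between the two metrics in the table above can be checked in a few lines. This is only an illustrative sketch: the numbers are copied from the table [^27], while the dictionary layout and variable names are assumptions made for the example.

```python
# Scores copied from the Vimeo-90K septuplet table above [^27].
# The dictionary layout and the names are illustrative assumptions.
scores = {
    "VFIT-B": {"psnr": 36.963, "lpips": 0.0304},
    "RIFE": {"psnr": 34.048, "lpips": 0.0233},
}

# PSNR: higher is better. LPIPS: lower is better.
best_by_psnr = max(scores, key=lambda name: scores[name]["psnr"])
best_by_lpips = min(scores, key=lambda name: scores[name]["lpips"])

# Size of the PSNR advantage of VFIT-B over RIFE, in dB.
psnr_gap_db = scores["VFIT-B"]["psnr"] - scores["RIFE"]["psnr"]

print(f"Best by PSNR:  {best_by_psnr}")
print(f"Best by LPIPS: {best_by_lpips}")
print(f"PSNR gap: {psnr_gap_db:.3f} dB")
```

The two metrics pick different winners, which is why a PSNR-only comparison can hide the perceptually better model.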

LPIPS [^30] reflects human perception much better than PSNR or SSIM. This is also evident from the results reported in the paper introducing a competing perceptual metric [^29]:

IQA Model	BAPPS database Frame interpolation 2AFC score ↑	BAPPS database Video deblurring 2AFC score ↑	Ding20 database Deblurring 2AFC score ↑
Human	0.686	0.671	0.843
LPIPS	0.630	0.605	0.788
PSNR	0.543	0.590	0.518
SSIM	0.548	0.583	0.575
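
For readers unfamiliar with the 2AFC (two-alternative forced choice) score, the sketch below shows how such a score is commonly computed, assuming the protocol used with the BAPPS database in the LPIPS paper [^30]: for each triplet, a metric is credited with the fraction of human judges who agree with its choice. The function name and the toy data are hypothetical, and the cited papers may differ in details.

```python
def two_afc_score(d0, d1, human_pref_1):
    """Mean agreement between a metric and human judges.

    d0[i], d1[i]: the metric's distance from the reference to candidate 0 / 1.
    human_pref_1[i]: fraction of human judges who preferred candidate 1.
    """
    total = 0.0
    for dist0, dist1, pref1 in zip(d0, d1, human_pref_1):
        if dist0 < dist1:      # metric prefers candidate 0
            total += 1.0 - pref1
        elif dist1 < dist0:    # metric prefers candidate 1
            total += pref1
        else:                  # metric cannot decide
            total += 0.5
    return total / len(d0)


# Hypothetical toy data, not taken from any paper.
d0 = [0.10, 0.40, 0.25, 0.30]
d1 = [0.20, 0.15, 0.25, 0.10]
human_pref_1 = [0.2, 0.8, 0.6, 0.4]

score = two_afc_score(d0, d1, human_pref_1)
print(f"2AFC score: {score:.3f}")
```

Under this definition even a human judge scores below 1.0, because judges do not always agree with each other, so the Human row is the practical ceiling for the other rows.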

List of Rankings

Each ranking includes only the best model of each method.

The rankings exclude all event-based models.
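
The two rules above, combined with the score threshold in each ranking's title, can be sketched as a small filter. This is a minimal sketch under stated assumptions: the full qualification rules are not yet published (Appendix 1 is still to do), the `Entry` fields are hypothetical, and the event-based entry is invented purely to show the exclusion. The PSNR values of the other entries are taken from the Vimeo-90K triplet tables in this document.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    method: str          # method family, e.g. "SoftSplat"
    variant: str         # specific model of that method
    psnr_db: float
    event_based: bool = False


def build_ranking(entries, min_psnr_db):
    """Keep the best non-event-based variant per method above the threshold."""
    best = {}
    for entry in entries:
        if entry.event_based:
            continue  # event-based models are excluded from all rankings
        current = best.get(entry.method)
        if current is None or entry.psnr_db > current.psnr_db:
            best[entry.method] = entry
    qualified = [e for e in best.values() if e.psnr_db >= min_psnr_db]
    return sorted(qualified, key=lambda e: e.psnr_db, reverse=True)


entries = [
    Entry("SoftSplat", "L_Lap with ensemble", 36.28),
    Entry("SoftSplat", "L_Lap", 36.10),
    Entry("SoftSplat", "L_F", 35.48),
    Entry("FILM", "L_1", 36.06),
    # Hypothetical entry, only to demonstrate the event-based exclusion.
    Entry("HypotheticalEventVFI", "base", 37.00, event_based=True),
]

ranking = build_ranking(entries, min_psnr_db=36.0)
for place, entry in enumerate(ranking, start=1):
    print(f"{place}\t{entry.method} - {entry.variant}\t{entry.psnr_db:.2f}")
```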

Joint Video Deblurring and Frame Interpolation Rankings

👑 RBI with real motion blur✔️: LPIPS😍 (no data)
This will be the King of all rankings. We look forward to ambitious researchers.
RBI with real motion blur✔️: PSNR😞>=28.5dB
Adobe240 (640×352) with synthetic motion blur✖️: LPIPS😍 (no data)
Adobe240 (640×352) with synthetic motion blur✖️: PSNR😞>=33.3dB
Adobe240 (5:8) with synthetic motion blur✖️: LPIPS😍 (no data)
Adobe240 (5:8) with synthetic motion blur✖️: PSNR😞>=25dB

Video Deblurring Rankings

(to do)

Video Frame Interpolation Rankings

Vimeo-90K triplet: LPIPS😍(SqueezeNet)<=0.014
Vimeo-90K triplet: LPIPS😍<=0.018
Vimeo-90K triplet: PSNR😞>=36dB
Vimeo-90K septuplet: LPIPS😍<=0.032
Vimeo-90K septuplet: PSNR😞>=36dB

Appendices

Appendix 1: Rules for qualifying models for the rankings (to do)
Appendix 2: Metrics selection for the rankings
Appendix 3: List of all research papers from the above rankings

RBI with real motion blur✔️: PSNR😞>=28.5dB

RK	Model	PSNR ↑ {Input fr.}	Training dataset	Practical model	VapourSynth
1	Pre-BiT++	31.32 {3}	Pretraining: Adobe240 Training: RBI		-
2	DeMFI-Net_rb(5,3)	29.03 {4}	RBI	-	-
3	PRF₄ -Large ENH:	28.55 {5}	RBI	-	-

Adobe240 (640×352) with synthetic motion blur✖️: PSNR😞>=33.3dB

RK	Model	PSNR ↑ {Input fr.}	Originally announced or Training dataset	Official repository	Practical model	VapourSynth
1	BVFI	35.43 {4}	Adobe240	-	-	-
2	BiT++	34.97 {3}	Adobe240			-
3	DeMFI-Net_rb(5,3)	34.34 {4}	Adobe240		-	-
4	ALANET	33.34dB [^37]	August 2020 [^37]		-	-
5	PRF₄ -Large	33.32dB [^38]	February 2020 [^35]		-	-

Adobe240 (5:8) with synthetic motion blur✖️: PSNR😞>=25dB

RK	Model	PSNR ↑	Originally announced	Practical model	VapourSynth
1	VIDUE	28.74dB [^39]	March 2023 [^39]	-	-
2	FLAVR	27.23dB [^39]	December 2020 [^9]	-	-
3	UTI-VFI	26.69dB [^39]	December 2020 [^40]	-	-
4	DeMFI	25.71dB [^39]	November 2021 [^34]	-	-

Vimeo-90K triplet: LPIPS😍(SqueezeNet)<=0.014

RK	Model	LPIPS ↓	Originally announced	Official repository	Practical model	VapourSynth
1	CDFI w/ adaP/U	0.008 [^23]	March 2021 [^24]		-	-
2	EDSC_s-𝓛_F	0.010 [^24]	June 2020 [^25]		EDSC_s-𝓛_F	-
3	DRVI	0.013 [^26]	August 2021 [^26]	-	-	-

Vimeo-90K triplet: LPIPS😍<=0.018

RK	Model	LPIPS ↓ {Input fr.}	Training dataset	Official repository	Practical model	VapourSynth
1	EAFI-𝓛_ecp	0.012 {2}	Vimeo-90K triplet	-	EAFI-𝓛_ecp	-
2	UGFI 𝓛_S	0.0126 {2}	Vimeo-90K triplet	-	UGFI 𝓛_S	-
3	SoftSplat - 𝓛_F	0.013 {2}	Vimeo-90K triplet		SoftSplat - 𝓛_F	-
4	FILM-𝓛_S	0.0132 {2}	Vimeo-90K triplet		FILM-𝓛_S	-
5	EDSC_s-𝓛_F	0.016 {2}	Vimeo-90K triplet		EDSC_s-𝓛_F	-
6	CtxSyn - 𝓛_F	0.017 {2}	proprietary	-	CtxSyn - 𝓛_F	-
7	PerVFI	0.018 {2}	Vimeo-90K triplet	-	PerVFI	-

Vimeo-90K triplet: PSNR😞>=36dB

RK	Model	PSNR ↑ {Input fr.}	Originally announced or Training dataset	Official repository	Practical model	VapourSynth
1	MA-GCSPA triplet-trained	36.76dB [^3]	March 2022 [^3]		-	-
2	VFIformer + HRFFM ENH:	36.69 {2}	Vimeo-90K triplet	ENH: -	-	-
3	LADDER-L	36.65 {2}	Vimeo-90K triplet	-	-	-
4	EMA-VFI	36.64dB [^22]	March 2023 [^22]		-	-
5	DQBC-Aug	36.57dB [^36]	April 2023 [^36]		-	-
6	TTVFI	36.54dB [^4]	July 2022 [^4]		-	-
7	AMT-G	36.53dB [^31]	April 2023 [^31]		-	-
8	AdaFNIO	36.50dB [^19]	November 2022 [^19]		-	-
9	FGDCN-L	36.46dB [^21]	November 2022 [^21]		-	-
10	UPR-Net LARGE	36.42dB [^18]	November 2022 [^18]		-	-
11	EAFI-𝓛_ecc	36.38dB [^8]	July 2022 [^8]	-	EAFI-𝓛_ecp	-
12	H-VFI-Large	36.37dB [^20]	November 2022 [^20]	-	-	-
13	UGFI 𝓛₁	36.34 {2}	Vimeo-90K triplet	-	UGFI 𝓛_S	-
14	SoftSplat - 𝓛_Lap with ensemble	36.28dB [^28]	March 2020 [^15]		SoftSplat - 𝓛_F	-
15	NCM-Large	36.22dB [^32]	July 2022 [^32]	-	-	-
16-17	IFRNet large	36.20dB [^10]	May 2022 [^10]		-	-
16-17	RAFT-M2M++ ENH:	36.20 {2}	Vimeo-90K triplet		-	-
18-19	EBME-H*	36.19dB [^11]	June 2022 [^11]		-	-
18-19	RIFE-Large	36.19 {2}	Vimeo-90K triplet		RIFE v4.15
20-21	ABME	36.18dB [^13]	August 2021 [^13]		-	-
20-21	ProBoost-Net	36.18 {2}	?	-	-	-
22	TDPNet_nv w/o MRTM	36.069 {2}	Vimeo-90K triplet	-	TDPNet	-
23	FILM-𝓛₁	36.06 {2}	Vimeo-90K triplet		FILM-𝓛_S	-
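
The shared rank labels in the table above (for example 16-17) follow from models having identical reported scores. The sketch below shows one way to produce such labels; it assumes ties are decided on the scores exactly as listed, and the function name and `first_rank` parameter are hypothetical. The rows are copied from ranks 15 to 19 of the table.

```python
from itertools import groupby


def rank_labels(rows, first_rank=1):
    """Label best-first (model, score) rows; tied rows share a range label."""
    labeled = []
    rank = first_rank
    for score, group in groupby(rows, key=lambda row: row[1]):
        models = [model for model, _ in group]
        last = rank + len(models) - 1
        label = str(rank) if last == rank else f"{rank}-{last}"
        labeled.extend((label, model, score) for model in models)
        rank = last + 1
    return labeled


# Rows must already be sorted from best to worst score.
rows = [
    ("NCM-Large", 36.22),
    ("IFRNet large", 36.20),
    ("RAFT-M2M++", 36.20),
    ("EBME-H*", 36.19),
    ("RIFE-Large", 36.19),
]

labeled = rank_labels(rows, first_rank=15)
for label, model, score in labeled:
    print(f"{label}\t{model}\t{score:.2f}")
```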

Vimeo-90K septuplet: LPIPS😍<=0.032

RK	Model	LPIPS ↓	Originally announced	Practical model	VapourSynth
1	RIFE	0.0233 [^27]	November 2020 [^12]	RIFE v4.15
2	IFRNet	0.0274 [^27]	May 2022 [^10]	-	-
3	VFIT-B	0.0304 [^27]	November 2021 [^2]	-	-
4	ABME	0.0309 [^27]	August 2021 [^13]	-	-

Vimeo-90K septuplet: PSNR😞>=36dB

RK	Model	PSNR ↑ {Input fr.}	Originally announced or Training dataset	Official repository	Practical model	VapourSynth
1	JNMR	37.19dB [^1]	June 2022 [^1]		-	-
2	VFIT-B	36.96dB [^2]	November 2021 [^2]		-	-
3	VRT	36.53dB [^5]	June 2022 (VFI) [^5]		-	-
4	ST-MFNet	36.507dB [^42]	November 2021 [^7]		-	-
5	MA-GCSPA septuplet-trained	36.50dB [^3]	March 2022 [^3]		-	-
6	EDENVFI PVT(15,15)	36.387dB [^42]	July 2023 [^42]	-	-	-
7	IFRNet	36.37 {2}	Vimeo-90K septuplet		-	-
8	RN-VFI	36.33 {4}	Vimeo-90K septuplet	-	-	-
9	FLAVR	36.25dB [^9]	December 2020 [^9]		-	-
10	DBVI	36.17dB [^17]	October 2022 [^17]		-	-
11	EDC	36.14dB [^1]	February 2022 [^14]		-	-

Appendix 2: Metrics selection for the rankings

Currently, the most commonly used metrics in existing work on video frame interpolation and video deblurring are PSNR, SSIM and LPIPS, in that order.

The main purpose of creating my rankings is to look for the best perceptually-oriented model for practical applications - hence the primary metric in my rankings will be the most common perceptual image quality metric in scientific papers: LPIPS.

At the time of writing, in October 2023, I have found only two alternatives used for VFI: another perceptual image quality metric, DISTS, in one paper, and a bespoke VFI metric, FloLPIPS [arXiv], in another. Unfortunately, neither paper evaluates the models that perform best on the LPIPS metric. If, in the future, a researcher evaluates the top-performing LPIPS models using alternative, better perceptual metrics, I would of course be happy to add rankings based on those metrics.

I would like to use only one metric: LPIPS. Unfortunately, many of the best VFI and video deblurring methods are still evaluated using only PSNR, or only PSNR and SSIM. For this reason, I additionally present rankings based on PSNR. They show the models that could, after perceptually-oriented training, be the best for practical applications, and they provide a source of knowledge for building even better practical models in the future.
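
For reference, PSNR is derived directly from the mean squared error (MSE) between an output frame and the ground truth. The sketch below uses the standard definition for 8-bit pixel values. It is not the evaluation code of any paper in the rankings: papers may differ in details such as colour space or per-frame averaging, and the function names and toy pixel values are hypothetical.

```python
import math


def psnr_db(reference, output, max_value=255.0):
    """PSNR in dB between two equally sized sequences of pixel values."""
    if len(reference) != len(output) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - o) ** 2 for r, o in zip(reference, output)) / len(reference)
    if mse == 0:
        return math.inf  # identical inputs
    return 10.0 * math.log10(max_value ** 2 / mse)


def mse_ratio(psnr_gap_db):
    """Factor by which MSE differs for a given PSNR gap in dB."""
    return 10.0 ** (psnr_gap_db / 10.0)


# Hypothetical toy pixel values.
reference = [52, 55, 61, 59]
output = [54, 55, 60, 57]

print(f"PSNR: {psnr_db(reference, output):.2f} dB")
print(f"MSE ratio for a 2.915 dB gap: {mse_ratio(2.915):.2f}")
```

Because PSNR only measures pixel-wise error, a gap of almost 3 dB (roughly a factor of two in MSE) says nothing about whether fine details look plausible, which is what LPIPS is meant to capture.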

I have decided to completely abandon rankings based on the SSIM metric. Below are the main reasons for this decision, ordered from most to least important.

The main reason is the following quote, which I found in a paper by researchers at Adobe Research: [^28]. In the quote they refer to a paper by researchers at NVIDIA: [arXiv].

"We limit the evaluation herein to the PSNR metric since SSIM [57] is subject to unexpected and unintuitive results [39]."
The second reason is that more and more papers report PSNR scores without SSIM, for example [^42]. If a model from such a paper appears in the PSNR-based ranking but not in the SSIM-based ranking, this may give the misleading impression that its SSIM score is too poor to reach the ranking eligibility threshold, when the paper simply reports no SSIM score.
The third reason is that the SSIM scores of individual models are often very close to each other or identical. This is the case in the SNU-FILM Easy test, as shown in Table 3: [CVPR 2023], where as many as 6 models achieve the same score of 0.991 and as many as 5 models achieve the same score of 0.990. In the same test, PSNR makes it easier to determine the order of the ranking with the same number of significant digits.
The fourth reason is that PSNR-based rankings are only ancillary when a model does not have an LPIPS score. For this reason, SSIM rankings do not add value to my repository and only reduce its readability.
The fifth reason is that I want to encourage researchers who want to use only two metrics in their paper to use LPIPS and PSNR instead of PSNR and SSIM.
The sixth reason is that the time saved by dropping the SSIM-based rankings will allow me to add new rankings based on other test data, which will be more useful and valuable.

Appendix 3: List of all research papers from the above rankings

📝 Note: Temporarily, the following list contains full descriptions of those methods that have been removed from the footnotes or not included in the footnotes at all due to the new layout of the tables.

Method	Paper	Venue
ABME
AdaFNIO
ALANET
AMT
BIN	Blurry Video Frame Interpolation
BiT	Blur Interpolation Transformer for Real-World Motion from Blur
BVFI	Three-Stage Cascade Framework for Blurry Video Frame Interpolation
CDFI
CtxSyn	Context-aware Synthesis for Video Frame Interpolation
DBVI
DeMFI	DeMFI: Deep Joint Deblurring and Multi-Frame Interpolation with Flow-Guided Attentive Correlation and Recursive Boosting
DQBC
DRVI
EAFI	Error-Aware Spatial Ensembles for Video Frame Interpolation
EBME
EDC
EDENVFI
EDSC	Multiple Video Frame Interpolation via Enhanced Deformable Separable Convolution
EMA-VFI
FGDCN
FILM	FILM: Frame Interpolation for Large Motion
FLAVR
HRFFM	Video Frame Interpolation with Region-Distinguishable Priors from SAM
H-VFI
IFRNet	IFRNet: Intermediate Feature Refine Network for Efficient Frame Interpolation
JNMR
LADDER	LADDER: An Efficient Framework for Video Frame Interpolation
M2M	Many-to-many Splatting for Efficient Video Frame Interpolation
MA-GCSPA
NCM
PerVFI	Perceptual-Oriented Video Frame Interpolation Via Asymmetric Synergistic Blending
PRF	Video Frame Interpolation and Enhancement via Pyramid Recurrent Framework
ProBoost-Net	Progressive Motion Boosting for Video Frame Interpolation
RIFE	Real-Time Intermediate Flow Estimation for Video Frame Interpolation
RN-VFI	Range-nullspace Video Frame Interpolation with Focalized Motion Estimation
SoftSplat	Softmax Splatting for Video Frame Interpolation
SSR	Video Frame Interpolation with Many-to-many Splatting and Spatial Selective Refinement
ST-MFNet
TDPNet	Textural Detail Preservation Network for Video Frame Interpolation
TTVFI
UGFI	Frame Interpolation Transformer and Uncertainty Guidance
UPR-Net
UTI-VFI
VFIformer	Video Frame Interpolation with Transformer
VFIT
VIDUE
VRT

[^1]: JNMR: Joint Non-linear Motion Regression for Video Frame Interpolation [TIP 2023] [arXiv]
[^2]: Video Frame Interpolation Transformer [CVPR 2022] [arXiv]
[^3]: Exploring Motion Ambiguity and Alignment for High-Quality Video Frame Interpolation [CVPR 2023] [arXiv]
[^4]: TTVFI: Learning Trajectory-Aware Transformer for Video Frame Interpolation [TIP 2023] [arXiv]
[^5]: VRT: A Video Restoration Transformer [arXiv]

[^7]: ST-MFNet: A Spatio-Temporal Multi-Flow Network for Frame Interpolation [CVPR 2022] [arXiv]
[^8]: Error-Aware Spatial Ensembles for Video Frame Interpolation [arXiv]
[^9]: FLAVR: Flow-Agnostic Video Representations for Fast Frame Interpolation [WACV 2023] [arXiv]
[^10]: IFRNet: Intermediate Feature Refine Network for Efficient Frame Interpolation [CVPR 2022] [arXiv]
[^11]: Enhanced Bi-directional Motion Estimation for Video Frame Interpolation [WACV 2023] [arXiv]
[^12]: Real-Time Intermediate Flow Estimation for Video Frame Interpolation [ECCV 2022] [arXiv]
[^13]: Asymmetric Bilateral Motion Estimation for Video Frame Interpolation [ICCV 2021] [arXiv]
[^14]: Enhancing Deformable Convolution based Video Frame Interpolation with Coarse-to-fine 3D CNN [ICIP 2022] [arXiv]
[^15]: Softmax Splatting for Video Frame Interpolation [CVPR 2020] [arXiv]

[^17]: Deep Bayesian Video Frame Interpolation [ECCV 2022]
[^18]: A Unified Pyramid Recurrent Network for Video Frame Interpolation [CVPR 2023] [arXiv]
[^19]: AdaFNIO: Adaptive Fourier Neural Interpolation Operator for video frame interpolation [arXiv]
[^20]: H-VFI: Hierarchical Frame Interpolation for Videos with Large Motions [arXiv]
[^21]: Flow Guidance Deformable Compensation Network for Video Frame Interpolation [TMM 2023] [arXiv]
[^22]: Extracting Motion and Appearance via Inter-Frame Attention for Efficient Video Frame Interpolation [CVPR 2023] [arXiv]
[^23]: AdaPool: Exponential Adaptive Pooling for Information-Retaining Downsampling [TIP 2022] [arXiv]
[^24]: CDFI: Compression-Driven Network Design for Frame Interpolation [CVPR 2021] [arXiv]
[^25]: Multiple Video Frame Interpolation via Enhanced Deformable Separable Convolution [TPAMI 2021] [arXiv]
[^26]: DRVI: Dual Refinement for Video Interpolation [Access 2021]
[^27]: Exploring Discontinuity for Video Frame Interpolation [CVPR 2023] [arXiv]
[^28]: Revisiting Adaptive Convolutions for Video Frame Interpolation [WACV 2021] [arXiv]
[^29]: Locally Adaptive Structure and Texture Similarity for Image Quality Assessment [MM 2021] [arXiv]
[^30]: The Unreasonable Effectiveness of Deep Features as a Perceptual Metric [CVPR 2018] [arXiv]
[^31]: AMT: All-Pairs Multi-Field Transforms for Efficient Frame Interpolation [CVPR 2023] [arXiv]
[^32]: Neighbor Correspondence Matching for Flow-based Video Frame Synthesis [MM 2022] [arXiv]
[^33]: Blur Interpolation Transformer for Real-World Motion from Blur [CVPR 2023] [arXiv]
[^34]: DeMFI: Deep Joint Deblurring and Multi-Frame Interpolation with Flow-Guided Attentive Correlation and Recursive Boosting [ECCV 2022] [arXiv]
[^35]: Blurry Video Frame Interpolation [CVPR 2020] [arXiv]
[^36]: Video Frame Interpolation with Densely Queried Bilateral Correlation [IJCAI 2023] [arXiv]
[^37]: ALANET: Adaptive Latent Attention Network for Joint Video Deblurring and Interpolation [MM 2020] [arXiv]
[^38]: Video Frame Interpolation and Enhancement via Pyramid Recurrent Framework [TIP 2020]
[^39]: Joint Video Multi-Frame Interpolation and Deblurring under Unknown Exposure Time [CVPR 2023] [arXiv]
[^40]: Video Frame Interpolation without Temporal Priors [NeurIPS 2020] [arXiv]

[^42]: Efficient Convolution and Transformer-Based Network for Video Frame Interpolation [ICIP 2023] [arXiv]
