# Monocular Depth Estimation Rankings and 2D to 3D Video Conversion Rankings

Rankings include: Depth Anything, DPT, FutureDepth, GBDMF, GenPercept, LeReS, LightedDepth, LFVRT, Marigold, Metric3D, MiDaS, NeWCRFs, PatchFusion, UniDepth, and ZoeDepth.
## List of Rankings
Each ranking includes only the best model for one method.
### Monocular Depth Estimation Rankings
- DA-2K (mostly 1500×2000): Acc (%) >= 86
- UnrealStereo4K (3840×2160): AbsRel <= 0.04
- MVS-Synth (1920×1080): AbsRel <= 0.06
- HRSD (1920×1080): AbsRel <= 0.08
- Middlebury2021 (1920×1080): SqRel <= 0.5
- NYU-Depth V2 (640×480): OPW <= 0.31
- NYU-Depth V2 (640×480): AbsRel <= 0.058
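For reference, AbsRel and SqRel are the standard relative depth-error metrics used in the tables below. A minimal sketch of how they are commonly computed (function names and the valid-depth mask are my own, not from any specific benchmark's evaluation code):

```python
import numpy as np

def absrel(pred, gt):
    """Mean absolute relative error: mean(|pred - gt| / gt) over valid pixels."""
    pred, gt = np.asarray(pred, dtype=float), np.asarray(gt, dtype=float)
    mask = gt > 0  # assumption: pixels with non-positive depth have no ground truth
    return float(np.mean(np.abs(pred[mask] - gt[mask]) / gt[mask]))

def sqrel(pred, gt):
    """Mean squared relative error: mean((pred - gt)^2 / gt) over valid pixels."""
    pred, gt = np.asarray(pred, dtype=float), np.asarray(gt, dtype=float)
    mask = gt > 0
    return float(np.mean((pred[mask] - gt[mask]) ** 2 / gt[mask]))
```

Lower is better for both; SqRel penalizes large outlier errors more heavily than AbsRel, which is why it is the stricter choice for high-depth-range scenes such as Middlebury2021.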
### 2D to 3D Video Conversion Rankings
#### I. Video Inpainting Rankings
- (to do)
#### II. Light Field Video Reconstruction from Monocular Video Rankings
- :crown: 4DLFVD with up to 10×10 real light field views ✔️: LPIPS 😍 (no data)
  This will be the King of all rankings. We look forward to ambitious researchers.
- 4DLFVD with up to 10×10 real light field views ✔️: PSNR 😞 (no data)
- Hybrid with 7×7 synthetic light field views ✖️: LPIPS 😍 (no data)
- Hybrid with 7×7 synthetic light field views ✖️: PSNR 😞 >= 32 dB
## Appendices
- Appendix 1: Rules for qualifying models for the rankings (to do)
- Appendix 2: Metrics selection for the rankings (to do)
- Appendix 3: List of all research papers from the above rankings
## DA-2K (mostly 1500×2000): Acc (%) >= 86
## UnrealStereo4K (3840×2160): AbsRel <= 0.04

| RK | Model | AbsRel ↓ {Input fr.} | Training dataset | Official repository | Practical model | VapourSynth |
|---|---|---|---|---|---|---|
| 1 | ZoeDepth +PFR=128 ENH: | 0.0388 {1} | ENH: UnrealStereo4K | ENH: | - | - |
## MVS-Synth (1920×1080): AbsRel <= 0.06

| RK | Model | AbsRel ↓ {Input fr.} | Training dataset | Official repository | Practical model | VapourSynth |
|---|---|---|---|---|---|---|
| 1 | ZoeDepth +PFR=128 ENH: | 0.0589 {1} | ENH: MVS-Synth | ENH: | - | - |
## HRSD (1920×1080): AbsRel <= 0.08

| RK | Model | AbsRel ↓ {Input fr.} | Training dataset | Official repository | Practical model | VapourSynth |
|---|---|---|---|---|---|---|
| 1 | DPT-B + R + AL ENH: | 0.074 {1} | ENH: HRSD | ENH: | - | - |
## Middlebury2021 (1920×1080): SqRel <= 0.5

| RK | Model | SqRel ↓ {Input fr.} | Training dataset | Official repository | Practical model | VapourSynth |
|---|---|---|---|---|---|---|
| 1 | LeReS-GBDMF ENH: | 0.444 {1} | ENH: HR-WSI | ENH: | - | - |
## NYU-Depth V2 (640×480): OPW <= 0.31

| RK | Model | OPW ↓ {Input fr.} | Training dataset | Official repository | Practical model | VapourSynth |
|---|---|---|---|---|---|---|
| 1 | FutureDepth Backbone: Swin-L | 0.303 {4} | NYU-Depth V2 | - | - | - |
## NYU-Depth V2 (640×480): AbsRel <= 0.058
## Hybrid with 7×7 synthetic light field views ✖️: PSNR 😞 >= 32 dB

| RK | Model | PSNR ↑ {Input fr.} | Training dataset | Official repository | Practical model | VapourSynth |
|---|---|---|---|---|---|---|
| 1 | LFVRT MDE: DPT Backbone: ViT | 32.66 {3+1D} | GoPro & TAMULF | MDE: | - | - |
📝 Note: This ranking includes only one model; the other methods are image-based and lack temporal information, which makes them unsuitable for light field video reconstruction from monocular video.
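The PSNR figures above (higher is better, reported in dB) follow the standard definition, 10·log10(MAX² / MSE), where MAX is the peak signal value. A minimal sketch (function name and the `max_val` parameter are my own; benchmarks may average PSNR per-view or per-frame differently):

```python
import numpy as np

def psnr(pred, gt, max_val=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(max_val^2 / MSE).

    Assumes both images share the same value range [0, max_val]
    (use max_val=255 for 8-bit images).
    """
    pred, gt = np.asarray(pred, dtype=float), np.asarray(gt, dtype=float)
    mse = float(np.mean((pred - gt) ** 2))
    if mse == 0.0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_val ** 2 / mse))
```

LPIPS, by contrast, is a learned perceptual distance (lower is better) and requires a pretrained network to evaluate, which is why the two metrics are listed separately.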
## Appendix 3: List of all research papers from the above rankings