High resolution models

Open Demis6 opened this issue 11 months ago • 4 comments

Can we expect higher-resolution MiDaS models (e.g., 1024 lines or higher)? If so, approximately when?

Also, do you plan to update the TensorFlow module to work with the new models (version 3.1)?

Demis6 • Jul 12 '23 16:07

Also, excuse my ignorance on the topic, but is there something stopping models from using standard aspect ratios/resolution like 16:9 (1920x1080 or 3840x2160)? Or is the limitation (to powers of 2, I presume) there to simplify the computation, or something like that?

ThreeDeeJay • Oct 26 '23 16:10

Hello MiDaS team. I would also love to have MiDaS models with 1024x1024 and higher resolution (1280x1280, 1536x1536, 2048x2048). This would help so much with finding accurate depth and, just as importantly, with recovering finer details in images without sacrificing the overall depth result.

MavenDE • Dec 18 '23 21:12

is there something stopping models from using standard aspect ratios/resolution like 16:9

I'm not sure about other models, but the BEiT ones do support somewhat flexible aspect ratios, as long as the side lengths are multiples of 32. The multiple-of-32 limit comes from the 16x16 patch embedding together with a requirement for an even number of patches along each side, due to the up/downscaling steps performed after the image encoding.
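To make the multiple-of-32 rule concrete, here is a minimal sketch that picks an input size for a 16:9 frame. The function name `snap_to_multiple` and the `target_long_side` parameter are made up for illustration and are not part of MiDaS; the only thing taken from the comment above is the multiple-of-32 constraint.

```python
PATCH_MULTIPLE = 32  # from the comment above: 16x16 patches, even patch count per side


def snap_to_multiple(width, height, target_long_side=512, multiple=PATCH_MULTIPLE):
    """Scale (width, height) so the long side is about target_long_side,
    then round each side to the nearest multiple of `multiple`.

    Hypothetical helper for illustration only, not a MiDaS function.
    """
    long_side = max(width, height)

    def snap(value):
        scaled = value * target_long_side / long_side
        # Never return less than one full multiple.
        return max(multiple, int(round(scaled / multiple)) * multiple)

    return snap(width), snap(height)


print(snap_to_multiple(1920, 1080))        # (512, 288)
print(snap_to_multiple(3840, 2160, 1024))  # (1024, 576)
```

Both results keep the 16:9 ratio exactly, because 288 and 576 happen to be multiples of 32. Other ratios will usually be slightly distorted by the rounding.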

models with 1024x1024 and higher resolution

The BEiT models can take inputs at these higher resolutions, but the results are pretty hit-or-miss (mostly miss): the model might pick up more fine details, but it usually mangles the larger-scale details. The computation also scales poorly; on my machine, 1024x1024 takes 15x longer than 512x512.
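A rough back-of-the-envelope check on that slowdown, assuming (this is an assumption about the encoder, not something measured here) that it applies global self-attention over the 16x16 patches, so the attention cost grows with the square of the token count while most other layers grow linearly:

```python
PATCH = 16  # patch size mentioned earlier in the thread


def num_tokens(width, height, patch=PATCH):
    """Number of patch tokens for an input of the given size."""
    return (width // patch) * (height // patch)


small = num_tokens(512, 512)
large = num_tokens(1024, 1024)

token_ratio = large / small          # growth of the per-token (linear) layers
attention_ratio = token_ratio ** 2   # growth of pairwise attention

print(small, large)                  # 1024 4096
print(token_ratio, attention_ratio)  # 4.0 16.0
```

Under that assumption, a crude estimate puts the slowdown somewhere between 4x and 16x, so the reported 15x is at least plausible for an attention-dominated model.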

Until a model trained specifically for higher resolutions comes out, one alternative is to try running the model on tiles/crops of your larger image. From the quick tests I've done, it does seem to grab more details:

[Image: tiled_depth]

This is a picture of a turtle, with the 'normal' depth estimate at the top and (messy!) tiles at the bottom. Both images have had plane-of-best-fit removal to make them more comparable. The original is a 4K image, and the tiles are 512x512. You can see a lot more detail in the stones at the bottom, as well as a bit more detail on the head/arms of the turtle. Obviously a lot of work is needed to properly stitch all the tiles together, but depending on your use case, it may be faster to set that up than to wait for a bigger model to be released (and easier than training a bigger model!).
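For anyone wanting to try this, here is a minimal sketch of the two steps described above: running a model on non-overlapping tiles and removing a plane of best fit. It is not the code used for the picture. `estimate_depth` is a hypothetical callable (H x W x C array in, H x W depth array out) standing in for whatever model wrapper you use, and the edge padding is an added assumption so that every tile has the same size.

```python
import numpy as np


def remove_best_fit_plane(depth):
    """Subtract the least-squares plane z = a*x + b*y + c from a 2D depth map."""
    h, w = depth.shape
    ys, xs = np.mgrid[0:h, 0:w]
    A = np.column_stack([xs.ravel(), ys.ravel(), np.ones(h * w)])
    coeffs, *_ = np.linalg.lstsq(A, depth.ravel(), rcond=None)
    return depth - (A @ coeffs).reshape(h, w)


def tiled_depth(image, estimate_depth, tile=512):
    """Run estimate_depth on each tile and paste the results side by side.

    No blending or scale matching is done, so seams between tiles are expected.
    estimate_depth is a hypothetical stand-in for a real model call.
    """
    h, w = image.shape[:2]
    # Pad bottom/right by repeating edge pixels so every tile is tile x tile.
    pad_h = (-h) % tile
    pad_w = (-w) % tile
    pad_width = [(0, pad_h), (0, pad_w)] + [(0, 0)] * (image.ndim - 2)
    padded = np.pad(image, pad_width, mode="edge")

    out = np.zeros(padded.shape[:2], dtype=np.float32)
    for y in range(0, padded.shape[0], tile):
        for x in range(0, padded.shape[1], tile):
            crop = padded[y:y + tile, x:x + tile]
            out[y:y + tile, x:x + tile] = estimate_depth(crop)
    return out[:h, :w]  # drop the padding again
```

Since relative depth models typically return each tile with its own scale and shift, a real stitcher would also need to align neighbouring tiles (for example using overlapping borders), which this sketch leaves out.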

heyoeyo • Dec 19 '23 23:12

I think we can just use BoostingMonocularDepth for high-resolution results.

IndeecFOX • Dec 20 '23 17:12