ZoeD_K: Unexpected Max Depth Values

Open HernandezEduin opened this issue 11 months ago • 5 comments

Hi, I'm testing the ZoeDepth pretrained models on images captured by highway-side cameras. These images have ground truth depth values that span beyond 300 meters, which I understand exceeds the model's training range (max depth: 80.0 meters).

However, when running the pretrained KITTI model, I consistently get a maximum estimated depth of approximately 5 meters. Below is the code snippet I used:

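import torch
from PIL import Image

image = Image.open("image.jpg").convert("RGB")  # placeholder path; substitute your own image
zoe = torch.hub.load("isl-org/ZoeDepth", "ZoeD_K", pretrained=True)
predicted_depth = zoe.infer_pil(image, pad_input=False)  # Better 'metric' accuracy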

For comparison, I also tested the NK model. It gives a more reasonable maximum depth estimate, between 30 and 50 meters, which aligns better with the values expected from the model. Additionally, I tried using the KITTI model weights from Hugging Face, but the results were similar.
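To make the comparison between the two models concrete, a small helper can summarize each depth map. This is only a sketch: `depth_summary` is a hypothetical helper, and the toy values below are made up to stand in for real model outputs. It assumes the depth map is available as nested sequences of values in meters.

```python
from statistics import median


def depth_summary(depth_rows):
    """Return min, median, and max of a 2-D depth map (nested sequences, meters)."""
    values = [float(v) for row in depth_rows for v in row]
    if not values:
        raise ValueError("depth map is empty")
    return {"min": min(values), "median": median(values), "max": max(values)}


# Hypothetical toy depth maps standing in for the outputs of the two models.
toy_k = [[0.8, 2.1], [3.9, 5.0]]
toy_nk = [[4.0, 12.5], [30.0, 48.0]]

for name, depth in (("ZoeD_K", toy_k), ("ZoeD_NK", toy_nk)):
    print(name, depth_summary(depth))
```

If the prediction comes back as a NumPy array, something like `depth_summary(predicted_depth.tolist())` should work with this helper.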

Upon inspecting the model configurations, I noticed potential discrepancies in the uploaded weights:

"bin_configurations": [
{
"max_depth": 10.0,
"min_depth": 0.001,
"n_bins": 64,
"name": "nyu"
}

This configuration suggests a maximum depth of 10.0 meters, which might explain the observed behavior.
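A quick way to check which depth ranges a downloaded config declares is to parse it and list the bin configurations. The sketch below assumes the config is JSON with a top-level "bin_configurations" list shaped like the excerpt above; `list_bin_ranges` and the sample string are hypothetical and only for illustration.

```python
import json


def list_bin_ranges(config_text):
    """Return (name, min_depth, max_depth) for each entry in "bin_configurations"."""
    config = json.loads(config_text)
    return [
        (entry["name"], entry["min_depth"], entry["max_depth"])
        for entry in config.get("bin_configurations", [])
    ]


# Sample text mirroring the excerpt from the uploaded config.
sample = """
{"bin_configurations": [
  {"max_depth": 10.0, "min_depth": 0.001, "n_bins": 64, "name": "nyu"}
]}
"""

for name, lo, hi in list_bin_ranges(sample):
    print(f"{name}: {lo} m to {hi} m")
```

A KITTI-only checkpoint would be expected to list a "kitti" entry with a maximum depth of 80.0 here, so seeing only "nyu" with 10.0 is the discrepancy in question.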

Questions/Concerns:

  • Is the issue related to incorrect configurations in the uploaded weights?
  • Could you reupload the weights of the KITTI model with a max depth of 80 meters?
  • Are there any suggestions for datasets that are beyond 100 meters?

HernandezEduin avatar Jan 22 '25 06:01 HernandezEduin

This seems similar to #28 and #45, but the problem hasn't been properly addressed.

HernandezEduin avatar Jan 22 '25 06:01 HernandezEduin

Hello, I have also encountered the same problem. Should we understand the maximum depth as a scale factor for converting relative distance to absolute distance? If I know the actual maximum depth in the image and pass it to the model, will I obtain accurate absolute depth?

zhangjy328 avatar Mar 27 '25 06:03 zhangjy328

From my understanding, the model returns the estimated absolute depth. You can try rescaling it accordingly, but it won't perform that well. I'd recommend using the NYU-KITTI (NK) weights instead. They perform better and more consistently, but still not well enough for outdoor scenes (you'll notice this if you do a textured point cloud reconstruction of the scene).
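For reference, the rescaling idea can be sketched as a single global scale factor that maps the largest predicted depth to a known maximum. This assumes the prediction is off by one constant factor, which nothing in this thread confirms, so treat it as a rough workaround and not a calibration. `rescale_to_known_max` is a hypothetical helper and the values are made up.

```python
def rescale_to_known_max(depth_rows, known_max_m):
    """Linearly rescale a depth map so its largest value equals known_max_m (meters)."""
    predicted_max = max(v for row in depth_rows for v in row)
    if predicted_max <= 0:
        raise ValueError("predicted depths must contain a positive value")
    scale = known_max_m / predicted_max
    return [[v * scale for v in row] for row in depth_rows]


# Toy prediction capped near 5 m, rescaled to a scene known to reach 300 m.
toy_prediction = [[1.0, 2.5], [4.0, 5.0]]
rescaled = rescale_to_known_max(toy_prediction, known_max_m=300.0)
print(rescaled)
```

Because the scale is global, any nonlinear compression of far distances in the prediction is preserved, which is one reason this is unlikely to be accurate.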

HernandezEduin avatar Mar 27 '25 08:03 HernandezEduin

Additionally, you can refer to the response on the corresponding Hugging Face page.

HernandezEduin avatar Mar 27 '25 09:03 HernandezEduin

I am facing the same issue with ZoeD_K. Unfortunately, there does not seem to be a proper solution to this problem yet.

What did not work: The solution mentioned in #28 does not work for me either. Using mode="eval" (see code below) does not bring the depth estimates into the expected 0-80m range; they stay in the erroneous 0-10m range.

from zoedepth.models.builder import build_model
from zoedepth.utils.config import get_config

# ZoeD_K
conf = get_config("zoedepth", "eval", config_version="kitti")
model_zoe_k = build_model(conf)

I think the error is due to the dataset being "nyu" in the config. For some reason, even when loading the config for ZoeD_K, the dataset is still set to "nyu".

What worked: I changed the dataset to "kitti" using the change_dataset function in ZoeDepth/zoedepth/utils/config.py. This helped me get the depth estimates in the expected 0-80m range.

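from zoedepth.models.builder import build_model
from zoedepth.utils.config import change_dataset, get_config

# ZoeD_K
conf = get_config("zoedepth", "eval", config_version="kitti")
conf = change_dataset(conf, new_dataset="kitti")  # override the "nyu" default
model_zoe_k = build_model(conf)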
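Before building the model, it may help to check that the config really carries the KITTI settings. The sketch below assumes the config behaves like a mapping with "dataset" and "max_depth" keys; that shape is an assumption on my part, and `check_depth_config` is a hypothetical helper, not part of ZoeDepth. Plain dicts stand in for the real config object.

```python
def check_depth_config(conf, expected_dataset, expected_max_depth):
    """Return a list of mismatches between a config mapping and the expected settings."""
    problems = []
    if conf.get("dataset") != expected_dataset:
        problems.append(
            f"dataset is {conf.get('dataset')!r}, expected {expected_dataset!r}"
        )
    if conf.get("max_depth") != expected_max_depth:
        problems.append(
            f"max_depth is {conf.get('max_depth')!r}, expected {expected_max_depth!r}"
        )
    return problems


# Hypothetical configs: one as reported in this thread, one after the fix.
before_fix = {"dataset": "nyu", "max_depth": 10.0}
after_fix = {"dataset": "kitti", "max_depth": 80.0}

print(check_depth_config(before_fix, "kitti", 80.0))
print(check_depth_config(after_fix, "kitti", 80.0))
```

An empty list means both fields match; anything else points at the setting that still carries the indoor defaults.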

namrata-jangid avatar Apr 26 '25 21:04 namrata-jangid