depth anything;"I would like to ask, in this run.py code, at this point, what does this depth represent? If it represents depth, why is the depth value larger for closer objects and smaller for objects further away?"
```python
import argparse
import cv2
import numpy as np
import os
import torch
import torch.nn.functional as F
from torchvision.transforms import Compose
from tqdm import tqdm

from depth_anything.dpt import DepthAnything
from depth_anything.util.transform import Resize, NormalizeImage, PrepareForNet

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--img-path', type=str)
    parser.add_argument('--outdir', type=str, default='./vis_depth')
    parser.add_argument('--encoder', type=str, default='vitl', choices=['vits', 'vitb', 'vitl'])
    parser.add_argument('--pred-only', dest='pred_only', action='store_true', help='only display the prediction')
    parser.add_argument('--grayscale', dest='grayscale', action='store_true', help='do not apply colorful palette')
    args = parser.parse_args()

    margin_width = 50
    caption_height = 60

    font = cv2.FONT_HERSHEY_SIMPLEX
    font_scale = 1
    font_thickness = 2

    DEVICE = 'cuda' if torch.cuda.is_available() else 'cpu'

    depth_anything = DepthAnything.from_pretrained('LiheYoung/depth_anything_{}14'.format(args.encoder)).to(DEVICE).eval()

    total_params = sum(param.numel() for param in depth_anything.parameters())
    print('Total parameters: {:.2f}M'.format(total_params / 1e6))

    transform = Compose([
        Resize(
            width=518,
            height=518,
            resize_target=False,
            keep_aspect_ratio=True,
            ensure_multiple_of=14,
            resize_method='lower_bound',
            image_interpolation_method=cv2.INTER_CUBIC,
        ),
        NormalizeImage(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
        PrepareForNet(),
    ])

    # Define your own input path
    your_file_path = 'E:\\Depth-Anything-main\\assets\\examples'
    if os.path.isfile(your_file_path):  # use your own path instead of args.img_path
        if your_file_path.endswith('txt'):
            with open(your_file_path, 'r') as f:
                filenames = f.read().splitlines()
        else:
            filenames = [your_file_path]
    else:
        filenames = os.listdir(your_file_path)
        filenames = [os.path.join(your_file_path, filename) for filename in filenames if not filename.startswith('.')]
        filenames.sort()

    # Define the output directory and use it as args.outdir
    output_dir = 'E:\\Depth-Anything-main\\out'
    os.makedirs(output_dir, exist_ok=True)
    args.outdir = output_dir

    for filename in tqdm(filenames):
        raw_image = cv2.imread(filename)
        image = cv2.cvtColor(raw_image, cv2.COLOR_BGR2RGB) / 255.0

        h, w = image.shape[:2]

        image = transform({'image': image})['image']
        image = torch.from_numpy(image).unsqueeze(0).to(DEVICE)

        with torch.no_grad():
            depth = depth_anything(image)
```
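For context on what `depth` holds at this point: the loop then upsamples the prediction back to the input resolution and min-max normalizes it to 0–255 for display (this is the "Original code" line quoted later in this thread). A sketch of that continuation, which may differ slightly from your copy of run.py:

```python
        # Continuation of the loop body (sketch; check your copy of run.py):
        # upsample the raw prediction back to the original image size
        depth = F.interpolate(depth[None], (h, w), mode='bilinear', align_corners=False)[0, 0]

        # min-max normalize to [0, 255] for visualization; this keeps the
        # model's convention: larger (whiter) values = closer to the camera
        depth = (depth - depth.min()) / (depth.max() - depth.min()) * 255.0
        depth = depth.cpu().numpy().astype(np.uint8)
```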
Because that is how depth mappings work. It is not reversed.
https://youtu.be/1MgZOJD9uFE?si=Xr-MCziFdJPYi2bj
When you import a depth map into Blender, Unreal Engine, or any 3D software, the white areas are always higher than the black areas. Thus when you import such a depth map, you get this result.
Is this correct? Why is the value large when the object is near and small when it is far?
As previously said: if you consider values between 0 and 255 (common in graphics software), pure black is always 0 and pure white is 255. The further back something is on the Z-axis, the darker the color; the closer something is to the camera, the nearer it is on the Z-axis, the whiter the color.
I really don't understand what the big deal is; if it matters that much, or you need it reversed, just invert the colors. It's the same process even if it's the "wrong way". Just invert black to white and white to black and it's fixed.
Yes, I know how it works, but the value of my label strip doesn't make sense
Yes, when I change the code, it becomes the opposite: the near value is small and the far value is large. But I don't understand why the original code is written this way.

Original code: `depth = (depth - depth.min()) / (depth.max() - depth.min()) * 255.0`
Changed code: `depth = (depth.max() - depth) * 255.0 / (depth.max() - depth.min())`
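A minimal numeric sketch (values made up purely for illustration) showing what the two lines do, and that the changed code is simply the complement of the original normalization:

```python
import numpy as np

# Made-up raw model outputs for three pixels: far, middle, near.
# Higher raw value = closer (the model's inverse-depth convention).
depth = np.array([2.0, 5.0, 10.0])

# Original run.py normalization: near stays large (white)
orig = (depth - depth.min()) / (depth.max() - depth.min()) * 255.0
print(orig)     # [  0.    95.625 255.  ]  -> nearest pixel = 255

# Changed code: near becomes small (black)
flipped = (depth.max() - depth) * 255.0 / (depth.max() - depth.min())
print(flipped)  # [255.   159.375   0.  ]  -> nearest pixel = 0

# The two are complements of each other:
print(np.allclose(flipped, 255.0 - orig))  # True
```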
The output is proportional to the multiplicative inverse of the true depth, something like:
`true_depth = 1 / (A + B * normalized_depthanything_output)`
Where A and B are some unknown shift/scale terms (varies by scene), and normalized_depthanything_output would be the output from the model normalized to be between 0 and 1. This is how the older MiDaS model was set up and was done to make it possible to train on a larger variety of data (there's more of an explanation in the MiDaS paper under section 5).
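If you need actual metric depth, the usual approach (per the MiDaS paper's scale-and-shift alignment) is to solve for A and B by least squares against a few pixels with known depth. A minimal sketch with hypothetical inputs: `pred_norm` is the model output scaled to [0, 1], and `gt_depth` holds invented metric depths constructed so the relation holds exactly:

```python
import numpy as np

def fit_shift_scale(pred_norm, gt_depth):
    """Least-squares fit of A, B in:  1 / gt_depth ≈ A + B * pred_norm."""
    gt_inv = 1.0 / gt_depth  # align in inverse-depth space
    X = np.stack([np.ones_like(pred_norm), pred_norm], axis=1)
    (A, B), *_ = np.linalg.lstsq(X, gt_inv, rcond=None)
    return A, B

# Invented example: four pixels where the true depth happens to be known,
# chosen so that 1/gt_depth == pred_norm (i.e. A = 0, B = 1).
pred_norm = np.array([0.9, 0.6, 0.3, 0.1])          # large = near
gt_depth = np.array([1/0.9, 1/0.6, 1/0.3, 1/0.1])   # meters; small = near
A, B = fit_shift_scale(pred_norm, gt_depth)

# Recover metric depth from the relation above:
metric_depth = 1.0 / (A + B * pred_norm)
print(metric_depth)  # ≈ gt_depth
```

With only a shift and a scale free, a handful of reliable points is enough to constrain the fit; in practice you would fit over many pixels of a real depth sensor or sparse SfM points.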
I think that's a very important question. If I want to use the model's output to represent depth annotations, I should know the physical meaning of the output values. But if the output represented depth, a close object should be black; instead, it's reversed, which confuses me. The ground-truth depth represents the distance to an object, so the model's output should mean the same thing, but in fact it does not. Do you have an answer now?
When we build a model, starting from z = 0, the closer something is to the camera, the larger its z value should be. I think that's what the depth value means, right?
