IneedhelpRr

Results: 10 comments by IneedhelpRr

Problem solved.

Hello, I would like to ask how metric depth is implemented. I always fail when following the tutorial.

depth anything；"I would like to ask, in this run.py code, at this point, what does this depth represent? If it represents depth, why is the depth value larger for closer objects and smaller for objects further away?"

![colorbar_5](https://github.com/LiheYoung/Depth-Anything/assets/171676399/18baf04d-6d50-4c20-b78e-0ad145b923f0)

depth anything；"I would like to ask, in this run.py code, at this point, what does this depth represent? If it represents depth, why is the depth value larger for closer objects and smaller for objects further away?"

> Because that is how depth maps work. It is not reversed.
>
> https://youtu.be/1MgZOJD9uFE?si=Xr-MCziFdJPYi2bj
>
> When you import a depth map into Blender, Unreal Engine, or any other 3D software, white areas are always higher than black areas. So when you import a depth map like this, you get this result.

![image](https://private-user-images.githubusercontent.com/63294141/340073741-5330cc1c-cadf-4246-9114-1178eea85159.png) ![image](https://private-user-images.githubusercontent.com/63294141/340073752-739e1521-910f-4d03-9809-f05a3b0dc275.png) ![image](https://private-user-images.githubusercontent.com/63294141/340073769-5348dc66-4bb2-4ed4-a70d-99e073ec660e.png)

Is this correct? Why is the value large for near objects and small for far objects?

depth anything；"I would like to ask, in this run.py code, at this point, what does this depth represent? If it represents depth, why is the depth value larger for closer objects and smaller for objects further away?"

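> As mentioned before: if you consider values between 0 and 255 (the usual range in graphics software), pure black is always 0 and pure white is 255. The further back something is along the Z axis, the darker it is. The closer something is to the camera along the Z axis, the whiter it is.

Yes, I know how it works, but the values on my colorbar labels don't make sense.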

depth anything；"I would like to ask, in this run.py code, at this point, what does this depth represent? If it represents depth, why is the depth value larger for closer objects and smaller for objects further away?"

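> > Is this correct? Why is the value large for near objects and small for far objects?
>
> I really don't understand what the big deal is. If it matters that much, invert the colors, or whatever else you need to invert. Even if it is the "wrong way around", it is the same process. Just invert black to white and white to black, and it is fixed.

Yes, when I change the code that way, the near values become small and the far values become large. But I don't understand...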
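For anyone following along, here is a minimal sketch of the inversion suggested in the quoted reply. It assumes the raw output is a 2-D NumPy array with an arbitrary value range; the function names `to_uint8` and `invert` are made up for illustration and are not taken from run.py.

```python
import numpy as np


def to_uint8(depth: np.ndarray) -> np.ndarray:
    """Linearly rescale a map with an arbitrary range to 0..255 (uint8)."""
    depth = depth.astype(np.float64)
    lo, hi = depth.min(), depth.max()
    if hi == lo:
        # A constant map has no contrast; return all black.
        return np.zeros(depth.shape, dtype=np.uint8)
    scaled = (depth - lo) / (hi - lo) * 255.0
    return np.round(scaled).astype(np.uint8)


def invert(gray: np.ndarray) -> np.ndarray:
    """Swap black and white: 0 becomes 255 and 255 becomes 0."""
    return 255 - gray


if __name__ == "__main__":
    # Hypothetical raw values, only to show the effect of the two steps.
    raw = np.array([[0.0, 1.0], [4.0, 5.0]])
    gray = to_uint8(raw)
    print(gray)
    print(invert(gray))
```

Note that inverting only flips the ordering of the values. It does not turn the numbers into distances in meters, so the colorbar labels still only have a relative meaning.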

Has anyone told me what the model structure of the midas looks like?

> The structure of the MiDaS model is described in their [preprint paper](https://arxiv.org/abs/2103.13413), including a diagram on page 3 (Figure 1). I also have a description of the structure (DPT) on [github](https://github.com/heyoeyo/muggled_dpt/tree/main/lib#dpt-structure), along with [code](https://github.com/heyoeyo/muggled_dpt/blob/aa1be77a411c2cd625acfc68c6e04822f6960e34/lib/dpt_model.py#L60-L83).

Yes, I tried to read the paper. Is it based on a ResNet encoder, plus a series of loss functions...

Has anyone told me what the model structure of the midas looks like?

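> > The structure of the MiDaS model is described in their [preprint paper](https://arxiv.org/abs/2103.13413), including a diagram on page 3 (Figure 1). I also have a description of the structure (DPT) on [github](https://github.com/heyoeyo/muggled_dpt/tree/main/lib#dpt-structure), along with [code](https://github.com/heyoeyo/muggled_dpt/blob/aa1be77a411c2cd625acfc68c6e04822f6960e34/lib/dpt_model.py#L60-L83).
>
> Yes, I tried to read the paper. Is it based on a ResNet encoder, plus a series of loss functions for prediction? What I really want to know is its model structure: what are its convolutional layers? I have read your DPT structure description but, as you know, I don't know much about computer vision research, so I still have some difficulty understanding it. My goal is to understand the model structure of MiDaS and try to train a model related to my own field, so that I can do research on that basis. But this looks difficult.

I've also looked at Depth-Anything before, but that one is too difficult for me to understand,...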

Has anyone told me what the model structure of the midas looks like?

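> > Is it based on a ResNet encoder
>
> All of the newer MiDaS models (versions 3 and 3.1) switched to using a "vision transformer" instead of ResNet to encode the input image, although the rest of the DPT structure still uses convolutions. Vision transformers work very differently from convolutional models. If you're not familiar with them, I'd recommend the original paper that introduced them: ["An Image is Worth 16x16 Words"](https://arxiv.org/abs/2010.11929). There is also a very extensive guide to transformers (focused more on text than on images) called ["The Illustrated Transformer"](https://jalammar.github.io/illustrated-transformer/).
>
> The DPT model structure consists of 4 parts. The first part is the patch embedding and [vision transformer](https://github.com/heyoeyo/muggled_dpt/blob/main/lib/.readme_assets/image_encoder_model.svg), which generates a list of vectors (also called tokens) from the input image. The second part (called [reassembly](https://github.com/heyoeyo/muggled_dpt/blob/main/lib/.readme_assets/reassembly_model.svg)) takes the list of vectors and reshapes them back into image-like data (like a grid of pixels). The third part (called [fusion](https://github.com/heyoeyo/muggled_dpt/blob/main/lib/.readme_assets/fusion_model.svg)) combines the reassembled image data and performs convolutions on the result. The fourth part (called the [head](https://github.com/heyoeyo/muggled_dpt/blob/main/lib/.readme_assets/monodepth_head_model.svg)) just does more convolutions to generate the final depth output. Each part also includes scaling/resizing steps, but these are hard-coded into the model (they are not something the model needs to learn).
>
> The original MiDaS [preprint](https://arxiv.org/pdf/2103.13413) actually has a figure in the appendix (page 12) that shows, within the "fusion" part of the model (i.e. the...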
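To make the "reassembly" step in the quoted description more concrete, here is a rough NumPy sketch of only the reshaping idea: a list of tokens is turned back into an image-like grid. It assumes the tokens are stored in row-major patch order with no extra class/readout token, and the patch size (16), image size (384x384), and channel count (8) are illustrative values, not numbers taken from MiDaS or muggled_dpt. The real reassembly blocks also include learned projections and resizing, which are left out here.

```python
import numpy as np


def patch_grid(image_hw: tuple[int, int], patch_size: int) -> tuple[int, int]:
    """Number of patches along height and width for a given patch size."""
    h, w = image_hw
    if h % patch_size or w % patch_size:
        raise ValueError("image size must be divisible by the patch size")
    return h // patch_size, w // patch_size


def reassemble(tokens: np.ndarray, grid_hw: tuple[int, int]) -> np.ndarray:
    """Reshape (num_tokens, channels) into (channels, grid_h, grid_w)."""
    grid_h, grid_w = grid_hw
    num_tokens, channels = tokens.shape
    if num_tokens != grid_h * grid_w:
        raise ValueError("token count does not match the patch grid")
    # Row-major order: token i sits at row i // grid_w, column i % grid_w.
    return tokens.reshape(grid_h, grid_w, channels).transpose(2, 0, 1)


if __name__ == "__main__":
    grid = patch_grid((384, 384), patch_size=16)   # hypothetical sizes
    tokens = np.zeros((grid[0] * grid[1], 8))      # hypothetical 8 channels
    print(grid, reassemble(tokens, grid).shape)
```

After this step the data has a channels-by-height-by-width layout, which is what lets the later fusion and head parts apply ordinary convolutions.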

Has anyone told me what the model structure of the midas looks like?

> > How do I train my own model, based on that
>
> In theory, any 'typical' training loop should work on these DPT models. However, doing a good...

Has anyone told me what the model structure of the midas looks like?

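> > How do I train my own model, based on that
>
> In theory, any "typical" training loop should work on these DPT models. However, doing a good job of training is generally a hard thing to do, and there are entire research papers dedicated to it. At the moment, it is basically a PhD thesis topic. The [Depth-Anything paper](https://arxiv.org/abs/2401.10891) is an example of this: it focuses almost entirely on how to train these models better, rather than on the model structure. So it can be hard to understand!
>
> Surprisingly, there is very little example code available for training these types of models (at least, I haven't found much). The only ones I know of are the original [ZoeDepth](https://github.com/isl-org/ZoeDepth?tab=readme-ov-file#training) model and the related Depth-Anything [v1 metric depth](https://github.com/LiheYoung/Depth-Anything/tree/main/metric_depth#training) and [v2 metric depth](https://github.com/DepthAnything/Depth-Anything-V2/tree/main/metric_depth#reproduce-training) models. So I'd recommend starting with that code to see how to approach training these models, and reading the original [MiDaS paper](https://arxiv.org/abs/1907.01341v3) (starting on page 5), which describes the training process, as well as the first [Depth-Anything paper](https://arxiv.org/abs/2401.10891) (starting on page 3), which describes a similar process.
>
> Alternatively, the [Marigold](https://github.com/prs-eth/Marigold?tab=readme-ov-file#%EF%B8%8F-training) repository (very accurate, but slower than the DPT models) has released training code that you may also want to check out (if you don't specifically need...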