Marigold
Predicted Depth to Colorful Point Cloud
Hi,
Nice work! I always like to compare predicted depth via the colored, unprojected point cloud. I compared Marigold, ZoeDepth, and OmniDatav2 on the following image.
Marigold:
OmniDatav2:
ZoeN:
It's interesting that the previous methods tend to predict flatter geometry, while Marigold preserves it better :-)
However, I think my unprojection function is not valid for some scenarios, such as indoor rooms.
As you can see, the point cloud doesn't have a flat floor and ceiling.
I believe the reason is the normalization scale of the predicted depth range. Could you also provide the unprojection function that you used to generate the point clouds in the paper?
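For context, what I am doing is roughly the plain pinhole unprojection sketched below, with the normalized prediction used directly as depth (`fx`, `fy`, `cx`, `cy` are assumed intrinsics and `prediction` is the normalized Marigold output):

```python
import numpy as np

def unproject(depth, fx, fy, cx, cy):
    """Pinhole unprojection of a depth map to an N x 3 point cloud."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)

# Using the normalized [0, 1] prediction as-is ignores scale and shift,
# which is presumably what bends the floor and ceiling.
points = unproject(prediction, fx, fy, cx, cy)
```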
I am also facing the same issue; the point cloud looks distorted on indoor images.
@YuxuanSnow what expression did you use? It's not perfect, but not so bad (far better than mine, for sure).
Hi,
Thank you for your interest in Marigold and the generation of 3D point clouds from it.
Please note that Marigold predicts affine-invariant depth, not metric depth. This means that even if the camera intrinsics are known, the unprojection is still undetermined up to a scale and a shift. In the example of @YuxuanSnow, the scale and shift are not chosen appropriately.
We briefly described our workflow in Appendix Section 4 of the paper. In practice, it means that you have to iteratively try configurations of scale and shift and pick the visually best one.
Hope that helps. Feel free to ask if you have questions.
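As a rough sketch of this trial-and-error (not the exact script used for the paper; the intrinsics and the candidate grids below are placeholders to tune by eye):

```python
import numpy as np

def unproject(metric_depth, fx, fy, cx, cy):
    """Pinhole unprojection of a metric depth map to an N x 3 point cloud."""
    h, w = metric_depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    x = (u - cx) / fx * metric_depth
    y = (v - cy) / fy * metric_depth
    return np.stack([x, y, metric_depth], axis=-1).reshape(-1, 3)

# prediction: affine-invariant Marigold output in [0, 1].
# The candidate values are arbitrary starting points -- widen or refine
# the grids until, e.g., the floor and ceiling come out flat.
for scale in (2.0, 4.0, 8.0, 16.0):
    for shift in (0.25, 0.5, 1.0, 2.0):
        points = unproject(scale * prediction + shift, fx, fy, cx, cy)
        # visualize `points` (e.g., with open3d) and keep the visually best
```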
Edit: grammar
Hi
Thank you very much for your reply. Unfortunately, I am more of a 3D modeler than a programmer or researcher, so even going by trial and error is extremely complicated for me. Would it be possible to have a basic equation where it would be easy to intervene in the scale and shift factors? A simple a/(depth+b) doesn't seem to work.
I understand that this goes beyond the purpose of the research, but it would be great to have a built-in tool in the future for playing with the parameters while viewing the point cloud, and then exporting a linearized depth map.
Thanks in advance
Hello,
Marigold predicts affine-invariant (linear) depth, which can be written as $d' = s \cdot d + t$, where $d'$ is the true metric depth you would like to recover from the prediction $d$, $s$ is the scale, and $t$ is the shift. Note that our prediction is affine-invariant directly in linear depth space, rather than in inverse depth (so-called disparity) space like MiDaS.
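If a few metric measurements happen to be available (from a sensor, a known object size, etc.), $s$ and $t$ can also be fitted in closed form rather than by eye. A minimal sketch, where `d_samples` and `gt_samples` are hypothetical sparse correspondences:

```python
import numpy as np

# d_samples: predicted depths at a few pixels; gt_samples: metric depths
# at the same pixels (both hypothetical). Solve the least-squares problem
#   min_{s, t} || s * d + t - gt ||^2   in closed form.
A = np.stack([d_samples, np.ones_like(d_samples)], axis=1)
(s, t), *_ = np.linalg.lstsq(A, gt_samples, rcond=None)
metric_depth = s * prediction + t  # prediction: full affine-invariant map
```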
Best.
Thank you very much, I was able to get it working in my 3D editor. The amount of information you can retrieve is impressive. Even on a frontal human face, it can recover a profile that is not perfect, but functional for quick retouching. Really amazing.
@coccofresco Thank you! If data and time permit, could you please showcase some of the results on Twitter and refer to our original announcement post? https://twitter.com/AntonObukhov1/status/1732946419663667464?t=8iIVVDbbrhwGHQgHNOoj2A&s=19
affine invariant depth
Thanks for the great work.
I was wondering: if I have some prior knowledge of the scale and location of the object I want to predict (for both the object and the camera, e.g., a rough point cloud, an estimated 3D bounding box, etc.), would it be possible to adjust the Marigold prediction accordingly?
Also, I wonder if the training script will be released in the future for fine-tuning the model.
@YuxuanSnow Could you please share the code snippet that you used for point cloud generation and visualization? Thanks in advance.
Hello, can you share how to generate the point clouds from the depth maps? And what tool are you using to visualize the 3D point clouds?