Querying the most relevant 3D Gaussians

Open xxlbigbrother opened this issue 1 year ago • 4 comments

I couldn't find any code for inputting text and querying the most relevant 3D Gaussians in the code repository. Will it be provided later?

xxlbigbrother avatar Mar 21 '24 09:03 xxlbigbrother

Thanks for your attention. The eval code has been released.

minghanqin avatar Mar 27 '24 09:03 minghanqin

Thank you for the quick code update. However, I found that the ground truth of lerf_ovs is in 2D, so how can we achieve 3D object localization? I still don't know how to query the original 3D Gaussian points. Thank you for your help.

xxlbigbrother avatar Apr 01 '24 02:04 xxlbigbrother

Thank you for your attention to our work. There are two ways to achieve 3D text querying. The first, as you mentioned, directly computes the similarity between the 3D Gaussian points and the text query. The second first renders the 3D language Gaussians onto a 2D image plane using Gaussian Splatting, then computes the similarity between the text query and the language features at each 2D image pixel. Previous SOTA works such as LERF adopted the second method because NeRF's implicit modeling prevented the use of the first. To ensure a fair comparison, we also used the second method. However, our approach can indeed be tested with the first method, and we will explore it in the future to see whether it yields better performance. I hope this explanation addresses your questions.
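
As a rough illustration of the difference between the two approaches (this is a sketch, not code from the LangSplat repository), the snippet below scores features against a text embedding with cosine similarity. It assumes the per-Gaussian features and the rendered per-pixel features have already been brought into the same embedding space as the text query; the function names `query_gaussians_3d` and `query_pixels_2d`, the array shapes, and the random demo data are all hypothetical.

```python
import numpy as np


def normalize(x, eps=1e-8):
    """L2-normalize along the last axis."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def query_gaussians_3d(gaussian_feats, text_emb, top_k=5):
    """Approach 1 (hypothetical helper): score every Gaussian directly.

    gaussian_feats: (N, D) array, one language feature per 3D Gaussian.
    text_emb:       (D,) array, embedding of the text query.
    Returns the indices of the top_k most similar Gaussians and their scores.
    """
    sims = normalize(gaussian_feats) @ normalize(text_emb)  # (N,)
    top_k = min(top_k, sims.shape[0])
    idx = np.argsort(-sims)[:top_k]  # descending similarity
    return idx, sims[idx]


def query_pixels_2d(feature_map, text_emb):
    """Approach 2 (hypothetical helper): score a rendered feature map.

    feature_map: (H, W, D) array of language features rendered to an image.
    text_emb:    (D,) array, embedding of the text query.
    Returns an (H, W) relevancy map of cosine similarities.
    """
    return normalize(feature_map) @ normalize(text_emb)  # (H, W)


# Demo on random data (stand-ins for real features and a real text embedding).
rng = np.random.default_rng(0)
text_emb = rng.normal(size=16)

gaussian_feats = rng.normal(size=(100, 16))
gaussian_feats[7] = 3.0 * text_emb  # plant one Gaussian aligned with the query
idx, scores = query_gaussians_3d(gaussian_feats, text_emb, top_k=3)
print("most relevant Gaussian indices:", idx)

feature_map = rng.normal(size=(4, 6, 16))
feature_map[2, 3] = 2.0 * text_emb  # plant one pixel aligned with the query
relevancy = query_pixels_2d(feature_map, text_emb)
print("most relevant pixel:", np.unravel_index(np.argmax(relevancy), relevancy.shape))
```

With the first function the result is a set of Gaussian indices, which can be mapped back to 3D positions; with the second the result is only a per-view 2D map, which is why it matches the 2D ground truth of lerf_ovs.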

Li-Wanhua avatar Apr 01 '24 02:04 Li-Wanhua

Thanks for your kind help!

xxlbigbrother avatar Jun 19 '24 03:06 xxlbigbrother