Open3DIS

Inconsistent number of frames between the provided validation ScanNet200 scene and the rendered scene

Open Yebulabula opened this issue 1 year ago • 1 comments

Dear author,

Sorry to bother you again. I would like to clarify the frame count in the validation ScanNet200 example, which contains 475 frames. Following the standard ScanNet rendering process at https://github.com/ScanNet/ScanNet/tree/master/SensReader/python, I generated RGB-D images from the .sens file but ended up with a significantly higher frame count. Could you explain the discrepancy?

Thank you for your assistance.

Best regards, Ye Mao

Yebulabula, May 07 '24 15:05

Hi @Yebulabula, we capture RGB-D frames at an interval of 5. That is, we record a result and append it to the self.frames list only once every 5 invocations of this class constructor: here
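The effect of this interval on the frame count can be sketched as follows. This is only an illustration, not the Open3DIS loader itself: the function name `sample_frames`, the choice to start at the first frame, and the 1000-frame scene are all assumptions.

```python
def sample_frames(frame_ids, interval=5):
    """Keep every `interval`-th frame, starting from the first one (assumed)."""
    return [fid for i, fid in enumerate(frame_ids) if i % interval == 0]


# Hypothetical scene with 1000 exported frames (an illustrative number only).
all_frames = list(range(1000))
kept = sample_frames(all_frames, interval=5)
print(len(kept))  # 200
```

Under the same assumption, a 475-frame example would correspond to roughly 5 × 475 frames in the full .sens export, which matches the "significantly higher" count seen when every frame is rendered.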

Best.

PhucNDA, May 07 '24 18:05

Thanks for your quick reply, it is really helpful. I have another question: when performing promptable segmentation, what is the most effective way to filter out unnecessary 3D proposals? Open3DIS usually generates many masks for a single text prompt, but in the end we only need one. I tried selecting the 3D proposal with the highest CLIP confidence as the final mask, but this confidence value is not always reliable. Do you have any advice? Thanks.

Yebulabula, May 08 '24 17:05

Additionally, I am unsure whether promptable segmentation really needs CLIP for the class label assignment step. Why not merge the multiple 2D masks into a single 3D proposal and use that as the final segmentation result?

Yebulabula, May 08 '24 18:05

Hi @Yebulabula,

A1: For post-processing, you can explore the NMS algorithms in ISBNet, the filtering techniques in OVIR-3D, DBSCAN in Segment3D, and many other heuristic algorithms.
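As a rough illustration of the NMS option, here is a minimal greedy mask-NMS sketch over 3D proposals. It is not the ISBNet implementation: the function name `mask_nms`, the boolean per-point mask layout, and the 0.5 IoU threshold are assumptions made for the example.

```python
import numpy as np


def mask_nms(masks, scores, iou_thresh=0.5):
    """Greedy NMS over boolean point masks of shape (num_proposals, num_points).

    Returns the indices of the kept proposals, highest score first.
    """
    masks = np.asarray(masks, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores)  # visit proposals from highest to lowest score
    keep = []
    for idx in order:
        duplicate = False
        for kept in keep:
            inter = np.logical_and(masks[idx], masks[kept]).sum()
            union = np.logical_or(masks[idx], masks[kept]).sum()
            # Suppress a proposal that overlaps too much with a better one.
            if union > 0 and inter / union > iou_thresh:
                duplicate = True
                break
        if not duplicate:
            keep.append(int(idx))
    return keep


# Toy scene with 6 points and 3 proposals; the first two overlap heavily.
masks = [
    [1, 1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
]
scores = [0.9, 0.8, 0.7]
print(mask_nms(masks, scores))  # [0, 2]
```

NMS only removes near-duplicate masks; if a single mask per prompt is required, the remaining proposals still have to be ranked by some score.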

A2: Certainly, you can use the lifted 2D masks from 2Dsegmenter as the final result for promptable segmentation. However, 2D masks derived from 2Dsegmenter are typically noisy (as can be seen by enabling this script), which can lead to unreliable results. For instance, when querying "Hoverboard" in a 3D scene, the 2Dsegmenter may incorrectly segment a chair in certain views, which produces an inaccurate final result (shown in green). In contrast, our method incrementally refines CLIP features to filter out false predictions, as shown in red. This CLIP-feature-based approach was developed by our VinAI-3DIS team (OpenSUN3D).
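The general idea of filtering proposals by their CLIP features can be sketched as below. This is one possible reading, not the Open3DIS implementation: the running mean over per-view features, the cosine-similarity threshold of 0.5, the helper names, and the toy 2-D vectors standing in for real CLIP embeddings are all assumptions.

```python
import numpy as np


def normalize(v):
    """L2-normalize along the last axis."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)


def aggregate_view_features(view_feats):
    """Running mean of normalized per-view features, shape (num_views, dim)."""
    mean = np.zeros(view_feats.shape[1])
    for n, feat in enumerate(normalize(view_feats), start=1):
        mean += (feat - mean) / n  # incremental update of the mean
    return normalize(mean)


def filter_proposals(proposal_view_feats, text_feat, sim_thresh=0.5):
    """Keep proposals whose aggregated feature is close to the text feature."""
    text_feat = normalize(np.asarray(text_feat, dtype=float))
    kept = []
    for i, feats in enumerate(proposal_view_feats):
        agg = aggregate_view_features(np.asarray(feats, dtype=float))
        if float(agg @ text_feat) >= sim_thresh:
            kept.append(i)
    return kept


# Toy stand-ins for CLIP embeddings: the text query points along the x axis.
text = [1.0, 0.0]
proposals = [
    [[1.0, 0.0], [0.9, 0.1], [1.0, 0.2]],  # views agree with the query
    [[0.0, 1.0], [0.1, 1.0], [0.0, 1.0]],  # e.g. a wrongly segmented object
]
print(filter_proposals(proposals, text))  # [0]
```

In this toy setup the second proposal is dropped because its aggregated feature is nearly orthogonal to the text feature, whereas simply lifting its 2D masks would have kept it.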

[Three images attached]

If you have any questions, feel free to let me know.

Best.

PhucNDA, May 08 '24 18:05

If you have any further questions, feel free to re-open the issue.

PhucNDA, May 11 '24 16:05