SegAnyGAussians

Setting scale when evaluating the segmentation performance

Open MinjiK11 opened this issue 7 months ago • 3 comments

Hi, thank you for your interesting work! I have a question about setting the scale when evaluating 3D segmentation.

In the evaluation code, it seems the segmented object is searched for only at a single given scale (the upper bound of the mask scales).

(in provided eval_3dovs.ipynb notebook)

scale = upper_bound_scale                  # single query scale: upper bound of the mask scales
scale = torch.full((1,), scale).cuda()
scale = q_trans(scale)
gates = scale_gate(scale)                  # scale-conditioned feature gates

Since it searches at a single scale, it fails to segment small objects like the egg in the bowl in the ramen scene of the LeRF-OVS dataset. Should I modify the code so that it searches through multiple scales and selects the one with the best segmentation result?
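For concreteness, the modification I have in mind looks roughly like the sketch below. `segment_at_scale` and `score_mask` are hypothetical stand-ins for the notebook's scale-gated segmentation and some mask-quality metric; they are not functions from the SAGA codebase.

```python
# Sketch of a multi-scale search: sweep candidate scales, score the
# segmentation produced at each scale, and keep the best-scoring one.

def segment_at_scale(scale):
    # Hypothetical stand-in: pretend the small object only separates
    # out cleanly near scale 0.2.
    return {"scale": scale, "error": abs(scale - 0.2)}

def score_mask(mask):
    # Hypothetical quality metric (higher is better). In practice this
    # must not use ground-truth annotations, or it leaks test data.
    return -mask["error"]

def best_scale_segmentation(candidate_scales):
    scored = [(score_mask(m), s, m)
              for s in candidate_scales
              for m in [segment_at_scale(s)]]
    _, best_scale, best_mask = max(scored, key=lambda r: r[0])
    return best_scale, best_mask

candidate_scales = [0.1 * i for i in range(1, 11)]
best_scale, best_mask = best_scale_segmentation(candidate_scales)
```

The open question, of course, is what `score_mask` should be when no ground truth is available.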

MinjiK11 avatar May 21 '25 13:05 MinjiK11

This is a challenging issue. As discussed in the appendix of our paper, SAGA does encounter multi-scale ambiguity when applied to open-vocabulary segmentation in complex scenes such as Lerf-OVS ramen. It’s unclear whether searching across multiple scales would effectively address this problem. Even worse, we currently lack a reliable metric to determine the optimal scale for a given text prompt. Using ground truth annotations to guide this selection would introduce data leakage.
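To make the difficulty concrete, one could imagine a ground-truth-free heuristic such as picking the scale whose mask maximizes the margin between the mean query similarity inside and outside the mask. The sketch below is purely illustrative (none of these names come from SAGA), and it is exactly the kind of proxy metric whose reliability is unclear:

```python
import numpy as np

def confidence_margin(similarity, mask):
    """Mean query similarity inside the mask minus outside it.

    similarity: per-Gaussian similarity to the text prompt, shape (N,)
    mask: boolean segmentation at one scale, shape (N,)
    """
    inside = similarity[mask].mean() if mask.any() else -np.inf
    outside = similarity[~mask].mean() if (~mask).any() else 0.0
    return inside - outside

def pick_scale(similarity, masks_by_scale):
    # masks_by_scale: {scale: boolean mask at that scale}
    return max(masks_by_scale,
               key=lambda s: confidence_margin(similarity, masks_by_scale[s]))

# Toy example: at scale 0.5 the mask isolates the high-similarity
# Gaussians; at scale 1.0 everything is merged into one mask.
sim = np.array([0.9, 0.8, 0.1, 0.2])
masks = {0.5: np.array([True, True, False, False]),
         1.0: np.array([True, True, True, True])}
chosen = pick_scale(sim, masks)
```

Such a margin avoids data leakage, but whether it actually tracks segmentation quality across scales is an empirical question.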

Jumpat avatar May 29 '25 06:05 Jumpat

@Jumpat I also found that, for the 360_v2 garden scene, not all images get their mask scales extracted into .pt files in the mask_scale folder. The failures are dropped silently at run time, leaving only a few .pt files, so I cannot proceed to the next step.

Please update the code to fix this issue.

Also, supporting torch versions > 2.3.1 would make the environment much easier to build. The currently pinned version is quite old, and there are issues with torch.eig() (replaced by torch.linalg.eig in newer releases) in your PCA function.
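Since the covariance matrix in a PCA is symmetric, the natural migration target is torch.linalg.eigh rather than the general torch.linalg.eig. A NumPy sketch of what the migrated PCA step could look like (the function name and shapes are illustrative, not taken from the repository; np.linalg.eigh, like torch.linalg.eigh, returns eigenvalues in ascending order):

```python
import numpy as np

def pca_project(features, k):
    """Project (N, D) feature vectors onto their top-k principal axes.

    In modern PyTorch the removed torch.eig call would become
    torch.linalg.eigh(cov), since the covariance is symmetric.
    """
    centered = features - features.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / (len(features) - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)   # ascending eigenvalues
    top = eigvecs[:, ::-1][:, :k]            # columns of the k largest
    return centered @ top

X = np.random.default_rng(0).normal(size=(100, 8))
Y = pca_project(X, 3)
```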

jeffhwang02 avatar Jun 08 '25 00:06 jeffhwang02

@Jumpat Thank you for your kind reply! I would like to ask another question about the performance of SAGA on different datasets. Do you think SAGA can work well on large-scale scene datasets like ScanNet (a dataset of indoor scenes)? Personally, it seems hard for SAGA to learn the affinity feature of each Gaussian consistently in large scenes.

MinjiK11 avatar Jun 12 '25 03:06 MinjiK11