KaKa-101

Results: 2 issues by KaKa-101

Could you tell me how many tokens you use to represent the whole 3D scene before sending them to the LLM? Thanks a lot.

Thanks for your great work, which is significant for this area. Could you please provide the visualization tool for Figure 3? I am really interested in the visual results...