
Explanation for data format and issues about data generation

Open Germany321 opened this issue 6 months ago • 2 comments

Thanks for your interesting work. I visualized the grounded scene caption data and noticed a key called 'all_phrases_positions'. What does it mean? My guess is that the numerical values are token indices obtained after tokenizing the text prompt, and that you replace the text embeddings at those indices with the object tokens. My other question: how do you define the range of the placeholder when the phrase includes modifiers, such as ' table' or 'a chair <with four legs'?
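To make my guess concrete, here is a minimal sketch of what I imagine happens. Everything in it is an assumption for illustration only and is not taken from the repository: whitespace tokenization, the helper names `phrase_token_span` and `replace_span_with_object_token`, the `<obj_0>` placeholder, and the choice to collapse the whole phrase span into a single object token.

```python
# Hypothetical illustration of the guess above; not the repository's code.

def phrase_token_span(tokens, phrase_tokens):
    """Return (start, end) token indices, end exclusive, of the first
    occurrence of phrase_tokens inside tokens, or None if absent."""
    n = len(phrase_tokens)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == phrase_tokens:
            return (i, i + n)
    return None


def replace_span_with_object_token(embeddings, span, object_embedding):
    """Collapse the embeddings covered by span into one object embedding."""
    start, end = span
    return embeddings[:start] + [object_embedding] + embeddings[end:]


caption = "a chair with four legs is next to the table"
tokens = caption.split()  # assumed tokenizer: plain whitespace split

# Assumed meaning of a phrase position: the token range of the grounded phrase.
span = phrase_token_span(tokens, "a chair with four legs".split())

# Toy stand-ins for text embeddings, one per token.
text_embeddings = [f"emb({t})" for t in tokens]
mixed = replace_span_with_object_token(text_embeddings, span, "<obj_0>")

print(span)
print(mixed)
```

In this sketch the span covers the whole phrase including its modifiers, which is exactly the part I am unsure about: whether the range should cover only the head noun or the full description.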

Germany321 · Aug 20 '24 07:08