Grounded_3D-LLM
Explanation for data format and issues about data generation
Thanks for your interesting work. I visualized the grounded scene caption data and noticed a key called 'all_phrases_positions'. What does it mean? My guess is that the numerical values are token indices obtained after tokenizing the text prompt, and that the text embeddings at those indices are replaced with the object tokens. Another question: how do you define the range of the placeholder when a noun comes with modifiers, as in ' table' or 'a chair with four legs'?
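To make my guess concrete, here is a minimal sketch of what I imagine 'all_phrases_positions' could store. This is only my assumption, not the repository's actual code: the function name `phrase_token_positions` is hypothetical, and I use a plain whitespace split in place of the real tokenizer, so the actual indices would differ with a subword tokenizer.

```python
def phrase_token_positions(caption, phrases):
    """Map each phrase to the indices of the whitespace tokens it covers.

    Assumption: tokens are produced by str.split(); a real subword
    tokenizer would yield different (usually longer) index ranges.
    """
    tokens = caption.split()
    positions = []
    for phrase in phrases:
        phrase_tokens = phrase.split()
        n = len(phrase_tokens)
        match = None
        # Slide a window over the caption and take the first exact match.
        for start in range(len(tokens) - n + 1):
            if tokens[start:start + n] == phrase_tokens:
                match = list(range(start, start + n))
                break
        positions.append(match)  # None if the phrase is not found
    return tokens, positions


caption = "a chair with four legs stands next to a wooden table"
phrases = ["a chair with four legs", "a wooden table"]
tokens, positions = phrase_token_positions(caption, phrases)
print(positions)
```

Under this reading, the range for "a chair with four legs" would cover all five tokens, modifiers included, which is exactly the part I am unsure about: does the placeholder span the whole phrase or only the head noun?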