Using Groma to generate region descriptions
Thanks for your great work! I would also like to learn more about using groma.eval.run_groma to generate region descriptions. As you mentioned in #20, the user prompt can follow this format:
Please briefly describe <roi><refer_box></roi> <refer_feat>.
Does <refer_box> represent the normalized bbox coordinates, such as [0.2752, 0.3756, 0.4713, 0.7727]? And what does <refer_feat> mean? Based on that guess (replacing the <refer_box> placeholder with the normalized coordinates and omitting <refer_feat>), I used the following prompt to generate a description of Trump:
'Please briefly describe <roi> [0.2752, 0.3756, 0.4713, 0.7727] </roi>.'
But each time the output contains an additional region, and the description is based on that region rather than on the box I specified:
tensor([[ 3.2769e-04, -7.0846e-04, 3.3295e-01, 9.9904e-01]]) <roi> <r54> </roi> meets with a man in a suit</s>
This happens only when generating region descriptions. When I use a localization instruction, the correct region is returned:
query: 'Locate <p> a person with a tie </p> in the image'
output: tensor([[0.2751, 0.3756, 0.4713, 0.7727]]) <roi> <r21> </roi> </s>
I am quite confused by this. Thank you very much for your help.
Thanks for your interest in our work. For referring queries, you do not need to replace <refer_box> or <refer_feat> with other text. They are special placeholders (as defined in groma/constants.py) that are handled automatically by the model. You should directly set box coordinates here.
I will later update run_groma.py to provide an interface to pass in user-specified boxes.
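In the meantime, here is a minimal sketch of how the inputs might be prepared, assuming boxes are normalized [x1, y1, x2, y2] values in [0, 1] as in the outputs quoted above. The helpers `normalize_box` and `build_refer_inputs` are hypothetical and not part of Groma. How the box is actually handed to the model depends on run_groma.py and is not shown here. The point is only that the prompt string stays unchanged and the box travels separately.

```python
from typing import List, Sequence, Tuple

# Prompt format quoted in this thread; the placeholders are left literal.
REFER_PROMPT = "Please briefly describe <roi><refer_box></roi> <refer_feat>."


def normalize_box(box_xyxy: Sequence[float], width: int, height: int) -> Tuple[float, float, float, float]:
    """Convert a pixel-space (x1, y1, x2, y2) box to coordinates in [0, 1].

    Hypothetical helper: assumes the model expects normalized xyxy boxes.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image width and height must be positive")
    x1, y1, x2, y2 = box_xyxy
    if x2 <= x1 or y2 <= y1:
        raise ValueError("box must satisfy x1 < x2 and y1 < y2")

    def clamp(value: float) -> float:
        # Keep coordinates inside the image bounds.
        return min(max(value, 0.0), 1.0)

    return (clamp(x1 / width), clamp(y1 / height),
            clamp(x2 / width), clamp(y2 / height))


def build_refer_inputs(box_xyxy: Sequence[float], width: int, height: int) -> Tuple[str, List[List[float]]]:
    """Return the unmodified prompt plus a one-row list of normalized boxes.

    Hypothetical helper: the box is NOT written into the prompt text.
    """
    return REFER_PROMPT, [list(normalize_box(box_xyxy, width, height))]


if __name__ == "__main__":
    prompt, boxes = build_refer_inputs((100, 50, 300, 250), width=400, height=500)
    print(prompt)
    print(boxes)
```

Writing the numbers into the prompt, as in 'Please briefly describe <roi> [0.2752, ...] </roi>.', removes the placeholders, which would explain why the description was based on a different region.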
Thanks for your answer. I am now able to generate more detailed region descriptions using Groma. Thank you again for your excellent work ^_^
Thanks for your great work! Has this interface been updated?