
Using Groma to generate region descriptions

Marloweeee opened this issue 1 year ago.

Thanks for your great work! I would like to learn more about using groma.eval.run_groma to generate region descriptions. As you mentioned in #20, for the user prompt you can follow this format:

Please briefly describe <roi><refer_box></roi> <refer_feat>.

Does <refer_box> represent the normalized bbox coordinates, e.g. [0.2752, 0.3756, 0.4713, 0.7727]? And what does <refer_feat> mean? Based on this guess (replacing only the <refer_box> placeholder with the normalized coordinates and omitting <refer_feat>), I used the following prompt to generate a referring expression for Trump:


'Please briefly describe <roi> [0.2752, 0.3756, 0.4713, 0.7727] </roi>.'

but every time, the model also outputs an extra region box, and the description is always based on that box rather than on the one I specified:

tensor([[ 3.2769e-04, -7.0846e-04, 3.3295e-01, 9.9904e-01]]) <roi> <r54> </roi> meets with a man in a suit</s>
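To check which region a response actually refers to, it can help to separate the region tokens from the caption text. The sketch below uses a hypothetical helper (parse_groma_output is not part of Groma) and assumes only the output format visible in this thread: region references written as <roi> <rN> </roi> and the answer terminated by </s>.

```python
import re

# A "<roi> ... </roi>" span containing one or more region tokens like "<r54>".
# Format inferred from the outputs in this issue; not taken from Groma's code.
ROI_PATTERN = re.compile(r"<roi>\s*((?:<r\d+>\s*)+)</roi>")
REGION_TOKEN = re.compile(r"<r(\d+)>")


def parse_groma_output(text: str) -> tuple[list[int], str]:
    """Split a decoded answer into region-token indices and plain caption text."""
    indices = []
    for span in ROI_PATTERN.findall(text):
        # Each captured span may hold several region tokens; collect their numbers.
        indices.extend(int(i) for i in REGION_TOKEN.findall(span))
    # Remove the ROI spans and the end-of-sequence marker, then normalize whitespace.
    plain = ROI_PATTERN.sub("", text).replace("</s>", "")
    plain = " ".join(plain.split())
    return indices, plain


if __name__ == "__main__":
    print(parse_groma_output("<roi> <r54> </roi> meets with a man in a suit</s>"))
```

For the response above this gives region index 54 and the caption "meets with a man in a suit", which suggests the caption is tied to a region the model selected itself rather than to the box written in the prompt.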

This only happens for region description. When I use a localization instruction instead, the correct region is returned:

query: 'Locate <p> a person with a tie </p> in the image'

output: tensor([[0.2751, 0.3756, 0.4713, 0.7727]]) <roi> <r21> </roi> </s>
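One quick way to quantify the mismatch is the intersection-over-union (IoU) between the box written into the prompt and each box the model returned. This sketch assumes the boxes are normalized (x1, y1, x2, y2), which is consistent with the values in this thread but not confirmed by Groma's documentation; iou_xyxy is an illustrative helper.

```python
def iou_xyxy(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Boxes copied from this issue (assumed normalized x1, y1, x2, y2).
requested = [0.2752, 0.3756, 0.4713, 0.7727]  # box written into the prompt
described = [3.2769e-04, -7.0846e-04, 3.3295e-01, 9.9904e-01]  # box returned with the description
grounded = [0.2751, 0.3756, 0.4713, 0.7727]  # box returned by the "Locate ..." query

print(f"requested vs described: {iou_xyxy(requested, described):.2f}")
print(f"requested vs grounded:  {iou_xyxy(requested, grounded):.2f}")
```

With these numbers the grounding query matches the requested box almost exactly (IoU close to 1.00), while the box attached to the description overlaps it only slightly (IoU around 0.06).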

I am quite confused by this. Thank you very much for your help!

Marloweeee, Sep 27 '24 02:09

Thanks for your interest in our work. For referring queries, you do not need to replace <refer_box> or <refer_feat> with other text. They are special placeholders (defined in groma/constants.py) that the model handles automatically. Instead, set the box coordinates directly here.

I will update run_groma.py later to provide an interface for passing in user-specified boxes.
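Following the reply above, a referring query keeps the placeholders verbatim in the prompt text and carries the box separately. The sketch below is illustrative only: build_referring_query and normalize_box are hypothetical helpers, the placeholder strings are copied from the prompt format quoted in this issue rather than from groma/constants.py, and the normalized (x1, y1, x2, y2) convention is an assumption. How the returned box list is passed to the model depends on run_groma.py and is not shown.

```python
# Prompt format quoted in this issue; placeholders are left untouched.
REFER_PROMPT = "Please briefly describe <roi><refer_box></roi> <refer_feat>."


def normalize_box(box_px, width, height):
    """Convert a pixel-space (x1, y1, x2, y2) box to [0, 1] coordinates."""
    x1, y1, x2, y2 = box_px
    return [x1 / width, y1 / height, x2 / width, y2 / height]


def build_referring_query(box):
    """Return (prompt, boxes) for a single referring query (hypothetical helper).

    The text keeps <refer_box>/<refer_feat> as-is; the coordinates are returned
    as a separate list so they can be handed to the model's box input.
    """
    if len(box) != 4:
        raise ValueError("expected 4 coordinates (x1, y1, x2, y2)")
    x1, y1, x2, y2 = (float(v) for v in box)
    # Reject boxes that are not normalized or have inverted corners.
    if not (0.0 <= x1 < x2 <= 1.0 and 0.0 <= y1 < y2 <= 1.0):
        raise ValueError(f"not a normalized (x1, y1, x2, y2) box: {box}")
    return REFER_PROMPT, [[x1, y1, x2, y2]]


if __name__ == "__main__":
    prompt, boxes = build_referring_query([0.2752, 0.3756, 0.4713, 0.7727])
    print(prompt)
    print(boxes)
```

Keeping the coordinates out of the prompt string reflects the maintainer's point that the placeholders are resolved by the model itself; writing numbers into the text, as in the original question, leaves the model without an actual box to attend to.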

machuofan, Sep 27 '24 10:09

Thanks for your answer! I am now able to generate more detailed region descriptions with Groma. Thank you again for your excellent work ^_^

Marloweeee, Nov 04 '24 06:11

Thanks for your interest in our work. For referring queries, you do not need to replace <refer_box> or <refer_feat> with other texts. Actually, they are special placeholders (as defined in groma/constants.py) that will be automatically handled by the model. You should directly set box coordinates here.

I will later update run_groma.py to provide an interface to pass in user-specified boxes.

Thanks for your great work! Has this interface been added yet?

LLH-Harward, Dec 31 '24 06:12