Referring multiple regions in the image

Open Deepayan137 opened this issue 1 year ago • 3 comments

Hi,

Thank you for your excellent work. I have been experimenting with the run_groma.py file and was wondering whether it is possible to give the model multiple region bounding boxes and ask it to describe them together. In the qualitative examples, only one bounding box is provided as input. Is it possible to provide multiple bounding boxes as input, and if so, could you share a short example of how to do it?

Thank you

Deepayan137 avatar Nov 12 '24 10:11 Deepayan137

Yes, this framework theoretically supports multiple referring regions as input. For example, you can prompt the model with "Please briefly describe <roi><refer_box></roi> <refer_feat> and <roi><refer_box></roi> <refer_feat>" and set the box coordinates here.

However, you may get unexpected answers, because the provided model has not been trained on data with multiple referring regions as input. Still, feel free to give it a try.

machuofan avatar Nov 15 '24 07:11 machuofan
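
The prompt pattern in the reply above repeats one placeholder group per region, joined by "and". A minimal sketch of building such a prompt for any number of regions follows. The helper name build_multi_region_prompt is hypothetical and not part of Groma; only the placeholder string and the instruction text are taken from the reply. It only builds the text, and the box coordinates still have to be set separately, as the reply notes.

```python
# Placeholder group for one referred region, copied from the reply above.
REGION_PLACEHOLDER = "<roi><refer_box></roi> <refer_feat>"


def build_multi_region_prompt(num_regions, instruction="Please briefly describe"):
    """Hypothetical helper (not a Groma API): repeat the region placeholder
    once per referred region and join the copies with 'and'."""
    if num_regions < 1:
        raise ValueError("num_regions must be at least 1")
    placeholders = [REGION_PLACEHOLDER] * num_regions
    return instruction + " " + " and ".join(placeholders)


# Two regions reproduce the prompt suggested in the reply.
prompt = build_multi_region_prompt(2)
print(prompt)
```

As the reply cautions, the released model was not trained on multi-region prompts, so answers may degrade as more regions are added.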

Thank you for the reply. So, to refer to multiple regions, do we pass a list of tensors (normalized bounding box coordinates)?

Deepayan137 avatar Nov 21 '24 09:11 Deepayan137

Hello, I have a similar question. So, if we are referring to multiple regions, do we pass a list of tensors containing the normalized bounding box coordinates?

LLH-Harward avatar Dec 31 '24 06:12 LLH-Harward
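
The thread does not confirm whether the script expects a single tensor or a list of tensors, so that part remains open. The normalization step itself can be sketched independently. The example below assumes pixel boxes in (x1, y1, x2, y2) order and one box per placeholder group, in the order they appear in the prompt. Both are assumptions, and normalize_boxes is a hypothetical helper rather than a Groma function. It uses plain Python lists, which could then be converted to whatever tensor type the script requires.

```python
def normalize_boxes(boxes_xyxy, image_width, image_height):
    """Hypothetical helper (not a Groma API): scale pixel-space
    (x1, y1, x2, y2) boxes into the [0, 1] range."""
    if image_width <= 0 or image_height <= 0:
        raise ValueError("image dimensions must be positive")
    normalized = []
    for x1, y1, x2, y2 in boxes_xyxy:
        # Reject boxes that are empty, inverted, or outside the image.
        if not (0 <= x1 < x2 <= image_width and 0 <= y1 < y2 <= image_height):
            raise ValueError(f"invalid box: {(x1, y1, x2, y2)}")
        normalized.append([
            x1 / image_width,
            y1 / image_height,
            x2 / image_width,
            y2 / image_height,
        ])
    return normalized


# Two regions in a 640x480 image, listed in the same order as the
# placeholder groups in the prompt (assumed ordering).
boxes = normalize_boxes([(64, 48, 320, 240), (400, 120, 600, 360)], 640, 480)
print(boxes)
```

Checking the box-handling code in the run script is the reliable way to confirm the coordinate order and container type it actually expects.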