Groma: Referring to multiple regions in the image

Hi,

Thank you for your excellent work. I have been experimenting with the run_grom.py file and was wondering whether it is possible to give the model several region bounding boxes and ask it to describe them together. In the qualitative examples, only one bounding box is provided as input to the model. Could you tell me whether multiple region bounding boxes can be given as input and, if so, share a short example of how to do it?

Thank you

Nov 12 '24 10:11 Deepayan137

Yes, this framework theoretically supports multiple referring regions as input. For example, you can prompt the model with `Please briefly describe <roi><refer_box></roi> <refer_feat> and <roi><refer_box></roi> <refer_feat>` and set the box coordinates here.

However, you may get unexpected answers, because the released model has not been trained on data with multiple referring regions as input. Anyway, feel free to give it a try.
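To make the suggestion above concrete, here is a minimal sketch that repeats the quoted region placeholder once per box and normalizes the boxes. It assumes that the i-th `<refer_box>` placeholder is filled by the i-th box in the list and that boxes are pixel (x1, y1, x2, y2) coordinates normalized by image width and height. The helper names `build_multi_region_prompt` and `normalize_boxes` are hypothetical and not part of the Groma codebase. The result is a plain nested list; whether run_grom.py expects one tensor per box or a single stacked tensor is not confirmed in this thread.

```python
from typing import List, Sequence, Tuple

# One referring-region segment, copied from the prompt quoted in the reply.
REGION_TOKEN = "<roi><refer_box></roi> <refer_feat>"


def build_multi_region_prompt(num_regions: int,
                              instruction: str = "Please briefly describe") -> str:
    """Hypothetical helper: repeat the region placeholder once per box, joined with ' and '."""
    if num_regions < 1:
        raise ValueError("need at least one region")
    regions = " and ".join([REGION_TOKEN] * num_regions)
    return f"{instruction} {regions}"


def normalize_boxes(boxes: Sequence[Tuple[float, float, float, float]],
                    width: int, height: int) -> List[List[float]]:
    """Hypothetical helper: convert pixel (x1, y1, x2, y2) boxes to [0, 1] coordinates.

    ASSUMPTION: xyxy format normalized by image size; check the repo's own
    preprocessing before relying on this. Box order must match placeholder order.
    """
    normalized = []
    for x1, y1, x2, y2 in boxes:
        normalized.append([x1 / width, y1 / height, x2 / width, y2 / height])
    return normalized


if __name__ == "__main__":
    boxes = [(10, 20, 110, 220), (200, 50, 300, 150)]
    print(build_multi_region_prompt(len(boxes)))
    print(normalize_boxes(boxes, width=400, height=400))
```

If the model code expects tensors, the nested list could be wrapped with something like `torch.tensor(...)`, but that is an untested assumption, and, as noted above, the released model may not handle several regions well.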

Nov 15 '24 07:11 machuofan

Thank you for the reply. So, if we refer to multiple regions, do we pass a list of tensors (normalized bounding box coordinates)?

Nov 21 '24 09:11 Deepayan137

Hello, I have a similar question. So, if we are referring to multiple regions, do we pass a list of tensors containing the normalized bounding box coordinates?

Dec 31 '24 06:12 LLH-Harward