Chuofan Ma
Oh, I see. Then it should be A100 80G GPUs. Sorry for the mistake.
Hi there, thanks for your interest in our work. We have not yet implemented a Gradio demo for Groma. The Gradio code was directly inherited from LLaVA. Therefore, you may have...
Hi, you can download the LVIS result file [here](https://huggingface.co/datasets/FoundationVision/groma_data/blob/main/lvis_test.json).
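If you would rather fetch the file from a script, here is a minimal sketch. It assumes the dataset repo is publicly readable and that swapping `/blob/` for `/resolve/` in the link gives the raw file, which is the usual Hugging Face URL convention. The helper names `to_resolve_url` and `download` are just for illustration and are not part of the Groma codebase.

```python
from urllib.request import urlretrieve

# Link from the reply above (this is the web "blob" view, not the raw file).
BLOB_URL = (
    "https://huggingface.co/datasets/FoundationVision/groma_data"
    "/blob/main/lvis_test.json"
)


def to_resolve_url(blob_url: str) -> str:
    """Turn a Hugging Face 'blob' page URL into a raw-file 'resolve' URL."""
    return blob_url.replace("/blob/", "/resolve/", 1)


def download(blob_url: str, dest: str) -> str:
    """Download the raw file to `dest` and return the local path.

    Assumption: the file is public and needs no authentication token.
    """
    path, _headers = urlretrieve(to_resolve_url(blob_url), dest)
    return path
```

Calling `download(BLOB_URL, "lvis_test.json")` should then save the file to the current directory, provided the assumptions above hold.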
Hi there, I modified `sharegpt4v_instruct_gpt4-vision_cap100k_new.json` simply because several images (fewer than 10) have incorrect paths in the original JSON annotations. But for some reason, I do not have access to...
Yes, this framework theoretically supports multiple referring regions as input. For example, you can do this by prompting the model with `Please briefly describe and ` and setting the box...
Yes, it looks good to me.
Hi there, thank you for your interest in our work. Yes, the classifier works in the same way as CLIP, i.e., the classifier weights are essentially composed of text embeddings.
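To make the idea concrete, here is a rough sketch of a CLIP-style classifier in NumPy. It is not the actual Groma code: the text embeddings are random placeholders standing in for the output of a real text encoder, and the function names, feature size, and temperature value are all assumptions for illustration.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products become cosine similarities."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)


def build_classifier_weights(text_embeddings):
    """The 'classifier weights' are just the normalized text embeddings,
    one row per class name."""
    return l2_normalize(np.asarray(text_embeddings, dtype=np.float64))


def classify(region_features, weights, temperature=0.01):
    """Score each region feature against every class embedding and
    return softmax probabilities. The temperature is a hypothetical value."""
    feats = l2_normalize(np.asarray(region_features, dtype=np.float64))
    logits = feats @ weights.T / temperature
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=-1, keepdims=True)


rng = np.random.default_rng(0)
class_names = ["cat", "dog", "car"]

# Placeholder: in practice these would come from a text encoder
# applied to a prompt built from each class name.
text_embeddings = rng.normal(size=(len(class_names), 8))
weights = build_classifier_weights(text_embeddings)

# Fake region features that lie close to the "dog" and "car" embeddings.
region_features = text_embeddings[[1, 2]] + 0.05 * rng.normal(size=(2, 8))

probs = classify(region_features, weights)
predicted = [class_names[i] for i in probs.argmax(axis=-1)]
```

Since the weights are only text embeddings, changing the label set means re-encoding the new class names, with no retraining of a classification layer.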
It's 'a xxx'.