Chuofan Ma comments

Results: 18 comments by Chuofan Ma

Finetuning and dataset formatting guidelines

Hi there, thanks for your interest in our work. Here are some tips you may follow to finetune the model on customized datasets: 1. Format your data. There are various...

unable to load local weight

Thanks for the feedback. The bug occurs because the program looks for a local DINOv2-L checkpoint to initialize CustomDDETRModel. This is not expected behavior. A quick fix is...

unable to load local weight

We inherited the mmcv folder from GPT4ROI. I think it originated from mmcv==1.4.7.

No groma conversation template

Sorry, it's a typo here. It should be `conv_temp='llava'`. Thanks for your feedback.

what user_query can i use?

You can simply use `Locate a person wearing a yellow hat in the image.` or something else like that. Just remember to enclose the referring expression with `` and ``.
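As a rough sketch of the advice above, the query can be assembled by wrapping the referring expression in the two marker tokens. The marker strings are missing from the comment as shown here, so the helper below takes them as arguments. `build_grounding_query`, `open_marker`, `close_marker`, and the `[OPEN]`/`[CLOSE]` strings are hypothetical names for illustration, not part of the Groma code base.

```python
def build_grounding_query(template: str, expression: str,
                          open_marker: str, close_marker: str) -> str:
    """Wrap `expression` in the marker tokens and place it in `template`.

    `template` is expected to contain one `{expr}` slot.
    """
    if not open_marker or not close_marker:
        raise ValueError("both marker tokens must be non-empty")
    wrapped = f"{open_marker}{expression}{close_marker}"
    return template.format(expr=wrapped)


# Placeholder markers for illustration only; substitute the real tokens
# defined by the model's tokenizer or constants.
query = build_grounding_query(
    "Locate {expr} in the image.",
    "a person wearing a yellow hat",
    open_marker="[OPEN]",
    close_marker="[CLOSE]",
)
print(query)
```

Passing the markers in explicitly keeps the sketch independent of whichever special tokens the model actually defines.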

what user_query can i use?

Yes, such hallucination is probably caused by the training data: for grounding training, we only have positive QA pairs, i.e., the object mentioned in the question is guaranteed to occur...

Using Groma to generate region descriptions

Thanks for your interest in our work. For referring queries, you do not need to replace `` or `` with other text. Actually, they are special placeholders (as defined in...

Why not use the text guidance for the OV-LVIS setting in the config?

Yes, we do not use text guidance for the OV-LVIS setting by default, because we observe text guidance has little impact in this case. As discussed in the paper, COCO-Caption...

Why not use the text guidance for the OV-LVIS setting in the config?

Actually, I only got 2.8M images for the CC3M dataset. But the LVIS APr metric typically has high variance. Maybe you can try again to see if the results...

Why not use the text guidance for the OV-LVIS setting in the config?

Hi @kinredon, I cannot remember the exact number, but training takes roughly 3-5 days on 8 V100s. The transfer experiments are based on an R50 backbone.