【Feature】MMDetection supports Grounding-DINO inference and fine-tuning

Open hhaAndroid opened this issue 2 years ago • 5 comments

Hi all, MMDetection now supports Grounding DINO inference and fine-tuning. The mAP we achieved in our reproduction is higher than the official results. We also provide results for an R50 model retrained from scratch, which outperforms the official implementation (48.9 vs. 48.1 COCO mAP).

Installation

cd $MMDETROOT

# source installation
pip install -r requirements/multimodal.txt

# or mim installation
mim install "mmdet[multimodal]"

NOTE

Grounding DINO utilizes BERT as the language model, which requires access to https://huggingface.co/. If you encounter connection errors due to network access, you can download the required files on a computer with internet access and save them locally. Finally, modify the lang_model_name field in the config to the local path. Please refer to the following code:

from transformers import AutoTokenizer, BertConfig, BertModel

# Download the BERT config, weights, and tokenizer (run this on a machine with internet access).
config = BertConfig.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased", add_pooling_layer=False, config=config)
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Save all three into one local directory; replace "your path" with your own location.
config.save_pretrained("your path/bert-base-uncased")
model.save_pretrained("your path/bert-base-uncased")
tokenizer.save_pretrained("your path/bert-base-uncased")
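As a rough sketch of the last step in the note, the helper below picks the value to use for lang_model_name: the local directory if it looks like a saved model, otherwise the Hugging Face name. The function name `resolve_lang_model_name` is hypothetical and not part of MMDetection. It assumes only that `save_pretrained` on the config has written a `config.json` into the directory.

```python
from pathlib import Path


def resolve_lang_model_name(local_dir: str, hub_name: str = "bert-base-uncased") -> str:
    """Return local_dir if it holds a saved model config, else the hub name.

    Hypothetical helper: it only checks for config.json, which
    config.save_pretrained() writes; it does not validate the weights.
    """
    if (Path(local_dir) / "config.json").is_file():
        return local_dir
    return hub_name


if __name__ == "__main__":
    # "your path" is the same placeholder used in the snippet above.
    print(resolve_lang_model_name("your path/bert-base-uncased"))
```

The returned string is what you would then set as lang_model_name in the config.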

Inference

cd $MMDETROOT

wget https://download.openmmlab.com/mmdetection/v3.0/grounding_dino/groundingdino_swint_ogc_mmdet-822d7e9d.pth

python demo/image_demo.py \
	demo/demo.jpg \
	configs/grounding_dino/grounding_dino_swin-t_pretrain_obj365_goldg_cap4m.py \
	--weights groundingdino_swint_ogc_mmdet-822d7e9d.pth \
	--texts 'bench . car .'
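The --texts argument in the command above separates class names with " . " and ends with a trailing " .". A minimal sketch for building such a string from a list of class names might look like this. The helper `build_text_prompt` is hypothetical, and the format is inferred only from the 'bench . car .' example.

```python
def build_text_prompt(class_names):
    """Join class names into a prompt such as 'bench . car .'.

    Hypothetical helper; the separator convention is copied from the
    --texts example, not from a documented specification.
    """
    cleaned = [name.strip() for name in class_names if name.strip()]
    if not cleaned:
        raise ValueError("at least one non-empty class name is required")
    return " . ".join(cleaned) + " ."


if __name__ == "__main__":
    print(build_text_prompt(["bench", "car"]))
```

The resulting string can be passed as the value of --texts.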

Results and Models

| Model | Backbone | Style | COCO mAP | Official COCO mAP | Pre-Train Data |
| --- | --- | --- | --- | --- | --- |
| Grounding DINO-T | Swin-T | Zero-shot | 48.5 | 48.4 | O365,GoldG,Cap4M |
| Grounding DINO-T | Swin-T | Finetune | 58.1(+0.9) | 57.2 | O365,GoldG,Cap4M |
| Grounding DINO-B | Swin-B | Zero-shot | 56.9 | 56.7 | COCO,O365,GoldG,Cap4M,OpenImage,ODinW-35,RefCOCO |
| Grounding DINO-B | Swin-B | Finetune | 59.7 | | COCO,O365,GoldG,Cap4M,OpenImage,ODinW-35,RefCOCO |
| Grounding DINO-R50 | R50 | Scratch | 48.9(+0.8) | 48.1 | |
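The gains in parentheses are the difference between the MMDetection mAP and the official mAP. As a quick check, the sketch below recomputes that gap for every row that lists both numbers. The values are copied from the table, and the variable and function names are illustrative only.

```python
# (model, style, MMDetection COCO mAP, official COCO mAP), copied from the table.
RESULTS = [
    ("Grounding DINO-T", "Zero-shot", 48.5, 48.4),
    ("Grounding DINO-T", "Finetune", 58.1, 57.2),
    ("Grounding DINO-B", "Zero-shot", 56.9, 56.7),
    ("Grounding DINO-R50", "Scratch", 48.9, 48.1),
]


def map_gains(rows):
    """Return {(model, style): mmdet_map - official_map}, rounded to 0.1."""
    return {
        (model, style): round(mmdet_map - official_map, 1)
        for model, style, mmdet_map, official_map in rows
    }


if __name__ == "__main__":
    for key, gain in map_gains(RESULTS).items():
        print(key, f"+{gain}")
```

The Swin-B fine-tuned row is left out because the table gives no official number for it.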

For details, see https://github.com/open-mmlab/mmdetection/blob/dev-3.x/configs/grounding_dino/README.md

We also support GLIP inference and fine-tuning.

If you encounter any issues while using it, please feel free to create an issue.

hhaAndroid avatar Sep 26 '23 03:09 hhaAndroid

@hhaAndroid thank you very much for supporting Grounding DINO finetuning! I just have a few questions:

my goal is to maintain Grounding DINO's versatility in open-set detection but just try to add a few custom classes

  1. in this finetuning procedure from the MMDetection docs, it looks like we have to explicitly set the number of classes. does this mean the finetuned model can no longer do open-set detection? or am I misunderstanding something?
  2. will the finetuned model still be able to handle Referring Expression Comprehension (REC)? for example, can I still prompt the finetuned model with "the left lion"?
  3. could you please share any script or code snippets on how you achieved the finetuning?

Many thanks!

PawaritL avatar Oct 08 '23 10:10 PawaritL

Maybe the text input of GroundingDINO in mmdet is a fixed category list (not free-form text) 😥

FengheTan9 avatar Oct 08 '23 13:10 FengheTan9

This is amazing, thank you!

Can those models be used with the base groundingdino implementation? the configs look quite different, so i guess not? Bummer to change the implementation at this point

Liquidmasl avatar Nov 15 '23 15:11 Liquidmasl

Can I fine-tune Grounding DINO on a prompt? The objects should already be present in the pretraining data, but I would like to add some extra information to get better predictions. Say I only want to detect "black cats". I have only a few data samples, so I would like to tune the model slightly with a prompt in order to reuse its pretrained knowledge.

25icecreamflavors avatar Dec 17 '23 23:12 25icecreamflavors

Hi, what are the minimum hardware requirements for fine-tuning Grounding DINO on the COCO dataset (default batch size = 32)?

SoulProficiency avatar Feb 27 '24 01:02 SoulProficiency