Will you provide the training config file for ViT-L (66.0 AP)?

Open zhangchbin opened this issue 1 year ago • 11 comments

Thanks for your help.

zhangchbin avatar Aug 09 '23 11:08 zhangchbin

Are 6 encoder layers and 6 decoder layers used in Co-DINO for 66.0 AP? There is no description of this part in the paper.

HITerStudy avatar Aug 10 '23 02:08 HITerStudy

Hi, @zhangchbin, we have no plan to do this now. But we have updated the arXiv paper to release more details about this large model.

TempleX98 avatar Aug 10 '23 05:08 TempleX98

@HITerStudy We find the performance saturates when using more than 6 encoder or decoder layers for larger models (e.g., Swin-L). So we use 6 layers by default.

TempleX98 avatar Aug 10 '23 05:08 TempleX98
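
For readers looking for where this choice lives: in MMDetection-style Co-DETR configs, the layer counts sit under the transformer section of the detection head. The following is a minimal sketch; the field layout follows common DeformableDETR-style configs and should be verified against the actual Co-DINO config file.

```python
# Minimal sketch (assumed field layout, not the verbatim Co-DINO config):
# setting 6 encoder and 6 decoder layers, the default mentioned above.
model = dict(
    query_head=dict(
        transformer=dict(
            encoder=dict(num_layers=6),  # more than 6 saturates for Swin-L
            decoder=dict(num_layers=6),
        )
    )
)
```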

Thanks for your reply!

HITerStudy avatar Aug 10 '23 06:08 HITerStudy

@TempleX98 Hi, I encountered the following error when using the config `projects/configs/co_dino/co_dino_5scale_lsj_swin_large_3x_coco.py`:

```
Co-DETR/mmdet/datasets/builder.py", line 80, in build_dataset
    dataset = MultiImageMixDataset(**cp_cfg)
TypeError: __init__() got an unexpected keyword argument 'filter_empty_gt'
```

zhangchbin avatar Aug 10 '23 16:08 zhangchbin

@zhangchbin, I have fixed it.

TempleX98 avatar Aug 10 '23 16:08 TempleX98
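
The error comes from passing the dataset-level `filter_empty_gt` argument to the `MultiImageMixDataset` wrapper, which does not accept it. Below is a sketch of the kind of fix involved (illustrative only; the repository's actual patch may differ).

```python
# Illustrative sketch of a fix in mmdet/datasets/builder.py (the actual
# commit may differ): strip kwargs the wrapper does not accept.
import copy

from mmdet.datasets.dataset_wrappers import MultiImageMixDataset


def build_multi_image_mix_dataset(cfg, build_dataset):
    """Hypothetical helper: build a MultiImageMixDataset from a config dict.

    `build_dataset` is the recursive builder used for the inner dataset.
    """
    cp_cfg = copy.deepcopy(cfg)
    cp_cfg.pop('type')
    # 'filter_empty_gt' belongs to the inner dataset, not the wrapper;
    # dropping it here avoids the TypeError reported above.
    cp_cfg.pop('filter_empty_gt', None)
    cp_cfg['dataset'] = build_dataset(cp_cfg['dataset'])
    return MultiImageMixDataset(**cp_cfg)
```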

Amazing, training co_dino_5scale_lsj_swin_large_3x_coco.py with 8 GPUs will take 12 days. It is strange because the number of iterations per epoch has doubled (14786 vs. 7000+):

```
mmdet - INFO - Epoch [1][50/14786] lr: 2.000e-05, eta: 12 days, 17:17:46
```

zhangchbin avatar Aug 10 '23 17:08 zhangchbin

The ETA is inaccurate at the beginning of training. You can use the DETR augmentation config if you want to accelerate training, as it is faster than LSJ augmentation. Besides, you had better use 16 GPUs (1 image per GPU) for Co-DINO w/ Swin-L training.

TempleX98 avatar Aug 10 '23 18:08 TempleX98
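
For reference, the per-GPU batch size is set in the data section of the config. A minimal sketch using standard MMDetection 2.x fields (assumed; check the actual co_dino_5scale_lsj_swin_large_3x_coco.py):

```python
# Sketch: 1 image per GPU for Swin-L training, as recommended above
# (standard MMDetection 2.x fields; verify against the repo's config).
data = dict(
    samples_per_gpu=1,  # images per GPU
    workers_per_gpu=2,  # dataloader workers per GPU
)
# With 16 GPUs this keeps the effective batch size at 16 images.
```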

How is the TTA used for ViT-L (66.0 AP) implemented? Could you describe some details? Thank you.

HITerStudy avatar Aug 18 '23 02:08 HITerStudy
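
While this question was not answered in the thread, a standard detection TTA setup in MMDetection 2.x combines multi-scale testing with horizontal flips. A generic sketch follows; the scales are placeholders, not the authors' 66.0 AP recipe.

```python
# Generic multi-scale + flip TTA pipeline in MMDetection 2.x style.
# The img_scale values are assumptions, not the paper's actual settings.
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=[(1333, 800), (1600, 960), (2000, 1200)],  # assumed scales
        flip=True,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize',
                 mean=[123.675, 116.28, 103.53],
                 std=[58.395, 57.12, 57.375],
                 to_rgb=True),
            dict(type='Pad', size_divisor=32),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ],
    ),
]
```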

May I ask: does the backbone of the ViT-L (66.0 AP) model use the eva02_L_pt_m38m_p14to16 | 304M | Merged-38M | 56 pretrained model released by EVA-02? Also, can training on Objects365 and COCO be completed with 8 40G A100 GPUs? Many thanks!

RicoJYang avatar Aug 28 '23 03:08 RicoJYang

Hi, can it run on V100 GPUs?

zimenglan-sysu-512 avatar Oct 08 '23 09:10 zimenglan-sysu-512