ConsistentTeacher
My results are lower than yours.
When I train the model with consistent_teacher_r50_fpn_coco_180k_10p_2x8.py on one GPU, the result is too low, and I didn't change the parameters.
[>>>>>>>>>>>>>>>>>>>>>>>>>>] 5000/5000, 43.4 task/s, elapsed: 115s, ETA: 0s
2023-05-20 11:33:56,083 - mmdet.ssod - INFO - Evaluating bbox...
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.123
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=1000 ] = 0.197
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=1000 ] = 0.126
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.067
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.142
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.156
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.342
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=300 ] = 0.342
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=1000 ] = 0.342
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.146
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.359
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.496
[>>>>>>>>>>>>>>>>>>>>>>>>>>] 5000/5000, 43.1 task/s, elapsed: 116s, ETA: 0s
2023-05-20 11:36:22,808 - mmdet.ssod - INFO - Evaluating bbox...
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.098
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=1000 ] = 0.165
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=1000 ] = 0.099
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.051
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.113
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.125
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.305
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=300 ] = 0.305
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=1000 ] = 0.305
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.131
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.315
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.446
2023-05-20 11:36:48,895 - mmdet.ssod - INFO - Exp name: consistent_teacher_r50_fpn_coco_180k_10p_2x8.py
2023-05-20 11:36:48,898 - mmdet.ssod - INFO - Iter(val) [180000]
teacher.bbox_mAP: 0.1230, teacher.bbox_mAP_50: 0.1971, teacher.bbox_mAP_75: 0.1262, teacher.bbox_mAP_s: 0.0674, teacher.bbox_mAP_m: 0.1415, teacher.bbox_mAP_l: 0.1562
student.bbox_mAP: 0.0984, student.bbox_mAP_50: 0.1653, student.bbox_mAP_75: 0.0991, student.bbox_mAP_s: 0.0513, student.bbox_mAP_m: 0.1129, student.bbox_mAP_l: 0.1249
Sorry, this config is supposed to be run on 8 GPUs: "2x8" means samples_per_gpu=2 and a total of 8 GPUs.
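In other words, the suffix just encodes the effective batch size. A minimal illustration of the arithmetic:
```python
# Back-of-the-envelope reading of the "2x8" naming (just arithmetic, not a config)
samples_per_gpu = 2        # images per GPU per step
num_gpus = 8               # total GPUs the config assumes
effective_batch = samples_per_gpu * num_gpus
print(effective_batch)     # 16 images per training step
```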
Thanks, is there any config supposed to be run on one GPU? Or, what parameters should I change in the config?
I have changed the parameters --gpus and --gpu-ids in tools/train.py to use one GPU.
Thanks!
Dear @yuan738,
Thank you for your insightful question. Currently, the semi-supervised method is heavily dependent on a large batch size, and as a result, reducing the number of GPUs could significantly impact performance. Unfortunately, we have not yet found an effective solution to this issue.
One potential workaround could be to implement "gradient accumulation," or you might consider using fp16 to increase the batch size on a single GPU. Moreover, given the limitations of single-GPU training, you might find that a 1:4 labeled-to-unlabeled ratio is too ambitious and could consider adjusting it to a 1:1 ratio. However, please note that the performance might still be subpar with a single-GPU setup. We acknowledge this challenge and will strive to address it in our future work.
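For illustration only, a rough sketch of what such single-GPU overrides could look like. This is untested and hypothetical: the sampler key mirrors the data.sampler.train.sample_ratio field discussed in this thread, and the gradient-accumulation hook is the one provided by recent mmcv versions.
```python
# Hypothetical single-GPU overrides, sketched on top of the 10% config (untested)
_base_ = ["consistent_teacher_r50_fpn_coco_180k_10p.py"]

data = dict(
    samples_per_gpu=2,                               # 1 labeled + 1 unlabeled per step
    sampler=dict(train=dict(sample_ratio=[1, 1])),   # 1:1 labeled-to-unlabeled ratio
)

# Option A: mixed precision, to fit a larger batch in single-GPU memory
fp16 = dict(loss_scale="dynamic")

# Option B: gradient accumulation, to emulate the 8-GPU effective batch
# (combine with fp16 via GradientCumulativeFp16OptimizerHook if both are wanted)
optimizer_config = dict(
    type="GradientCumulativeOptimizerHook", cumulative_iters=8, grad_clip=None
)
```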
Best Regards,
Hi @Adamdad, I trained the consistent_teacher_r50_fpn_coco_180k_10p.py config using 4 GPUs; at iteration 16000, the result is as below:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.113
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=1000 ] = 0.190
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=1000 ] = 0.113
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.063
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.133
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.143
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.323
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=300 ] = 0.323
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=1000 ] = 0.323
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.145
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.346
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.439
it seems the mAP is too low.
Dear @zimenglan-sysu-512 ,
I recommend increasing the ratio of labeled to unlabeled samples, such as a 1:1 ratio. Currently, the learning rate is based on 8 labeled samples (1 labeled sample per GPU). If you are training with fewer GPUs, you may need to adjust the batch size for labeled samples or decrease the learning rate to match your setup.
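One possible reading of this advice, written out as arithmetic. This is an assumption on my part, not an official recipe: scale the learning rate linearly with the total labeled batch relative to the 8-GPU reference.
```python
# Linear lr scaling against the 8-GPU reference (assumed rule, not from the repo)
base_lr = 0.01            # lr assumed for the 8-GPU reference setup in this thread
ref_labeled_batch = 8     # 1 labeled sample/GPU x 8 GPUs
num_gpus = 4
labeled_per_gpu = 1
new_lr = base_lr * (num_gpus * labeled_per_gpu) / ref_labeled_batch  # 0.005
```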
It's important to keep in mind that performance may still be lower with a reduced GPU setup. We are aware of this challenge and will make efforts to overcome it in our future endeavors.
Best.
hi @Adamdad
should I modify data.samples_per_gpu and data.sampler.train.sample_ratio to increase the labeled samples for 4-GPU training, and reduce the learning rate?
e.g.
data.samples_per_gpu=6
data.sampler.train.sample_ratio=[2, 4]
lr = 0.01 * (4 * 6) / (5 * 8)
Dear @zimenglan-sysu-512,
Yes, setting data.sampler.train.sample_ratio to [2, 4] or [3, 3] should work well. The first number represents the number of labeled samples, while the second number represents the number of unlabeled samples. Therefore, you will be using 2 or 3 labeled samples per batch per GPU. Make sure data.samples_per_gpu = sum(data.sampler.train.sample_ratio).
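Putting the values proposed above together, a hypothetical 4-GPU override could look roughly like this (a sketch, not a tested config from this repo):
```python
# Sketch of the 4-GPU override discussed above (hypothetical, untested)
_base_ = ["consistent_teacher_r50_fpn_coco_180k_10p.py"]

data = dict(
    samples_per_gpu=6,                               # must equal sum(sample_ratio)
    sampler=dict(train=dict(sample_ratio=[2, 4])),   # 2 labeled : 4 unlabeled per GPU
)

# Optional lr rescaling for 4 GPUs x 6 samples vs. the original 8 GPUs x 5 samples,
# following the linear rule proposed earlier in this thread
optimizer = dict(lr=0.01 * (4 * 6) / (5 * 8))        # = 0.006
```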
Best,
hi @Adamdad
it seems that this config file cannot be found...
We only have a file called configs/consistent-teacher/consistent_teacher_r50_fpn_coco_720k_fulldata.py. No config is provided for 360k training on the full data.
Thanks. Another question: where can I find the configs/consistent-teacher/base.py file?
Dear @zimenglan-sysu-512,
Apologies for any confusion caused. I wanted to inform you that the file configs/consistent-teacher/base.py has been renamed to configs/consistent-teacher/consistent_teacher_r50_fpn_coco_180k_10p.py. I have also updated this change in the README.
Best regards,
hi @Adamdad two questions here:
- from the log file, the learning rate stays the same throughout the training phase; why?
- what is the difference between the 360k and 720k iteration schedules? How does the mAP compare? Since I only have a few GPUs (e.g. 4), training for 720k iterations takes more than 10 days to finish even when using fp16.
Dear @zimenglan-sysu-512,
- In our experiment, we decided not to decay the learning rate. Surprisingly, we observed that using a fixed learning rate resulted in higher performance than using a learning-rate decay schedule (see the config sketch after this list).
- Unfortunately, we did not conduct the 360k-iteration experiments, so we cannot provide insights into the performance gap between the two training schedules. However, it's worth noting that training the model for 720k iterations is indeed quite time-consuming. Even with 8xV100 GPUs, it still takes several days to complete.
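For reference, a constant learning rate can be expressed in an mmdet-style config roughly as follows. This is illustrative only; the repo's actual schedule may be written differently.
```python
# A constant-lr schedule in an mmdet-style config (illustrative sketch)
lr_config = dict(
    policy="step",
    warmup="linear",
    warmup_iters=500,
    warmup_ratio=0.001,
    step=[10 ** 9],   # decay step placed far beyond max iters, so the lr never decays
)
# Alternatively, mmcv also registers a plain fixed policy:
# lr_config = dict(policy="fixed", warmup="linear", warmup_iters=500, warmup_ratio=0.001)
```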
If you have any further questions or need additional information, please let me know.
Best regards,
Using 4 GPUs with a batch size of 6 per GPU, where sample_ratio = [2, 4], the result is below:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.381
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=1000 ] = 0.542
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=1000 ] = 0.409
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.211
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.411
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.484
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.570
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=300 ] = 0.570
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=1000 ] = 0.570
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.343
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.609
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.740
Dear @zimenglan-sysu-512, the results you provided are amazing. It would be very helpful if you could share the config and checkpoint. This experiment could be extremely useful for people with fewer GPUs. You can start a pull request or send the file to me.
Great 😃
Hi @Adamdad, I will send the config, log, and checkpoint files to you. Please check the QQ email.
Hi @Adamdad, why does the fulldata config freeze the BN layers in the R50 backbone, while the 10% config does not?
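For context, freezing the BN layers of an R50 backbone typically looks something like this in an mmdet config (a generic sketch, not copied from either config in question):
```python
# Generic example of a frozen-BN ResNet-50 backbone in an mmdet-style config
model = dict(
    backbone=dict(
        type="ResNet",
        depth=50,
        norm_cfg=dict(type="BN", requires_grad=False),  # BN affine params are not updated
        norm_eval=True,                                  # BN runs in eval mode (frozen running stats)
        frozen_stages=1,                                 # also freezes the stem and first stage
    )
)
```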
Dear @zimenglan-sysu-512, could you share your config file with me? It would be of great help; I only have 2 GPUs and got bad training results. My QQ mail is [email protected]