ConsistentTeacher
My results are lower than yours.
When I train the model with consistent_teacher_r50_fpn_coco_180k_10p_2x8.py on one GPU, the result is too low, and I didn't change the parameters.
[>>>>>>>>>>>>>>>>>>>>>>>>>>] 5000/5000, 43.4 task/s, elapsed: 115s, ETA: 0s
2023-05-20 11:33:56,083 - mmdet.ssod - INFO - Evaluating bbox...
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.123
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=1000 ] = 0.197
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=1000 ] = 0.126
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.067
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.142
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.156
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.342
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=300 ] = 0.342
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=1000 ] = 0.342
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.146
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.359
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.496
[>>>>>>>>>>>>>>>>>>>>>>>>>>] 5000/5000, 43.1 task/s, elapsed: 116s, ETA: 0s
2023-05-20 11:36:22,808 - mmdet.ssod - INFO - Evaluating bbox...
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.098
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=1000 ] = 0.165
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=1000 ] = 0.099
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.051
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.113
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.125
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.305
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=300 ] = 0.305
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=1000 ] = 0.305
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.131
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.315
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.446
2023-05-20 11:36:48,895 - mmdet.ssod - INFO - Exp name: consistent_teacher_r50_fpn_coco_180k_10p_2x8.py
2023-05-20 11:36:48,898 - mmdet.ssod - INFO - Iter(val) [180000]
teacher.bbox_mAP: 0.1230, teacher.bbox_mAP_50: 0.1971, teacher.bbox_mAP_75: 0.1262, teacher.bbox_mAP_s: 0.0674, teacher.bbox_mAP_m: 0.1415, teacher.bbox_mAP_l: 0.1562
student.bbox_mAP: 0.0984, student.bbox_mAP_50: 0.1653, student.bbox_mAP_75: 0.0991, student.bbox_mAP_s: 0.0513, student.bbox_mAP_m: 0.1129, student.bbox_mAP_l: 0.1249
Sorry, this config is supposed to be run on 8 GPUs: "2x8" means samples_per_gpu=2 and a total of 8 GPUs.
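In other words, the suffix just encodes the effective batch size. A minimal illustration of the arithmetic:
```python
# Back-of-the-envelope reading of the "2x8" naming (just arithmetic, not a config)
samples_per_gpu = 2        # images per GPU per step
num_gpus = 8               # total GPUs the config assumes
effective_batch = samples_per_gpu * num_gpus
print(effective_batch)     # 16 images per training step
```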
Thanks, is there any config supposed to be run on one GPU? Or, what parameters should I change in the config?
I have changed the parameters --gpus and --gpu-ids in tools/train.py to use one GPU.
Thanks!
Dear @yuan738,
Thank you for your insightful question. Currently, the semi-supervised method is heavily dependent on a large batch size, and as a result, reducing the number of GPUs could significantly impact performance. Unfortunately, we have not yet found an effective solution to this issue.
One potential workaround could be to implement "gradient accumulation," or you might consider using fp16 to increase the batch size on a single GPU. Moreover, given the limitations of single-GPU training, you might find that a 1:4 labeled-to-unlabeled ratio is too ambitious and could consider adjusting it to a 1:1 ratio. However, please note that the performance might still be subpar with a single-GPU setup. We acknowledge this challenge and will strive to address it in our future work.
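For illustration only, a rough sketch of what such single-GPU overrides could look like. This is untested and hypothetical: the sampler key mirrors the data.sampler.train.sample_ratio field discussed in this thread, and the gradient-accumulation hook is the one provided by recent mmcv versions.
```python
# Hypothetical single-GPU overrides, sketched on top of the 10% config (untested)
_base_ = ["consistent_teacher_r50_fpn_coco_180k_10p.py"]

data = dict(
    samples_per_gpu=2,                               # 1 labeled + 1 unlabeled per step
    sampler=dict(train=dict(sample_ratio=[1, 1])),   # 1:1 labeled-to-unlabeled ratio
)

# Option A: mixed precision, to fit a larger batch in single-GPU memory
fp16 = dict(loss_scale="dynamic")

# Option B: gradient accumulation, to emulate the 8-GPU effective batch
# (combine with fp16 via GradientCumulativeFp16OptimizerHook if both are wanted)
optimizer_config = dict(
    type="GradientCumulativeOptimizerHook", cumulative_iters=8, grad_clip=None
)
```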
Best Regards,
Hi @Adamdad, I trained the consistent_teacher_r50_fpn_coco_180k_10p.py config using 4 GPUs; at iteration 16000, the result is as below:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.113
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=1000 ] = 0.190
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=1000 ] = 0.113
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.063
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.133
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.143
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.323
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=300 ] = 0.323
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=1000 ] = 0.323
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.145
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.346
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.439
it seems the mAP is too low.
Dear @zimenglan-sysu-512 ,
I recommend increasing the ratio of labeled to unlabeled samples, such as a 1:1 ratio. Currently, the learning rate is based on 8 labeled samples (1 labeled sample per GPU). If you are training with fewer GPUs, you may need to adjust the batch size for labeled samples or decrease the learning rate to match your setup.
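One possible reading of this advice, written out as arithmetic. This is an assumption on my part, not an official recipe: scale the learning rate linearly with the total labeled batch relative to the 8-GPU reference.
```python
# Linear lr scaling against the 8-GPU reference (assumed rule, not from the repo)
base_lr = 0.01            # lr assumed for the 8-GPU reference setup in this thread
ref_labeled_batch = 8     # 1 labeled sample/GPU x 8 GPUs
num_gpus = 4
labeled_per_gpu = 1
new_lr = base_lr * (num_gpus * labeled_per_gpu) / ref_labeled_batch  # 0.005
```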
It's important to keep in mind that performance may still be lower with a reduced GPU setup. We are aware of this challenge and will make efforts to overcome it in our future endeavors.
Best.
hi @Adamdad
should I modify data.samples_per_gpu and data.sampler.train.sample_ratio to increase the labeled samples for 4-GPU training, and reduce the learning rate?
e.g.
data.samples_per_gpu=6
data.sampler.train.sample_ratio=[2, 4]
lr = 0.01 * (4 * 6) / (5 * 8)
Dear @zimenglan-sysu-512,
Yes, setting data.sampler.train.sample_ratio to [2, 4] or [3, 3] should work well. The first number represents the number of labeled samples, while the second number represents the number of unlabeled samples. Therefore, you will be using 2 or 3 labeled samples per batch per GPU. Make sure data.samples_per_gpu = sum(data.sampler.train.sample_ratio).
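Putting the values proposed above together, a hypothetical 4-GPU override could look roughly like this (a sketch, not a tested config from this repo):
```python
# Sketch of the 4-GPU override discussed above (hypothetical, untested)
_base_ = ["consistent_teacher_r50_fpn_coco_180k_10p.py"]

data = dict(
    samples_per_gpu=6,                               # must equal sum(sample_ratio)
    sampler=dict(train=dict(sample_ratio=[2, 4])),   # 2 labeled : 4 unlabeled per GPU
)

# Optional lr rescaling for 4 GPUs x 6 samples vs. the original 8 GPUs x 5 samples,
# following the linear rule proposed earlier in this thread
optimizer = dict(lr=0.01 * (4 * 6) / (5 * 8))        # = 0.006
```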
Best,
hi @Adamdad
it seems that this config file cannot be found...
We only have a file called configs/consistent-teacher/consistent_teacher_r50_fpn_coco_720k_fulldata.py. No config is provided for 360k training on the full data.
Thanks. Another question: where can I find the configs/consistent-teacher/base.py file?
Dear @zimenglan-sysu-512,
Apologies for any confusion caused. I wanted to inform you that the file configs/consistent-teacher/base.py has been renamed to configs/consistent-teacher/consistent_teacher_r50_fpn_coco_180k_10p.py. I have also updated this change in the README.
Best regards,
hi @Adamdad two questions here:
- from the log file, the learning rate stays the same throughout the training phase; why?
- what is the difference between the 360k and 720k iteration schedules? How does the mAP compare? Since I only have a few GPUs (e.g. 4), training for 720k iterations takes more than 10 days to finish even when using fp16.
Dear @zimenglan-sysu-512,
- In our experiment, we decided not to decay the learning rate. Surprisingly, we observed that using a fixed learning rate resulted in higher performance than using a learning-rate decay schedule (see the config sketch after this list).
- Unfortunately, we did not conduct the 360k-iteration experiments, so we cannot provide insights into the performance gap between the two training schedules. However, it's worth noting that training the model for 720k iterations is indeed quite time-consuming. Even with 8xV100 GPUs, it still takes several days to complete.
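For reference, a constant learning rate can be expressed in an mmdet-style config roughly as follows. This is illustrative only; the repo's actual schedule may be written differently.
```python
# A constant-lr schedule in an mmdet-style config (illustrative sketch)
lr_config = dict(
    policy="step",
    warmup="linear",
    warmup_iters=500,
    warmup_ratio=0.001,
    step=[10 ** 9],   # decay step placed far beyond max iters, so the lr never decays
)
# Alternatively, mmcv also registers a plain fixed policy:
# lr_config = dict(policy="fixed", warmup="linear", warmup_iters=500, warmup_ratio=0.001)
```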
If you have any further questions or need additional information, please let me know.
Best regards,
Using 4 GPUs with a batch size of 6 per GPU, where sample_ratio = [2, 4], the result is below:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.381
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=1000 ] = 0.542
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=1000 ] = 0.409
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.211
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.411
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.484
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.570
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=300 ] = 0.570
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=1000 ] = 0.570
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.343
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.609
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.740
Dear @zimenglan-sysu-512, the results you provided are amazing. It would be very helpful if you could share the config and checkpoint. This experiment could be extremely useful for people with fewer GPUs. You can start a pull request or send the file to me.
Great 😃
Hi @Adamdad, I will send the config, log, and checkpoint files to you. Please check the QQ email.
Hi @Adamdad, why does the fulldata config freeze the BN layers in the R50 backbone, while the 10% config does not?
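For context, freezing the BN layers of an R50 backbone typically looks something like this in an mmdet config (a generic sketch, not copied from either config in question):
```python
# Generic example of a frozen-BN ResNet-50 backbone in an mmdet-style config
model = dict(
    backbone=dict(
        type="ResNet",
        depth=50,
        norm_cfg=dict(type="BN", requires_grad=False),  # BN affine params are not updated
        norm_eval=True,                                  # BN runs in eval mode (frozen running stats)
        frozen_stages=1,                                 # also freezes the stem and first stage
    )
)
```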
Dear @zimenglan-sysu-512, could you share your config file with me? It would be of great help; I only have 2 GPUs and got bad training results. My QQ mail is [email protected]