ConsistentTeacher

My results are lower than yours.

Open yuan738 opened this issue 1 year ago • 19 comments

When I train the model with consistent_teacher_r50_fpn_coco_180k_10p_2x8.py on one GPU, the result is too low. And I didn't change the parameters.

 [>>>>>>>>>>>>>>>>>>>>>>>>>>] 5000/5000, 43.4 task/s, elapsed: 115s, ETA: 0s
 2023-05-20 11:33:56,083 - mmdet.ssod - INFO - Evaluating bbox...
 Loading and preparing results...
 DONE (t=1.21s)
 creating index... index created!
 Running per image evaluation...
 Evaluate annotation type bbox
 DONE (t=17.84s).
 Accumulating evaluation results...
 DONE (t=5.17s).
 2023-05-20 11:34:22,052 - mmdet.ssod - INFO -
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.123
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=1000 ] = 0.197
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=1000 ] = 0.126
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.067
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.142
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.156
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.342
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=300 ] = 0.342
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=1000 ] = 0.342
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.146
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.359
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.496

 [>>>>>>>>>>>>>>>>>>>>>>>>>>] 5000/5000, 43.1 task/s, elapsed: 116s, ETA: 0s
 2023-05-20 11:36:22,808 - mmdet.ssod - INFO - Evaluating bbox...
 Loading and preparing results...
 DONE (t=1.24s)
 creating index... index created!
 Running per image evaluation...
 Evaluate annotation type bbox
 DONE (t=15.87s).
 Accumulating evaluation results...
 DONE (t=6.67s).
 2023-05-20 11:36:48,355 - mmdet.ssod - INFO -
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.098
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=1000 ] = 0.165
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=1000 ] = 0.099
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.051
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.113
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.125
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.305
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=300 ] = 0.305
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=1000 ] = 0.305
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.131
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.315
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.446

 2023-05-20 11:36:48,895 - mmdet.ssod - INFO - Exp name: consistent_teacher_r50_fpn_coco_180k_10p_2x8.py
 2023-05-20 11:36:48,898 - mmdet.ssod - INFO - Iter(val) [180000]
 teacher.bbox_mAP: 0.1230, teacher.bbox_mAP_50: 0.1971, teacher.bbox_mAP_75: 0.1262,
 teacher.bbox_mAP_s: 0.0674, teacher.bbox_mAP_m: 0.1415, teacher.bbox_mAP_l: 0.1562,
 teacher.bbox_mAP_copypaste: 0.1230 0.1971 0.1262 0.0674 0.1415 0.1562,
 student.bbox_mAP: 0.0984, student.bbox_mAP_50: 0.1653, student.bbox_mAP_75: 0.0991,
 student.bbox_mAP_s: 0.0513, student.bbox_mAP_m: 0.1129, student.bbox_mAP_l: 0.1249,
 student.bbox_mAP_copypaste: 0.0984 0.1653 0.0991 0.0513 0.1129 0.1249
 wandb: Waiting for W&B process to finish... (success).

yuan738 avatar May 20 '23 03:05 yuan738

Sorry, this config is supposed to be run on 8 GPUs: the 2x8 in the name means samples_per_gpu=2 with 8 GPUs in total.

Johnson-Wang avatar May 20 '23 03:05 Johnson-Wang

Thanks, is there any config designed for a single GPU? Or what parameters should I change in the config? I have changed the --gpus and --gpu-ids parameters in tools/train.py to use one GPU. Thanks!

yuan738 avatar May 20 '23 03:05 yuan738

Dear @yuan738,

Thank you for your insightful question. Currently, the semi-supervised method is heavily dependent on a large batch size, and as a result, reducing the number of GPUs could significantly impact performance. Unfortunately, we have not yet found an effective solution to this issue.

One potential workaround could be to implement "gradient accumulation," or you might consider using fp16 to increase the batch size on a single GPU. Moreover, given the limitations of single-GPU training, you might find that a 1:4 labeled-to-unlabeled ratio is too ambitious and could consider adjusting it to a 1:1 ratio. However, please note that the performance might still be subpar with a single-GPU setup. We acknowledge this challenge and will strive to address it in our future work.
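For concreteness, a minimal sketch of what these workarounds could look like as config overrides, assuming an mmdetection/mmcv-style config like the ones in this repo (the hook names come from mmcv; the values are placeholders, not tuned settings):

    # Hypothetical single-GPU overrides (illustrative values, not tuned).
    # Options A and B are alternatives; enabling both at once may require
    # mmcv's GradientCumulativeFp16OptimizerHook, depending on the mmdet version.

    # Option A -- mixed precision, so a single GPU can fit a larger batch:
    fp16 = dict(loss_scale="dynamic")

    # Option B -- gradient accumulation: take an optimizer step every 8
    # iterations to emulate an 8x larger effective batch:
    optimizer_config = dict(
        type="GradientCumulativeOptimizerHook",
        cumulative_iters=8,
        grad_clip=None,
    )

    # A milder labeled:unlabeled ratio, e.g. 1:1 instead of 1:4:
    data = dict(
        samples_per_gpu=2,
        sampler=dict(train=dict(sample_ratio=[1, 1])),
    )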

Best Regards,

Adamdad avatar May 24 '23 02:05 Adamdad

hi @Adamdad, I trained the consistent_teacher_r50_fpn_coco_180k_10p.py config on 4 GPUs; at iteration 16000, the results are as below:

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.113
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=1000 ] = 0.190
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=1000 ] = 0.113
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.063
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.133
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.143
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.323
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=300 ] = 0.323
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=1000 ] = 0.323
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.145
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.346
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.439

It seems the mAP is too low.

zimenglan-sysu-512 avatar May 24 '23 03:05 zimenglan-sysu-512

Dear @zimenglan-sysu-512 ,

I recommend increasing the ratio of labeled to unlabeled samples, for example to 1:1. Currently, the learning rate is based on 8 labeled samples per step (1 labeled sample per GPU across 8 GPUs). If you are training with fewer GPUs, you may need to adjust the labeled batch size or decrease the learning rate to match your setup.

It's important to keep in mind that performance may still be lower with a reduced GPU setup. We are aware of this challenge and will make efforts to overcome it in our future endeavors.

Best.

Adamdad avatar May 24 '23 03:05 Adamdad

hi @Adamdad, should I modify data.samples_per_gpu and data.sampler.train.sample_ratio to increase the number of labeled samples for 4-GPU training, and reduce the learning rate? e.g.

data.samples_per_gpu=6
data.sampler.train.sample_ratio=[2, 4]
lr = 0.01 * (4 * 6) / (5 * 8)

zimenglan-sysu-512 avatar May 24 '23 04:05 zimenglan-sysu-512

Dear @zimenglan-sysu-512,

Yes, setting data.sampler.train.sample_ratio to [2, 4] or [3, 3] should work well. The first number represents the number of labeled samples, while the second number represents the number of unlabeled samples. Therefore, you will be using 2 or 3 labeled samples per batch per GPU. Make sure

data.samples_per_gpu = sum(data.sampler.train.sample_ratio).
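Put together, a rough override for the 4-GPU setup discussed above might look like the sketch below (the base config name and the linear learning-rate scaling are assumptions, not a verified recipe):

    # Hypothetical 4-GPU override; numbers are a starting point, not tuned values.
    _base_ = "consistent_teacher_r50_fpn_coco_180k_10p.py"  # assumes the override lives alongside it

    data = dict(
        samples_per_gpu=6,  # must equal sum(sample_ratio)
        sampler=dict(train=dict(sample_ratio=[2, 4])),  # 2 labeled : 4 unlabeled per GPU
    )

    # Scale the learning rate linearly with total batch size:
    # reference setup is 8 GPUs x 5 samples/GPU at lr=0.01; new setup is 4 GPUs x 6.
    optimizer = dict(lr=0.01 * (4 * 6) / (8 * 5))  # = 0.006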

Best,

Adamdad avatar May 24 '23 04:05 Adamdad

hi @Adamdad (see the attached screenshot), it seems that this config file is not found.

zimenglan-sysu-512 avatar May 25 '23 04:05 zimenglan-sysu-512

We only have a file called configs/consistent-teacher/consistent_teacher_r50_fpn_coco_720k_fulldata.py. No config is provided for 360k training on full data.

Adamdad avatar May 25 '23 04:05 Adamdad

We only have a file called configs/consistent-teacher/consistent_teacher_r50_fpn_coco_720k_fulldata.py. No config is provided for 360k training on full data.

Thanks. Another question: where can I find the configs/consistent-teacher/base.py file?

zimenglan-sysu-512 avatar May 25 '23 07:05 zimenglan-sysu-512

Dear @zimenglan-sysu-512,

Apologies for any confusion caused. The file configs/consistent-teacher/base.py has been renamed to configs/consistent-teacher/consistent_teacher_r50_fpn_coco_180k_10p.py. I have also updated the README to reflect this change.

Best regards,

Adamdad avatar May 25 '23 07:05 Adamdad

hi @Adamdad, two questions here:

  1. From the log file, the learning rate stays the same throughout the training phase. Why?
  2. What is the difference between the 360k and 720k iteration schedules, and how do they compare in mAP? With only a few GPUs (e.g. 4), training for 720k iterations takes more than 10 days to finish, even with fp16.

zimenglan-sysu-512 avatar May 26 '23 01:05 zimenglan-sysu-512

Dear @zimenglan-sysu-512,

  1. In our experiments, we decided not to decay the learning rate. Surprisingly, we observed that a fixed learning rate gave higher performance than a learning-rate decay schedule (a sketch of such a fixed schedule is shown below).
  2. Unfortunately, we did not run the 360k-iteration experiments, so we cannot provide insights into the performance gap between the two training schedules. However, training the model for 720k iterations is indeed quite time-consuming; even with 8 V100 GPUs, it still takes several days to complete.
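For reference, a fixed learning-rate schedule is usually written in mmdetection-style configs with mmcv's "fixed" LR policy; a minimal sketch (the warmup values here are placeholders, not necessarily what this repo uses):

    # Hypothetical constant-LR schedule; warmup settings are illustrative only.
    lr_config = dict(
        policy="fixed",      # mmcv FixedLrUpdaterHook: no decay during training
        warmup="linear",     # optional short linear warmup at the start
        warmup_iters=500,
        warmup_ratio=0.001,
    )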

If you have any further questions or need additional information, please let me know.

Best regards,

Adamdad avatar May 26 '23 07:05 Adamdad

Dear @zimenglan-sysu-512,

Yes, setting data.sampler.train.sample_ratio to [2, 4] or [3, 3] should work well. The first number represents the number of labeled samples, while the second number represents the number of unlabeled samples. Therefore, you will be using 2 or 3 labeled samples per batch per GPU. Make sure

data.samples_per_gpu = sum(data.sampler.train.sample_ratio).

Best,

Using 4 GPUs with a batch size of 6 per GPU and sample_ratio = [2, 4], the results are below:

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.381
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=1000 ] = 0.542
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=1000 ] = 0.409
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.211
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.411
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.484
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.570
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=300 ] = 0.570
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=1000 ] = 0.570
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.343
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.609
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.740

zimenglan-sysu-512 avatar May 28 '23 12:05 zimenglan-sysu-512

Dear @zimenglan-sysu-512, the results you provided are amazing. It would be very helpful if you could share the config and checkpoint; this experiment could be extremely useful for people with fewer GPUs. You can open a pull request or send the files to me.

Great 😃

Adamdad avatar May 28 '23 12:05 Adamdad

Dear @zimenglan-sysu-512, the results you provided are amazing. It would be very helpful if you could share the config and checkpoint; this experiment could be extremely useful for people with fewer GPUs. You can open a pull request or send the files to me.

Great 😃

hi @Adamdad, I will send the config, log, and checkpoint files to you. Please check your QQ email.

zimenglan-sysu-512 avatar May 29 '23 05:05 zimenglan-sysu-512

hi @Adamdad, why does the full-data config freeze the BN layers in the R50 backbone, while the 10% config does not?

zimenglan-sysu-512 avatar May 30 '23 06:05 zimenglan-sysu-512

Dear @zimenglan-sysu-512, could you share your config file with me? It would be of great help; I only have 2 GPUs and get poor training results. My QQ mail is [email protected]

cyn-liu avatar Jan 11 '24 03:01 cyn-liu

Dear @zimenglan-sysu-512, could you share your config file with me? It would be of great help to me. My QQ mail is [email protected]

xiaofu3322 avatar May 22 '24 08:05 xiaofu3322