training method

Open Re-dot-art opened this issue 11 months ago • 12 comments

Hello, the results I get from running the code directly differ significantly from those reported in your paper. I suspect it is a problem with my training procedure, so I would like to confirm: before applying the semi-supervised method, do we need to first train the Faster R-CNN network on the 1% or 5% labeled samples, and then train again with the semi-supervised learning method starting from those pre-trained weights? Looking forward to your reply!

Re-dot-art avatar Mar 18 '24 03:03 Re-dot-art

@Re-dot-art No need for separate training. ConsistentTeacher is trained end-to-end: the labeled and unlabeled data are fed to the model at the same time, and the teacher is maintained as an exponential moving average (EMA) of the student. Regarding the performance issue, could you specify which config you are using? What are your batch size and GPU count?
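For concreteness, the teacher update is conceptually just an EMA of the student weights. Below is a minimal PyTorch sketch of such an update; it is illustrative only, not the exact code in this repo, and the momentum value is an assumption:

```python
import torch


@torch.no_grad()
def ema_update(teacher: torch.nn.Module, student: torch.nn.Module, momentum: float = 0.999):
    """Keep the teacher as an exponential moving average of the student."""
    for t_p, s_p in zip(teacher.parameters(), student.parameters()):
        # teacher <- momentum * teacher + (1 - momentum) * student
        t_p.data.mul_(momentum).add_(s_p.data, alpha=1 - momentum)
```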

Adamdad avatar Mar 18 '24 03:03 Adamdad

[screenshots of the two experimental settings attached] I ran the above two experimental settings on two V100 GPUs with samples_per_gpu=5.

Re-dot-art avatar Mar 18 '24 04:03 Re-dot-art

[screenshot attached]

Re-dot-art avatar Mar 18 '24 04:03 Re-dot-art

As mentioned in the README, all experiments in the paper use 8 GPUs × 5 samples per GPU for training. A smaller batch size gives worse results, as expected, but your results seem too low, even worse than the baseline. Did you edit anything?
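For reference, a back-of-the-envelope check of the effective batch in the two settings; the 1 labeled : 4 unlabeled per-GPU split is an assumption based on the original config:

```python
def effective_batch(num_gpus: int, samples_per_gpu: int, labeled_per_gpu: int):
    """Return (total images per iteration, labeled images per iteration)."""
    return num_gpus * samples_per_gpu, num_gpus * labeled_per_gpu


# Paper setting: 8 GPUs x 5 samples/GPU -> 40 images per iteration, 8 of them labeled.
print(effective_batch(8, 5, 1))  # (40, 8)

# 2 x V100 with samples_per_gpu=5 -> 10 images per iteration, only 2 labeled.
print(effective_batch(2, 5, 1))  # (10, 2)
```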

Adamdad avatar Mar 18 '24 04:03 Adamdad

I did not make any modifications to the code, only added some comments.

Re-dot-art avatar Mar 18 '24 04:03 Re-dot-art

Could you please share your configuration settings, the scripts you're using for execution, and the way you process the dataset? These results are even lower than the baseline trained on labeled data only (no unlabeled data used), so I suspect something is wrong on your side. I'm here to assist, but I'll need more detailed information to provide effective support.

Adamdad avatar Mar 18 '24 05:03 Adamdad

Okay, thank you. The config file for the experiment is as follows: config.zip

The dataset was processed following the steps in the README; the results are shown in the attached screenshots. Thank you.

Re-dot-art avatar Mar 18 '24 06:03 Re-dot-art

Do you use wandb to record the training process? If yes, can you also share it?

Adamdad avatar Mar 18 '24 06:03 Adamdad

Untitled Report _ consistent-teacher – Weights & Biases.pdf

Re-dot-art avatar Mar 18 '24 06:03 Re-dot-art

The blue line is the completed **10p experiment; the red line is the **1p experiment, which is still training.

Re-dot-art avatar Mar 18 '24 06:03 Re-dot-art

I would suggest following this config to (1) increase the batch size, (2) increase the number of labeled samples within a batch, and (3) lower your learning rate: https://github.com/Adamdad/ConsistentTeacher/blob/main/configs/consistent-teacher/consistent_teacher_r50_fpn_coco_180k_10p_2x8.py

With 2 GPUs, the original config gives only 2 labeled samples per batch in total, so a learning rate of 0.01 is too large for the network to converge. For 2-GPU training we therefore use 4 labeled : 4 unlabeled samples per GPU (8 labeled samples in total) and a learning rate of 0.005. This ensures, at the very least, that the model can converge on the labeled data in the first place.
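For reference, a rough sketch of what those overrides might look like in an mmdetection-style config. The field names (e.g. sample_ratio) are assumptions based on SoftTeacher-style semi-supervised configs; treat the linked consistent_teacher_r50_fpn_coco_180k_10p_2x8.py as the authoritative version.

```python
# Sketch only: key names are assumptions; check the linked 2x8 config for the exact fields.
data = dict(
    samples_per_gpu=8,  # 4 labeled + 4 unlabeled images per GPU
    workers_per_gpu=8,
    sampler=dict(
        train=dict(
            sample_ratio=[4, 4],  # labeled : unlabeled per GPU
        )
    ),
)

# Lower the learning rate relative to the 8-GPU x 5-samples setting.
optimizer = dict(type="SGD", lr=0.005, momentum=0.9, weight_decay=0.0001)
```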

Adamdad avatar Mar 18 '24 07:03 Adamdad

Okay, thank you very much! I'll give it a try.

Re-dot-art avatar Mar 18 '24 07:03 Re-dot-art