Losses
Hi, thank you for your great project!
I’m stuck on two problems while trying to test the mean teacher idea as described in your NIPS 2017 presentation on the MNIST dataset, using a simple convnet from the official PyTorch examples together with your PyTorch code:
- Loss is defined as:

```python
loss = class_loss + consistency_loss + res_loss
```

where

```python
if args.consistency:
    (...)
    consistency_loss = consistency_weight * consistency_criterion(cons_logit, ema_logit) / minibatch_size
    (...)
else:
    consistency_loss = 0
```

but the default value of args.consistency is None, so consistency_loss = 0 by default.
Similarly,

```python
if args.logit_distance_cost >= 0:
    (...)
else:
    (...)
    res_loss = 0
```

but args.logit_distance_cost = -1 by default.
So using the default values switches the mean teacher off and leaves just an ordinary supervised model? Should these losses be complementary or interchangeable? (See the loss sketch below.)
- Training a mean teacher model on MNIST with a non-zero consistency weight and without res_loss, with fixed hyperparameters (https://github.com/rracinskij/mean_teacher/blob/master/mean_teacher.py), gives significantly lower test accuracy (~78% with 1000 labels) than setting the consistency weight to zero (~92%).
I’d greatly appreciate any comments or hints.
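For reference, here is a minimal sketch of how I understand the three terms fitting together, assuming a student with two output heads and a detached EMA teacher. The function name, the NO_LABEL convention, and the exact reductions are my own simplifications, not the repo's code:

```python
import torch.nn.functional as F

NO_LABEL = -1  # assumed marker for unlabeled targets

def total_loss(class_logit, cons_logit, ema_logit, target,
               consistency_weight, logit_distance_cost):
    minibatch_size = class_logit.size(0)

    # Supervised term: cross-entropy over the labeled examples only.
    class_loss = F.cross_entropy(class_logit, target, ignore_index=NO_LABEL)

    # Consistency term: pull the student's consistency head towards the
    # EMA teacher's predictions; detach() keeps gradients off the teacher.
    consistency_loss = consistency_weight * F.mse_loss(
        F.softmax(cons_logit, dim=1),
        F.softmax(ema_logit.detach(), dim=1),
        reduction='sum') / minibatch_size

    # Residual term: keep the two student heads close to each other.
    res_loss = logit_distance_cost * F.mse_loss(class_logit, cons_logit)

    return class_loss + consistency_loss + res_loss
```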
I think that to make the mean teacher work, you have to set the consistency_weight to some value. On the mean teacher PyTorch page it is set to 100.0, and logit_distance_cost is set to 0.01 for the CIFAR-10 experiment. I believe these are necessary to make the mean teacher work.
It looks like logit_distance_cost should be set to a positive value only if the student model has two outputs. And yes, the total loss depends on the teacher model only if consistency_weight is non-zero. But then the accuracy of my minimalistic MNIST implementation is lower than that of a single convnet.
Hi,
So if I understood correctly, your dataset is MNIST with 1000 labeled and 59000 unlabeled examples? And you are using a convolutional network with mean teacher and comparing the results against a bare convolutional network?
Yes, you should set consistency > 0. The best value for consistency may depend on the dataset, the mix of unlabeled/labeled examples per batch, and other things. A bad consistency cost can lead to worse performance than not using one at all. The ema_decay parameter may also affect performance a lot. See Figure 4 in the paper for what these effects look like on SVHN.
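For context, the teacher's weights are an exponential moving average of the student's, updated after every optimizer step. A sketch of that update, along the lines of the repo's update_ema_variables (the exact code may differ):

```python
def update_ema_variables(model, ema_model, alpha, global_step):
    """Update teacher (ema_model) weights as an exponential moving
    average of student (model) weights with decay alpha."""
    # Use a smaller effective decay early on, so the teacher tracks the
    # rapidly-changing student closely during the first steps.
    alpha = min(1 - 1 / (global_step + 1), alpha)
    for ema_param, param in zip(ema_model.parameters(), model.parameters()):
        # teacher <- alpha * teacher + (1 - alpha) * student
        ema_param.data.mul_(alpha).add_(param.data, alpha=1 - alpha)

# e.g. after each optimizer step:
# update_ema_variables(student, teacher, alpha=args.ema_decay, global_step=step)
```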
At the beginning of training, the labeled examples are much more useful than the unlabeled ones. If you use a high consistency cost from the start, it may hurt learning. There are two ways around this: either use a consistency ramp-up or set logit_distance_cost > 0 (and yes, use two outputs from the network). Both are hyperparameters that may require tuning.
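A ramp-up simply multiplies the consistency weight by a factor that grows from 0 to 1 over the first epochs. A sketch of one common sigmoid-shaped schedule (the repo's exact schedule and parameter names may differ):

```python
import math

def sigmoid_rampup(current, rampup_length):
    """Factor in [0, 1] that ramps up over rampup_length epochs."""
    if rampup_length == 0:
        return 1.0
    t = min(max(current / rampup_length, 0.0), 1.0)
    return math.exp(-5.0 * (1.0 - t) ** 2)

# e.g. consistency_weight = args.consistency * sigmoid_rampup(epoch, rampup_epochs)
```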
See also https://github.com/CuriousAI/mean-teacher#tips-for-choosing-hyperparameters-and-other-tuning if you didn't already.
Hi there, I noticed this problem too. As far as I know, the paper mentions only two kinds of loss (classification loss and consistency loss) to optimize, so in what situation does the student model have two outputs? From the code, the difference between the two outputs is that they pass through different fc layers. Is this for representation learning or something similar?
Thanks a lot!
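For anyone else wondering: the two outputs typically come from two separate fc heads on a shared trunk, and res_loss ties them together. A minimal, illustrative sketch (the module structure and dimensions here are assumptions, not the repo's architecture):

```python
import torch.nn as nn

class TwoHeadStudent(nn.Module):
    """Shared trunk with two fc heads: one for classification and one
    whose logits are compared against the EMA teacher."""
    def __init__(self, feature_dim=128, num_classes=10):
        super().__init__()
        self.trunk = nn.Sequential(      # stand-in for the shared convnet
            nn.Flatten(),
            nn.Linear(28 * 28, feature_dim),
            nn.ReLU(),
        )
        self.class_head = nn.Linear(feature_dim, num_classes)
        self.cons_head = nn.Linear(feature_dim, num_classes)

    def forward(self, x):
        features = self.trunk(x)
        # res_loss penalizes the distance between these two outputs, so the
        # consistency head can follow the noisy teacher early in training
        # without dragging the classification head along with it.
        return self.class_head(features), self.cons_head(features)
```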