tf-dann
having trouble with convergence
Hi, first off, thank you for the wonderful code. I am trying to replicate the toy blobs example in PyTorch, and I am finding that it converges to the accuracies you report only unreliably. Sometimes it will not converge at all, and other times it will reach the 97% source / 97% target accuracy. Also, source-only training yields 50% accuracy on the target domain. I was wondering whether there were any snags you encountered that hindered convergence?
Thanks
Austin
I have the same question.
Does lowering your learning rate help?
Actually, the blobs example in general is fairly unreliable - over repeated runs I occasionally get poor results too. Honestly, I didn't do any hyperparameter tuning; it was just a small, fast experiment to validate the implementation while I was writing it. If you find hyperparameters that work better, please share them and I can update the example.
For me, the biggest discrepancy with the blobs example was that source-only training resulted in trivial (50%) accuracy. Did you get this as well? As for hyperparameters, widening the feature extractor (going from 8 to 50 hidden units) was what allowed it to converge at all for me; a sketch of that change is below.
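In case it's useful, this is roughly what that change looks like in PyTorch. It's a minimal sketch: the 2-D blob input and the two-layer structure are my assumptions, and the only point being illustrated is the 8 -> 50 widening.

```python
import torch.nn as nn

# Hypothetical feature extractor for the blobs example (2-D inputs assumed).
# The only deliberate change is the hidden width: 8 -> 50.
feature_extractor = nn.Sequential(
    nn.Linear(2, 50),   # was nn.Linear(2, 8)
    nn.ReLU(),
    nn.Linear(50, 50),  # was nn.Linear(8, 8)
    nn.ReLU(),
)
```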
I take it that the 50% was the source accuracy, not the target accuracy? In that case there is certainly something wrong, but 50% accuracy on the target domain is not unusual if you only train on the source.
One thing that might help is annealing the gradient reversal parameter. I do this in the MNIST example, following the schedule presented in the paper, but for the blobs example I keep it fixed at -1 throughout training. That is almost certainly not the optimal thing to do; a sketch of the schedule is below.
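For reference, here is a minimal PyTorch sketch of that schedule together with a gradient reversal layer that takes the annealed coefficient. The names (`GradReverse`, `grl_lambda`, `feature_extractor`, `domain_classifier`) are mine, not from this repo; the schedule itself, lambda_p = 2 / (1 + exp(-10 p)) - 1 with p the training progress in [0, 1], is the one from the DANN paper.

```python
import math

from torch.autograd import Function


class GradReverse(Function):
    """Identity in the forward pass; scales the gradient by -lambd going backward."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # No gradient flows to lambd itself, hence the trailing None.
        return -ctx.lambd * grad_output, None


def grl_lambda(step, total_steps, gamma=10.0):
    """DANN annealing schedule: ramps lambda from 0 up to 1 over training."""
    p = step / total_steps
    return 2.0 / (1.0 + math.exp(-gamma * p)) - 1.0


# In the training loop (sketch; feature_extractor / domain_classifier are placeholders):
#     lambd = grl_lambda(step, total_steps)
#     features = feature_extractor(x)
#     domain_logits = domain_classifier(GradReverse.apply(features, lambd))
```

Note that keeping the parameter fixed at -1, as in the blobs example, corresponds to `lambd = 1.0` throughout training here, since the reversal layer already applies the minus sign.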