
About v2 training

Open JiamingSuen opened this issue 7 years ago • 3 comments

Hi @benjaminum, have you ever successfully trained the network with your latest v2 training code? I've been experimenting with it for a few days and am finding it extremely hard to converge. This is my training loss status: [training loss screenshot] (please ignore TotalLoss_1). This was trained for about 500k iterations with the default initial learning rate and batch size on the first evolution. I also tried other learning rate and optimizer configurations; the results were similar.
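
By "other lr and optimizer configurations" I mean variations along these lines (a minimal TF 1.x sketch; the boundaries and rates are placeholders for illustration, not values from the repository's training code):

```python
import tensorflow as tf

# Illustrative only: swap the default setup for a piecewise-constant
# learning-rate schedule with Adam. Boundaries/values are placeholders.
global_step = tf.train.get_or_create_global_step()
learning_rate = tf.train.piecewise_constant(
    global_step,
    boundaries=[250000, 400000],
    values=[1e-4, 5e-5, 1e-5])
optimizer = tf.train.AdamOptimizer(learning_rate, beta1=0.9)
# train_op = optimizer.minimize(total_loss, global_step=global_step)
```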

  1. If you have successfully trained the network, would you like to share the hyperparameters you used?
  2. I noticed that the implementation of the network has changed a little in blocks.py; could that be a reason? Why did you make this change, and what does v2 mean exactly?
  3. What are the remaining tasks for the training code? Would you like to share your progress?

Thanks for this amazing work!

JiamingSuen avatar Sep 20 '17 10:09 JiamingSuen

Hi @JiamingSuen ,

thanks for checking out our training code!

  1. At the moment we use the hyperparameters as set in the training code. There is probably a lot of room for improving these parameters. The losses will eventually converge if you train for a very long time, but that does not improve the test performance.

  2. v2 is an attempt to create a version of our network that can be trained easily with TensorFlow. It is meant as a basis for future experiments to improve the architecture. First steps towards a better architecture are already in blocks.py. We share it because we hope it will be useful to other researchers.

  3. As you have probably noticed, the training procedure is quite complex and the training losses can be difficult to understand at first glance. One important remaining task is to provide easy-to-use evaluation code to better assess the network performance.

> Thanks for this amazing work!

Thank you!

benjaminum avatar Sep 20 '17 14:09 benjaminum

Thanks for the reply. I tried to initialize the weights with tf.contrib.layers.variance_scaling_initializer(factor=2.0), which is the "MSRA initialization" described in this paper, but it isn't helping much.
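
For reference, this is roughly how I plugged the initializer in (a minimal sketch; the conv layer below is illustrative, not the actual blocks.py code):

```python
import tensorflow as tf

# MSRA/He initialization as described in the paper (factor=2.0, fan-in mode).
msra_init = tf.contrib.layers.variance_scaling_initializer(
    factor=2.0, mode='FAN_IN', uniform=False)

def conv_relu(inputs, filters, kernel_size, name):
    """Plain conv + ReLU with MSRA-initialized weights (illustrative layer)."""
    return tf.layers.conv2d(
        inputs, filters, kernel_size,
        padding='same',
        activation=tf.nn.relu,
        kernel_initializer=msra_init,
        name=name)
```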

  1. What initialization did you use in the original Caffe implementation?
  2. Is it because the input data is quite noisy? I'm thinking about adding a batch normalization layer; do you think that's a good idea? Or should I just start the training with the synthetic dataset? (A rough sketch of the BN idea follows below.)
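
What I mean by adding batch normalization to a conv block, as a sketch (illustrative wiring only, not code from the repository):

```python
import tensorflow as tf

def conv_bn_relu(inputs, filters, kernel_size, is_training, name):
    """Conv -> BN -> ReLU; the conv bias is dropped since BN has its own shift."""
    x = tf.layers.conv2d(inputs, filters, kernel_size,
                         padding='same', use_bias=False, name=name + '/conv')
    x = tf.layers.batch_normalization(x, training=is_training,
                                      name=name + '/bn')
    return tf.nn.relu(x)

# Note: the BN moving-average updates live in tf.GraphKeys.UPDATE_OPS and
# must be run alongside the train op, e.g. via tf.control_dependencies.
```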

Will keep updating my progress here.

JiamingSuen avatar Sep 21 '17 08:09 JiamingSuen

Asking myself the same thing ... I thought TotalLoss should go down after a while, but it does not really look good (160k+ iterations): https://tensorboard.dev/experiment/aay2ZG8aRUaZM1EwML3jPA/#scalars&run=0_flow1%2Ftrainlogs&_smoothingWeight=0.989

Edit: I guess I would need a total loss that does not include the *_sig losses (and instead includes the *_sig_unscaled losses) to get a nice-looking graph. At least I now understand why the total loss does not decrease much while training itself actually does improve.
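
For example, a rough sketch of how such an alternative summary could be wired up (this assumes the individual losses are registered in tf.GraphKeys.LOSSES and end with the _sig_unscaled suffix, which I have not verified against the training script):

```python
import tensorflow as tf

# Sum only the *_sig_unscaled losses into a separate TensorBoard scalar,
# leaving the original TotalLoss untouched.
losses = tf.get_collection(tf.GraphKeys.LOSSES)
unscaled = [l for l in losses if l.op.name.endswith('_sig_unscaled')]
total_unscaled = tf.add_n(unscaled, name='TotalLoss_unscaled')
tf.summary.scalar('TotalLoss_unscaled', total_unscaled)
```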

TheCrazyT avatar Sep 29 '20 17:09 TheCrazyT