audioContextEncoder
Use dropout correctly
I added a dropout feature to the sequential model. Preliminary tests on it are a bit hard to assess.
I trained two equivalent networks for 800k steps with a learning rate of 1e-3. In orange is a network with dropout = 0.3 for the linear layer and 0.1 for all conv and deconv layers except the last deconv. In blue is the same network without any dropout. I think the sudden change in the training SNR of the orange one comes from when I restarted the training with dropout = 0.3 for the linear layer (before it was 0.5; I'm not really sure).
It seems to work well, since the performance on the validation set is better with dropout and the performance on the training set is worse.
What do you think? Should I run more tests? Are these parameters good for you? (30% on the linear layer and 10% on the convs)
I also tried the same net with only dropout = 50% on the convs (blue):
I will also change the implementation of the dropout to be a little more explicit and descriptive.
I changed it here: https://github.com/andimarafioti/audioContextEncoder/commit/a8208b776af7dd95a18fa77333bf8a14b9d5113f
According to the original paper, dropout should be applied after the activation (ReLU).
And here F. Chollet says the ReLU should go before batch normalization.
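Putting those two suggestions together, a block would go linear/conv → ReLU → batch norm → dropout. A minimal NumPy sketch of that ordering, with hypothetical shapes and names (this is an illustration of the ordering, not the repo's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def batch_norm(x, eps=1e-5):
    # Normalize over the batch dimension (running inference stats omitted).
    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

def dropout(x, rate, training):
    # Inverted dropout: kept units are scaled up so eval needs no rescaling.
    if not training:
        return x
    mask = rng.random(x.shape) >= rate
    return x * mask / (1.0 - rate)

def block(x, w, rate=0.1, training=True):
    # Suggested ordering: linear/conv -> ReLU -> batch norm -> dropout.
    return dropout(batch_norm(relu(x @ w)), rate, training)

x = rng.standard_normal((8, 16))   # hypothetical batch of 8, 16 features
w = rng.standard_normal((16, 32))
out = block(x, w)
```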
When I change the learning rate, the validation SNR improves drastically for the network without dropout, making it work much better than the one with dropout, and behaving similarly to how it does on the training set:
I don't know why this effect happens with the learning rate, but it's been happening for a while now. The weirdness is: in blue I added dropout and it did worse; in orange I removed dropout and it did better. Of course, dropout is (or should be) removed at testing/validation time.
Maybe this small network is not able to overfit the training set?
I may be seeing a problem that arises from not having separate graphs for training and evaluation.
I did some tests setting the dropout to really high values; the performance on the testing set is really affected but not on the validation set, so it's probably not a matter of having separate graphs.
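One quick sanity check for the single-graph concern, assuming the usual inverted-dropout implementation (names here are hypothetical): with the training flag off, the dropout layer must be an exact identity regardless of the rate, so even an extreme rate cannot change validation results if the flag is wired correctly.

```python
import numpy as np

rng = np.random.default_rng(42)

def dropout(x, rate, training):
    # Inverted dropout: at eval time this must be an exact identity,
    # so even rate = 0.9 cannot affect validation if the flag is correct.
    if not training:
        return x
    mask = rng.random(x.shape) >= rate
    return x * mask / (1.0 - rate)

x = rng.standard_normal((4, 8))
eval_out = dropout(x, 0.9, training=False)    # identity at eval time
train_out = dropout(x, 0.9, training=True)    # most units zeroed in training
```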
According to the plot, it seems to work. To be discussed in the next meeting.
Please use 20% dropout before the fully connected layer only.
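For reference, that configuration could look like this sketch (pure NumPy, hypothetical shapes; 20% dropout applied only to the input of the fully connected layer, nothing on the convs):

```python
import numpy as np

rng = np.random.default_rng(1)

def dropout(x, rate, training):
    # Inverted dropout; identity when not training.
    if not training:
        return x
    mask = rng.random(x.shape) >= rate
    return x * mask / (1.0 - rate)

def head(features, w_fc, training):
    # 20% dropout only right before the fully connected layer.
    dropped = dropout(features, rate=0.2, training=training)
    return dropped @ w_fc

features = rng.standard_normal((8, 128))  # hypothetical flattened conv features
w_fc = rng.standard_normal((128, 10))
train_out = head(features, w_fc, training=True)
eval_out = head(features, w_fc, training=False)
```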