
Reproduce ENAS results on RNN

Wronskia opened this issue 7 years ago • 4 comments

Hello @carpedm20 ,

Thanks a lot for this nice implementation of the ENAS paper. Did you manage to reproduce their results by retraining the model from scratch?

Thanks, Best

Wronskia avatar Mar 10 '18 15:03 Wronskia

No, I couldn't reproduce the results of the paper. In my experiments, training ENAS with this code was very unstable, and I haven't figured out the problem yet. These are the points I'm not sure about:

  1. shared or unshared decoder of the controller
  2. the moving average baseline (see the sketch after this list)
  3. the next hidden state (h[-1] or avg(h))
  4. some hyperparameters in config.py (marked with TODO)
  5. the loss of REINFORCE always being negative
  6. exploration
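
Regarding point 2, here is a minimal sketch of the kind of moving-average baseline commonly used with REINFORCE. This is not the repo's implementation; `MovingAverageBaseline`, `reinforce_loss`, and the `decay` value are illustrative assumptions.

```python
import torch

class MovingAverageBaseline:
    """Exponential moving average of past rewards, used as a REINFORCE baseline."""
    def __init__(self, decay=0.95):
        self.decay = decay
        self.value = None

    def update(self, reward):
        # Initialize on the first reward, then decay toward newer rewards.
        if self.value is None:
            self.value = reward
        else:
            self.value = self.decay * self.value + (1 - self.decay) * reward
        return self.value


def reinforce_loss(log_probs, reward, baseline):
    # Advantage = reward minus the running baseline; subtracting the
    # baseline reduces gradient variance without biasing the estimate.
    advantage = reward - baseline.update(reward)
    # REINFORCE maximizes advantage-weighted log-probability,
    # so we minimize its negation.
    return -(log_probs.sum() * advantage)


# Example usage with made-up log-probabilities of the sampled decisions.
baseline = MovingAverageBaseline(decay=0.95)
log_probs = torch.log(torch.tensor([0.3, 0.5, 0.2]))
loss = reinforce_loss(log_probs, reward=0.8, baseline=baseline)
```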

carpedm20 avatar Mar 14 '18 07:03 carpedm20

I can comment on 5: the REINFORCE loss itself is not always negative. The total loss, however, is almost always negative, because the negative entropy of the policy's logits is added to it (in order to maximize policy entropy), and entropy is always non-negative. I also have a fix for the entropy calculation in my fork.
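
To illustrate the point, here is a hedged sketch of a total loss with an entropy bonus; the function and `entropy_weight` are illustrative names, not the repo's API. The REINFORCE term can have either sign, but subtracting a positive entropy term can pull the total below zero.

```python
import torch
import torch.nn.functional as F

def total_loss(logits, log_probs, advantage, entropy_weight=1e-4):
    # Entropy of the controller's categorical policy; always >= 0.
    probs = F.softmax(logits, dim=-1)
    entropy = -(probs * F.log_softmax(logits, dim=-1)).sum(dim=-1).mean()

    # REINFORCE term: positive or negative depending on the advantage.
    pg_loss = -(log_probs.sum() * advantage)

    # Adding the *negative* entropy (i.e. subtracting entropy) encourages
    # exploration and is what drags the total loss below zero.
    return pg_loss - entropy_weight * entropy
```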

dukebw avatar Mar 14 '18 14:03 dukebw

@carpedm20 As far as I know, E[Reward(m, omega)] should be computed as an actual expectation, which means you are supposed to sample several models per controller training step and average their rewards. But your code samples only one model when computing Reward(). (I'm not entirely sure about this.)

As the author said, M=1 works fine to estimate E[Loss(m, omega)] while training the child model, but "we needed at least M=10 to training the policy π". You can find this sentence near the end of https://openreview.net/forum?id=ByQZjx-0-&noteId=BkrqNswgf.
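
A minimal sketch of what averaging over M samples might look like; `controller.sample()` and `compute_reward()` are hypothetical stand-ins for the repo's code, and the baseline/advantage is omitted for brevity.

```python
def policy_gradient_step(controller, compute_reward, M=10):
    # Estimate E[Reward(m, omega)] by sampling M architectures and
    # averaging their REINFORCE losses, as the authors suggest (M = 10).
    losses = []
    for _ in range(M):
        arch, log_probs = controller.sample()  # sample one model m
        reward = compute_reward(arch)          # e.g. from validation perplexity
        losses.append(-log_probs.sum() * reward)
    return sum(losses) / M
```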

Howal avatar Mar 20 '18 03:03 Howal

@Howal Thanks for pointing this out. I did think it was weird to update a policy network with only one sample. This seems like an important issue, and fixing it should improve the stability of REINFORCE training.

carpedm20 avatar Mar 20 '18 04:03 carpedm20