
Reproduce ENAS results on RNN

Wronskia opened this issue 7 years ago • 4 comments

Hello @carpedm20 ,

Thanks a lot for this nice implementation of the ENAS paper. Did you manage to reproduce their results by retraining the model from scratch?

Thanks, Best

Wronskia avatar Mar 10 '18 15:03 Wronskia

No, I couldn't reproduce the results of the paper. In my experiments, training ENAS with this code was very unstable, and I haven't figured out the problem yet. These are the points I'm not sure about:

  1. shared or unshared decoder of the controller
  2. the moving average baseline (see the sketch after this list)
  3. the next hidden state (h[-1] or avg(h))
  4. some hyperparameters in config.py (marked with TODO)
  5. the loss of REINFORCE always being negative
  6. exploration
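
Regarding point 2, here is a minimal sketch of the kind of moving-average baseline commonly used with REINFORCE. This is not the repo's implementation; `MovingAverageBaseline`, `reinforce_loss`, and the `decay` value are illustrative assumptions.

```python
import torch

class MovingAverageBaseline:
    """Exponential moving average of past rewards, used as a REINFORCE baseline."""
    def __init__(self, decay=0.95):
        self.decay = decay
        self.value = None

    def update(self, reward):
        # Initialize on the first reward, then decay toward newer rewards.
        if self.value is None:
            self.value = reward
        else:
            self.value = self.decay * self.value + (1 - self.decay) * reward
        return self.value


def reinforce_loss(log_probs, reward, baseline):
    # Advantage = reward minus the running baseline; subtracting the
    # baseline reduces gradient variance without biasing the estimate.
    advantage = reward - baseline.update(reward)
    # REINFORCE maximizes advantage-weighted log-probability,
    # so we minimize its negation.
    return -(log_probs.sum() * advantage)


# Example usage with made-up log-probabilities of the sampled decisions.
baseline = MovingAverageBaseline(decay=0.95)
log_probs = torch.log(torch.tensor([0.3, 0.5, 0.2]))
loss = reinforce_loss(log_probs, reward=0.8, baseline=baseline)
```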

carpedm20 avatar Mar 14 '18 07:03 carpedm20

I can comment on 5: the REINFORCE loss itself is not always negative. The total loss, however, is almost always negative, because the negative entropy of the policy's logits is added to it (in order to maximize policy entropy), and entropy is always non-negative. I also have a fix for the entropy calculation in my fork.
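
To illustrate the point, here is a hedged sketch of a total loss with an entropy bonus; the function and `entropy_weight` are illustrative names, not the repo's API. The REINFORCE term can have either sign, but subtracting a positive entropy term can pull the total below zero.

```python
import torch
import torch.nn.functional as F

def total_loss(logits, log_probs, advantage, entropy_weight=1e-4):
    # Entropy of the controller's categorical policy; always >= 0.
    probs = F.softmax(logits, dim=-1)
    entropy = -(probs * F.log_softmax(logits, dim=-1)).sum(dim=-1).mean()

    # REINFORCE term: positive or negative depending on the advantage.
    pg_loss = -(log_probs.sum() * advantage)

    # Adding the *negative* entropy (i.e. subtracting entropy) encourages
    # exploration and is what drags the total loss below zero.
    return pg_loss - entropy_weight * entropy
```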

dukebw avatar Mar 14 '18 14:03 dukebw

@carpedm20 As far as I know, E[Reward(m, omega)] should be computed as an actual expectation, which means you are supposed to sample several models per controller training step and average their rewards. But your code samples only one model when computing Reward(). (I'm not entirely sure about this.)

As the author said, M=1 works fine to estimate E[Loss(m, omega)] while training the child model, but "we needed at least M=10 to training the policy π". You can find this sentence near the end of https://openreview.net/forum?id=ByQZjx-0-&noteId=BkrqNswgf.
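
A minimal sketch of what averaging over M samples might look like; `controller.sample()` and `compute_reward()` are hypothetical stand-ins for the repo's code, and the baseline/advantage is omitted for brevity.

```python
def policy_gradient_step(controller, compute_reward, M=10):
    # Estimate E[Reward(m, omega)] by sampling M architectures and
    # averaging their REINFORCE losses, as the authors suggest (M = 10).
    losses = []
    for _ in range(M):
        arch, log_probs = controller.sample()  # sample one model m
        reward = compute_reward(arch)          # e.g. from validation perplexity
        losses.append(-log_probs.sum() * reward)
    return sum(losses) / M
```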

Howal avatar Mar 20 '18 03:03 Howal

@Howal Thanks for pointing this out. I did think it was weird to update a policy network with only one sample. This seems like an important issue, and fixing it should improve the stability of REINFORCE training.

carpedm20 avatar Mar 20 '18 04:03 carpedm20