
Question on the "Deriving Architectures" part of Section 2.2 of the paper

philtomson opened this issue 6 years ago · 1 comment

It says: "We first sample several models from the trained policy π(m, θ). For each sampled model, we compute its reward on a single minibatch sampled from the validation set. We then take only the model with the highest reward to re-train from scratch. "

Is this something that happens at the end of every epoch, or only at the end of the final epoch? (That is, do you wait until the final epoch to pick the sampled model with the highest reward and then re-train it from scratch?)

philtomson · Oct 06 '18

I'm not sure if this answers your question, but it sounds to me like they train the controller first and then sample a few architectures from the trained controller to find the best one. They seem to discard all the architectures produced during training, so I believe they wait until after the final epoch (of controller training) to pick the architecture. To derive an architecture after training, they sample several candidates, run a single validation minibatch through each, and fully train only the best one.

Training:

The training procedure of ENAS consists of two interleaving phases. The first phase trains ω, the shared parameters of the child models, on a whole pass through the training data set ... The second phase trains θ, the parameters of the controller LSTM, for a fixed number of steps, typically set to 2000 in our experiments. These two phases are alternated during the training of ENAS.
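The two interleaved phases can be sketched as a toy loop. This is a minimal, self-contained illustration of the control flow only: every function body below is a stand-in (random choices, no-ops), and none of the names come from the authors' code.

```python
import random

def sample_architecture():
    # m ~ pi(m; theta): here just a random toy op per layer
    return [random.choice(["conv3", "conv5", "pool"]) for _ in range(4)]

def train_shared_weights(arch, batch):
    pass  # stand-in: would update the shared child parameters omega

def reward_on_minibatch(arch):
    return random.random()  # stand-in: accuracy on one validation minibatch

def update_controller(arch, reward):
    pass  # stand-in: would take a REINFORCE step on controller parameters theta

def train_enas(num_epochs, train_batches, controller_steps=2000):
    for _ in range(num_epochs):
        # Phase 1: one whole pass over the training data, updating omega
        # with architectures sampled from the current policy
        for batch in train_batches:
            train_shared_weights(sample_architecture(), batch)
        # Phase 2: a fixed number of controller updates (2000 in the paper)
        for _ in range(controller_steps):
            arch = sample_architecture()
            update_controller(arch, reward_on_minibatch(arch))
```

So within one "epoch" of ENAS, omega sees a full pass over the training set while theta gets its fixed budget of update steps, and the two alternate until training ends.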

After training:

We discuss how to derive novel architectures from a trained ENAS model. We first sample several models from the trained policy π(m, θ). For each sampled model, we compute its reward on a single minibatch sampled from the validation set. We then take only the model with the highest reward to re-train from scratch.
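The derivation step quoted above amounts to a cheap argmax over sampled candidates: draw several architectures from the trained policy, score each on a single validation minibatch, and keep only the winner for full re-training. A toy sketch (function names are illustrative, not the authors'; the reward here is a random stand-in):

```python
import random

def sample_architecture():
    # m ~ pi(m; theta) from the *trained* controller (toy version)
    return [random.choice(["conv3", "conv5", "pool"]) for _ in range(4)]

def reward_on_minibatch(arch):
    return random.random()  # stand-in: accuracy on one validation minibatch

def derive_architecture(num_samples=10):
    # Sample several candidates, score each once, keep the best
    candidates = [sample_architecture() for _ in range(num_samples)]
    best = max(candidates, key=reward_on_minibatch)
    return best  # this one architecture is re-trained from scratch

best_arch = derive_architecture()
```

The point of the single-minibatch reward is that it is far cheaper than fully evaluating each candidate, and it only needs to rank them well enough to pick one model for the expensive re-training from scratch.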

e-271 · Jan 10 '19