
Difference between `_fixed_layer` and `_enas_layer` in `cifar10/micro_child.py`

Open · bkj opened this issue 6 years ago · 4 comments

There are a number of differences between `_fixed_layer` and `_enas_layer` in `cifar10/micro_child.py`:

  1. the `layer_base` variable scope
  2. strided pooling layers and convolutions
  3. a possible `_factorized_reduction` for the output

Are you able to give some insight into why the code works this way? It seems that when a fixed architecture is specified, the resulting model is not necessarily exactly the same as the one used during the RL training. It seems to me that the easiest way to fix the child architecture would be an alternate "dummy controller" that just keeps `normal_arc` and `reduce_arc` fixed at the desired architecture.
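Concretely, I'm imagining something like the sketch below (very rough; the attribute names and the flat `[index, op, index, op, ...]` encoding are just my guess at the controller's output format, and the numbers are placeholders, not a real architecture):

```python
import tensorflow as tf

class DummyController(object):
    """Rough sketch: expose fixed architectures in place of sampled ones."""

    def __init__(self, normal_arc, reduce_arc):
        # Constant tensors standing in for whatever the real controller samples.
        self.normal_arc = tf.constant(normal_arc, dtype=tf.int32, name="normal_arc")
        self.reduce_arc = tf.constant(reduce_arc, dtype=tf.int32, name="reduce_arc")

# Placeholder values only; not a meaningful architecture.
controller = DummyController(
    normal_arc=[0, 2, 0, 0, 1, 4, 0, 1, 1, 0],
    reduce_arc=[1, 0, 1, 3, 0, 2, 1, 1, 0, 4],
)
```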

Thanks, Ben

bkj avatar Apr 03 '18 20:04 bkj

Hi Ben,

Thanks for the questions. I'll try to answer each one.

  1. The point of `layer_base`, which is just a 1x1 convolution, is to standardize the number of output channels to `out_filters` before performing the main operations of a convolutional cell or a reduction cell. In `_enas_layer`, we do the same thing in `final_conv`. The effect is almost identical, but we found it easier to implement this way (see the sketch after this list).

  2. I don't quite understand this point. Both `_fixed_layer` and `_enas_layer` use both convolutions and pooling. For `_fixed_layer`, I hope the code is quite straightforward. For `_enas_layer`, since we need to implement a somewhat dynamic graph, we separate the process into the function `_enas_cell`.

  3. The purpose of `_factorized_reduction` is to reduce both spatial dimensions (width and height) by a factor of 2, and potentially to change the number of output filters. In the place you mention, this function makes sure that the outputs of all operations in a convolutional cell or a reduction cell have the same spatial dimensions, so that they can be concatenated along the depth dimension (again, see the sketch after this list).
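To make (1) and (3) a bit more concrete, here is a rough sketch of the two building blocks (TF1-style illustration of the idea, not the exact code in the repo):

```python
import tensorflow as tf

def conv_1x1(x, out_filters, is_training):
    # (1) a 1x1 convolution that standardizes the channel count to out_filters
    # before/after the cell's main operations (the layer_base / final_conv idea)
    x = tf.layers.conv2d(x, out_filters, 1, padding="same", use_bias=False)
    x = tf.layers.batch_normalization(x, training=is_training)
    return tf.nn.relu(x)

def factorized_reduction(x, out_filters, is_training):
    # (3) halve height and width with two offset stride-2 average pools, then
    # concatenate, so the output has out_filters channels and half the spatial size
    p1 = tf.nn.avg_pool(x, [1, 1, 1, 1], [1, 2, 2, 1], "VALID")
    p1 = tf.layers.conv2d(p1, out_filters // 2, 1, use_bias=False)

    # the second path is shifted by one pixel so the two paths see different pixels
    p2 = tf.pad(x, [[0, 0], [0, 1], [0, 1], [0, 0]])[:, 1:, 1:, :]
    p2 = tf.nn.avg_pool(p2, [1, 1, 1, 1], [1, 2, 2, 1], "VALID")
    p2 = tf.layers.conv2d(p2, out_filters // 2, 1, use_bias=False)

    x = tf.concat([p1, p2], axis=3)
    return tf.layers.batch_normalization(x, training=is_training)
```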

The reason we cannot just fix `normal_arc` and `reduce_arc` and use the same code for both the search process and the fixed-architecture process is efficiency. Dynamic graphs in TF, at least the way we implement them, are slow and very memory-inefficient.

Let us know if you still have more questions 😃

hyhieu avatar Apr 03 '18 21:04 hyhieu

For number 2, the point was that you're using pooling w/ stride > 1 in the fixed architecture, but a combination of `_factorized_reduction` and pooling w/ stride = 1 in the ENAS cells.

Makes sense about the dynamic graphs being slow.

Thanks for the quick response. (And thanks for releasing the code! I've been working on a similar project for a little while, so I'm very excited to compare what I've done to your code.)

~ Ben

bkj avatar Apr 03 '18 21:04 bkj

> For number 2, the point was that you're using pooling w/ stride > 1 in the fixed architecture, but a combination of `_factorized_reduction` and pooling w/ stride = 1 in the ENAS cells.

I think it's just because we couldn't figure out how to syntactically make `_factorized_reduction` run with the output of a dynamic operation, such as `tf.case`.
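For context, the dynamic part looks roughly like the sketch below (illustrative, not the exact `_enas_cell` code; it assumes `x` already has `out_filters` channels so all branches agree in shape). The branch is picked at runtime from the sampled `op_id`, and it is the output of this `tf.case` that we couldn't cleanly feed into `_factorized_reduction`:

```python
import tensorflow as tf

def enas_op(x, op_id, out_filters):
    # Select the operation applied to x at runtime, based on the sampled op_id.
    # Assumes x already has out_filters channels so every branch matches.
    return tf.case(
        [
            (tf.equal(op_id, 0),
             lambda: tf.layers.separable_conv2d(x, out_filters, 3, padding="same")),
            (tf.equal(op_id, 1),
             lambda: tf.layers.separable_conv2d(x, out_filters, 5, padding="same")),
            (tf.equal(op_id, 2),
             lambda: tf.layers.average_pooling2d(x, 3, 1, padding="same")),
            (tf.equal(op_id, 3),
             lambda: tf.layers.max_pooling2d(x, 3, 1, padding="same")),
        ],
        default=lambda: tf.identity(x),
        exclusive=False)
```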

hyhieu avatar Apr 03 '18 22:04 hyhieu

@hyhieu I am wondering whether the reduction cells in `_fixed_layer` and `_enas_layer` see the same previous layers when the result of `_factorized_reduction` is appended to `layers`.

If I understand it correctly, to make the previous layers consistent, this line should be:

`layers = [layers[0], x]`
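(For readers following along, here is a heavily hedged illustration of the bookkeeping being discussed; the names and structure below are made up and the real `micro_child.py` code may differ. The model threads a two-element `layers` list through the stack, and the question is which previous output that list should keep after each cell.)

```python
def build_cell(layers, arc, out_filters):
    # Placeholder for a normal/reduction cell that consumes the two most
    # recent outputs; returns a dummy value just to keep the sketch runnable.
    return ("cell", layers[0], layers[1], out_filters)

out_filters = 16
pool_layers = {2, 4}                           # hypothetical reduction-cell positions
normal_arc, reduce_arc = "normal", "reduce"    # placeholders for the sampled arcs

layers = ["stem", "stem"]                      # the two most recent cell outputs
for layer_id in range(6):
    if layer_id in pool_layers:
        out_filters *= 2
        x = build_cell(layers, reduce_arc, out_filters)
    else:
        x = build_cell(layers, normal_arc, out_filters)
    # The question above is about which previous output to keep here, e.g.
    #   layers = [layers[0], x]   vs.   layers = [layers[-1], x]
    # so that _fixed_layer and _enas_layer see consistent inputs.
    layers = [layers[-1], x]
```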

stanstarks avatar May 18 '18 16:05 stanstarks