Difference between `_fixed_layer` and `_enas_layer` in `cifar10/micro_child.py`
There are a number of differences between `_fixed_layer` and `_enas_layer` in `cifar10/micro_child.py`:
- `layer_base` variable scope
- strided pooling layers and convolutions
- possible `_factorized_reduction` for output
Are you able to give some insight into why the code works like this? It seems that when a fixed architecture is specified, the resulting model is not necessarily exactly the same as during the RL training. It seems to me like the easiest way to fix the child architecture is to have an alternate "dummy controller" that just keeps `normal_arc` and `reduce_arc` fixed at the desired architecture.
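Very roughly, I am picturing something like the sketch below. The class and attribute names, and the exact arc encoding, are only placeholders I made up for illustration, not taken from the repo.

```python
import tensorflow as tf  # TF 1.x, as used by the ENAS code


class DummyController(object):
  """Illustrative 'controller' that always returns one fixed architecture.

  normal_arc / reduce_arc are flat lists of integers in whatever encoding
  micro_child.py expects; the attribute names below are placeholders.
  """

  def __init__(self, normal_arc, reduce_arc):
    self.normal_arc = tf.constant(normal_arc, dtype=tf.int32)
    self.reduce_arc = tf.constant(reduce_arc, dtype=tf.int32)
    # The child would read these instead of a freshly sampled architecture.
    self.sample_arc = (self.normal_arc, self.reduce_arc)
```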
Thanks Ben
Hi Ben,
Thanks for the questions. I'll try.
1. The point of `layer_base`, which is just a 1x1 convolution, is to standardize the number of output channels to `out_filters` before performing the main operation in a convolutional cell or a reduction cell. In `_enas_layer`, we do this in `final_conv`. The effect is almost the same, but we found it easier to implement this way.
2. I don't understand this point of yours. Both `_fixed_layer` and `_enas_layer` use both convolutions and pooling. For `_fixed_layer`, I hope the code is quite straightforward. For `_enas_layer`, since we need to implement a somewhat dynamic graph, we separate the process into the function `_enas_cell`.
3. The purpose of `_factorized_reduction` is to reduce both spatial dimensions (width and height) by a factor of 2, and potentially to change the number of output filters. Where you mention it, this function is used to make sure that the outputs of all operations in a convolutional cell or a reduction cell will have the same spatial dimensions, so that they can be concatenated along the depth dimension. A simplified sketch of what it does follows this list.
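Roughly, `_factorized_reduction` behaves like the sketch below (simplified, not the exact code in `micro_child.py`; the stride-1 branch is essentially the same kind of 1x1 projection as `layer_base` in point 1):

```python
import tensorflow as tf  # TF 1.x style, matching the repo


def factorized_reduction_sketch(x, out_filters, stride):
  """Simplified stand-in for _factorized_reduction (NHWC input).

  stride == 1: just a 1x1 conv, i.e. only standardize the channel count.
  stride == 2: halve height and width and set the channel count, using two
  shifted stride-2 average-pool paths whose 1x1 convs are concatenated.
  """
  if stride == 1:
    return tf.layers.conv2d(x, out_filters, 1, padding="same")

  # Path 1: plain stride-2 pooling, then project to out_filters // 2 channels.
  path1 = tf.layers.average_pooling2d(x, 1, strides=2, padding="valid")
  path1 = tf.layers.conv2d(path1, out_filters // 2, 1)

  # Path 2: the same, but on an input shifted by one pixel, so the two
  # paths together cover every spatial location.
  shifted = tf.pad(x, [[0, 0], [0, 1], [0, 1], [0, 0]])[:, 1:, 1:, :]
  path2 = tf.layers.average_pooling2d(shifted, 1, strides=2, padding="valid")
  path2 = tf.layers.conv2d(path2, out_filters // 2, 1)

  # Concatenate along depth: spatial dims are halved, channels == out_filters.
  return tf.concat([path1, path2], axis=3)
```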
The reason why we cannot just fix `normal_arc` and `reduce_arc` and use the same code for both the search process and the fixed-architecture process is efficiency. Dynamic graphs in TF, at least the way we implement them, are slow and very memory-inefficient.
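To illustrate the difference (this is a toy example, not the repo's code): with a fixed architecture the op is known when the graph is built, whereas during the search every candidate op has to be added to the graph and one is selected at run time, e.g. with `tf.case`:

```python
import tensorflow as tf  # TF 1.x style


def fixed_branch(x, op_id, out_filters):
  # Fixed architecture: op_id is a Python int known at graph-construction
  # time, so only one op is built. x is assumed to have out_filters channels.
  if op_id == 0:
    return tf.layers.conv2d(x, out_filters, 3, padding="same")
  return tf.layers.max_pooling2d(x, 3, strides=1, padding="same")


def dynamic_branch(x, op_id, out_filters):
  # Search phase: op_id is a tensor sampled by the controller, so every
  # candidate op lives in the graph and one is picked at run time.
  return tf.case(
      [(tf.equal(op_id, 0),
        lambda: tf.layers.conv2d(x, out_filters, 3, padding="same")),
       (tf.equal(op_id, 1),
        lambda: tf.layers.max_pooling2d(x, 3, strides=1, padding="same"))],
      default=lambda: x)
```

Building all candidate ops like this for every node of every cell is what makes the search graph large and slow.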
Let us know if you still have more questions 😃
For number 2, the point was that you're using pooling w/ stride > 1 in the fixed architecture, but a combination of `_factorized_reduction` and pooling w/ stride = 1 in the ENAS cells.
Makes sense about the dynamic graphs being slow.
Thanks for the quick response. (And thanks for releasing the code! I've been working on a similar project for a little while, so am very excited to compare what I've done to your code.)
~ Ben
> For number 2, the point was that you're using pooling w/ stride > 1 in the fixed architecture, but a combination of `_factorized_reduction` and pooling w/ stride = 1 in the ENAS cells.
I think it's just because we couldn't figure out how to syntactically make `_factorized_reduction` run with the output of a dynamic operation, such as `tf.case`.
@hyhieu I am wondering whether the reduction cells in `_fixed_layer` and `_enas_layer` have the same previous layers. The result of `_factorized_reduction` is appended to the `layers` list. If I understand it correctly, to make the previous layers consistent, this line should be `layers = [layers[0], x]`.
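Just to spell out the difference being suggested (pure list manipulation with placeholder values, not the actual code):

```python
# Placeholders standing in for the two previous cell outputs (s0, s1)
# and for x, the output of _factorized_reduction.
s0, s1, x = "s0", "s1", "x"
layers = [s0, s1]

layers_if_appended = layers + [x]    # appending keeps growing: [s0, s1, x]
layers_if_replaced = [layers[0], x]  # the suggested line yields: [s0, x]
```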