TF-NAS
Why not do log_softmax("arch_param") in the graph?
In train_search.py, I noticed that you apply log_softmax() outside the graph. Why? Why not just use the parameter alpha directly and apply log_softmax() in each forward step?
Hi, thanks for your attention to our repo.
Originally, for convenience, we define the variable "log_alphas" as the log probability distribution over operations. After each architecture optimization step, this definition is violated, because the gradient update does not keep the values normalized. Following ProxylessNAS (Sec. 3.2.1) and DenseNAS (A.3), we apply log_softmax() outside the graph to rescale the updated values.
I think it is OK to just use the parameter alpha and apply log_softmax() in each forward step. I will run an experiment for this. Thanks a lot.
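For concreteness, here is a minimal sketch of the two options discussed above. The variable names, num_ops, and the standalone functions are illustrative, not the actual code in train_search.py.

import torch
import torch.nn.functional as F

num_ops = 6  # illustrative number of candidate operations per block

# Option A (as described above): "log_alphas" is meant to hold log-probabilities.
# The forward pass uses exp(log_alphas) directly, and after each
# architecture-optimizer step the values are re-normalized outside the graph
# with log_softmax, since the gradient update breaks the normalization.
log_alphas = torch.nn.Parameter(F.log_softmax(torch.zeros(num_ops), dim=-1))

def op_weights_a(log_alphas):
    return torch.exp(log_alphas)  # assumes log_alphas are already normalized

def rescale_out_of_graph(log_alphas):
    with torch.no_grad():
        log_alphas.copy_(F.log_softmax(log_alphas, dim=-1))

# Option B (the suggestion in this issue): "alphas" is a free parameter and the
# normalization happens inside every forward pass, so no rescaling is needed.
alphas = torch.nn.Parameter(torch.zeros(num_ops))

def op_weights_b(alphas):
    return F.softmax(alphas, dim=-1)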
Please keep me updated on your progress.
iw_key = 'module.{}.{}.m_ops.{}.inverted_bottleneck.conv.weight'.format(stage, block, op_idx)
state_dict[iw_key].data[index,:,:,:] = state_dict_from_model[iw_key]
dw_key = 'module.{}.{}.m_ops.{}.depth_conv.conv.weight'.format(stage, block, op_idx)
state_dict[dw_key].data[index,:,:,:] = state_dict_from_model[dw_key]
pw_key = 'module.{}.{}.m_ops.{}.point_linear.conv.weight'.format(stage, block, op_idx)
state_dict[pw_key].data[:,index,:,:] = state_dict_from_model[pw_key]
Can you explain these lines? Why is pw_key's index in the second dimension?
@touchdreamer The width search only occurs on depth_conv. The output of depth_conv is the input to point_linear, and the shape of convolutional weights in PyTorch is (C_out, C_in/groups, k_h, k_w). Thus, the index of dw_key is in the first dimension, but pw_key's index is in the second one.
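To illustrate, here is a small sketch of the two weight layouts. The channel counts and module names are made up for the example, not the actual TF-NAS configuration.

import torch
import torch.nn as nn

# Illustrative channel counts; not the actual TF-NAS configuration.
c_mid_full, c_mid_kept, c_out = 96, 72, 32

# Depthwise conv followed by the 1x1 projection ("point_linear") of an MBConv block.
depth_conv = nn.Conv2d(c_mid_full, c_mid_full, 3, padding=1, groups=c_mid_full, bias=False)
point_linear = nn.Conv2d(c_mid_full, c_out, 1, bias=False)

# PyTorch stores conv weights as (C_out, C_in/groups, k_h, k_w):
print(depth_conv.weight.shape)    # torch.Size([96, 1, 3, 3])
print(point_linear.weight.shape)  # torch.Size([32, 96, 1, 1])

# Keeping only the searched channels therefore slices depth_conv along
# dim 0 (its outputs) and point_linear along dim 1 (its inputs).
index = torch.arange(c_mid_kept)
dw_kept = depth_conv.weight.data[index, :, :, :]    # (72, 1, 3, 3)
pw_kept = point_linear.weight.data[:, index, :, :]  # (32, 72, 1, 1)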