TF-NAS
Why not do log_softmax("arch_param") in the graph?
In train_search.py, I noticed that you apply log_softmax() outside the graph. Why? Why not just use the parameter alpha directly and apply log_softmax() in each forward step?
Hi, thanks for your attention to our repo.
Originally, for convenience, we define the variable "log_alphas" as the log probability distribution over operations. After each architecture optimization step, this definition is violated, because the gradient update does not keep the values normalized. Following ProxylessNAS (Sec. 3.2.1) and DenseNAS (A.3), we apply log_softmax() outside the graph to rescale the updated values.
I think it is OK to just use the parameter alpha and apply log_softmax() in each forward step. I will run an experiment for this. Thanks a lot.
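For concreteness, here is a minimal sketch of the two options discussed above. The variable names, num_ops, and the standalone functions are illustrative, not the actual code in train_search.py.

import torch
import torch.nn.functional as F

num_ops = 6  # illustrative number of candidate operations per block

# Option A (as described above): "log_alphas" is meant to hold log-probabilities.
# The forward pass uses exp(log_alphas) directly, and after each
# architecture-optimizer step the values are re-normalized outside the graph
# with log_softmax, since the gradient update breaks the normalization.
log_alphas = torch.nn.Parameter(F.log_softmax(torch.zeros(num_ops), dim=-1))

def op_weights_a(log_alphas):
    return torch.exp(log_alphas)  # assumes log_alphas are already normalized

def rescale_out_of_graph(log_alphas):
    with torch.no_grad():
        log_alphas.copy_(F.log_softmax(log_alphas, dim=-1))

# Option B (the suggestion in this issue): "alphas" is a free parameter and the
# normalization happens inside every forward pass, so no rescaling is needed.
alphas = torch.nn.Parameter(torch.zeros(num_ops))

def op_weights_b(alphas):
    return F.softmax(alphas, dim=-1)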
Please keep me updated on your progress.
iw_key = 'module.{}.{}.m_ops.{}.inverted_bottleneck.conv.weight'.format(stage, block, op_idx)
state_dict[iw_key].data[index,:,:,:] = state_dict_from_model[iw_key]
dw_key = 'module.{}.{}.m_ops.{}.depth_conv.conv.weight'.format(stage, block, op_idx)
state_dict[dw_key].data[index,:,:,:] = state_dict_from_model[dw_key]
pw_key = 'module.{}.{}.m_ops.{}.point_linear.conv.weight'.format(stage, block, op_idx)
state_dict[pw_key].data[:,index,:,:] = state_dict_from_model[pw_key]
Can you explain these lines? Why is pw_key's index in the second dimension?
@touchdreamer The width search only occurs on depth_conv. The output of depth_conv is the input to point_linear, and the shape of convolutional weights in PyTorch is (C_out, C_in/groups, k_h, k_w). Thus, the index of dw_key is in the first dimension, but pw_key's index is in the second one.
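To illustrate, here is a small sketch of the two weight layouts. The channel counts and module names are made up for the example, not the actual TF-NAS configuration.

import torch
import torch.nn as nn

# Illustrative channel counts; not the actual TF-NAS configuration.
c_mid_full, c_mid_kept, c_out = 96, 72, 32

# Depthwise conv followed by the 1x1 projection ("point_linear") of an MBConv block.
depth_conv = nn.Conv2d(c_mid_full, c_mid_full, 3, padding=1, groups=c_mid_full, bias=False)
point_linear = nn.Conv2d(c_mid_full, c_out, 1, bias=False)

# PyTorch stores conv weights as (C_out, C_in/groups, k_h, k_w):
print(depth_conv.weight.shape)    # torch.Size([96, 1, 3, 3])
print(point_linear.weight.shape)  # torch.Size([32, 96, 1, 1])

# Keeping only the searched channels therefore slices depth_conv along
# dim 0 (its outputs) and point_linear along dim 1 (its inputs).
index = torch.arange(c_mid_kept)
dw_kept = depth_conv.weight.data[index, :, :, :]    # (72, 1, 3, 3)
pw_kept = point_linear.weight.data[:, index, :, :]  # (32, 72, 1, 1)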