once-for-all
How many subnets does knowledge distillation optimize?
I have a question about something the paper does not make clear. During knowledge distillation, do you optimize all 10^19 networks? The elastic_nn portion of the code seems to point to that:
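    subnet_settings = []
    # Enumerate every combination of the given depth, expand ratio,
    # kernel size, width multiplier, and image size values.
    for d in depth_list:
        for e in expand_ratio_list:
            for k in ks_list:
                for w in width_mult_list:
                    for img_size in image_size_list:
                        # Each entry pairs a settings dict with a readable name.
                        subnet_settings.append([{
                            'image_size': img_size,
                            'd': d,
                            'e': e,
                            'ks': k,
                            'w': w,
                        }, 'R%s-D%s-E%s-K%s-W%s' % (img_size, d, e, k, w)])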
Hi @swapnilsayansaha,
As far as I understand, this code only generates the different subnet settings. I do not see any implementation of KD in the code base.
@Darshcg How many subnets does it consider? All 10^19?
@swapnilsayansaha I am not exactly sure, but for MobileNetV3 this code creates 8 subnet settings.
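To make the counting concrete: the nested loops in the question produce one setting per combination, so the total is the product of the list lengths. Here is a minimal sketch of that, assuming hypothetical list values (two depths, two expand ratios, two kernel sizes, one width multiplier, one image size). The helper name `build_subnet_settings` and the values are illustrative only and are not taken from the repo.

```python
from itertools import product


def build_subnet_settings(depth_list, expand_ratio_list, ks_list,
                          width_mult_list, image_size_list):
    """Enumerate settings in the same order as the nested loops in the question."""
    settings = []
    # product() iterates with the first argument as the outermost loop,
    # which matches the d -> e -> k -> w -> img_size nesting.
    for d, e, k, w, img_size in product(depth_list, expand_ratio_list, ks_list,
                                        width_mult_list, image_size_list):
        settings.append([
            {'image_size': img_size, 'd': d, 'e': e, 'ks': k, 'w': w},
            'R%s-D%s-E%s-K%s-W%s' % (img_size, d, e, k, w),
        ])
    return settings


# Hypothetical inputs (assumption): 2 * 2 * 2 * 1 * 1 = 8 combinations.
settings = build_subnet_settings(
    depth_list=[2, 4],
    expand_ratio_list=[3, 6],
    ks_list=[3, 7],
    width_mult_list=[0],
    image_size_list=[224],
)

print(len(settings))
for _, name in settings:
    print(name)
```

With lists like these, the loop yields 8 settings rather than anything close to 10^19. The count depends only on how many values each list holds when the function is called.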