
How many subnets does knowledge distillation optimize?

Open swapnilsayansaha opened this issue 3 years ago • 3 comments

I have a question that is not clarified in the paper. During knowledge distillation, do you optimize all 10^19 networks? The elastic_nn portion of the code seems to suggest that:

	subnet_settings = []
	for d in depth_list:
		for e in expand_ratio_list:
			for k in ks_list:
				for w in width_mult_list:
					for img_size in image_size_list:
						subnet_settings.append([{
							'image_size': img_size,
							'd': d,
							'e': e,
							'ks': k,
							'w': w,
						}, 'R%s-D%s-E%s-K%s-W%s' % (img_size, d, e, k, w)])

swapnilsayansaha avatar May 14 '21 22:05 swapnilsayansaha

Hi @swapnilsayansaha,

As per my understanding, this code is just generating the different subnet settings; I do not see any implementation of KD in this part of the code base.

Darshcg avatar May 17 '21 06:05 Darshcg

@Darshcg how many subnets does it consider? All 10^19?

swapnilsayansaha avatar May 17 '21 07:05 swapnilsayansaha

@swapnilsayansaha I am not exactly sure, but this code creates 8 subnet settings for MobileNetV3.

Darshcg avatar May 17 '21 09:05 Darshcg
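For context, the nested loops in the snippet above only enumerate the cross-product of the network-wide setting lists, which is far smaller than the full per-layer search space. A minimal sketch of the same enumeration (the list values below are hypothetical placeholders, not the repo's actual defaults):

```python
# Hypothetical example lists chosen so the cross-product is 8;
# the repo's actual defaults may differ.
depth_list = [2, 4]          # network-wide depth choices
expand_ratio_list = [3, 6]   # network-wide expand-ratio choices
ks_list = [3, 7]             # network-wide kernel-size choices
width_mult_list = [1.0]      # single width multiplier
image_size_list = [224]      # single input resolution

# Same cross-product the nested for-loops build, as a comprehension.
subnet_settings = [
    {'image_size': r, 'd': d, 'e': e, 'ks': k, 'w': w}
    for d in depth_list
    for e in expand_ratio_list
    for k in ks_list
    for w in width_mult_list
    for r in image_size_list
]
print(len(subnet_settings))  # 2 * 2 * 2 * 1 * 1 = 8

# By contrast, the ~10^19 figure counts subnets where each
# block/layer can pick its own kernel size, expand ratio, etc.,
# so the count grows exponentially with the number of layers
# rather than as a small cross-product of global settings.
```

So the snippet selects a handful of representative global configurations (e.g. for validation), not one entry per subnet in the search space.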