ProxylessNAS-Pytorch
Some confusions
Thank you for your code. I still have a few questions:
- Your code differs from DARTS: you first update the weight parameters and then the architecture parameters, while DARTS does the opposite. Could you elaborate on that? Also, when you update the architecture parameters, you compute the loss on the training set again, which may use more GPU memory. I think that after updating the weight parameters, you could reuse that loss directly.
```python
# model weight update
optimizer.zero_grad()
logits = model(input)
loss = criterion(logits, target)
loss.backward()
nn.utils.clip_grad_norm_(model.parameters(), args.grad_clip)
optimizer.step()
# architecture parameter (alpha) update
architect.step(input, target, input_search, target_search, lr, optimizer, unrolled=args.unrolled)
```
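For what it's worth, the reuse being suggested can be sketched as follows: if both updates are allowed to use the same training batch, a single forward/backward pass can populate gradients for the weights and the alphas at once. This is a toy sketch with made-up shapes, not the repo's code, and it ignores that DARTS's `architect.step` also uses validation data:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy "supernet": one weight layer plus an architecture mixing parameter alpha.
w = nn.Linear(4, 1)
alpha = nn.Parameter(torch.zeros(2))  # two hypothetical candidate paths

w_opt = torch.optim.SGD(w.parameters(), lr=0.1)
a_opt = torch.optim.Adam([alpha], lr=0.01)

x = torch.randn(8, 4)
y = torch.randn(8, 1)

gate = torch.softmax(alpha, dim=0)
out = gate[0] * w(x) + gate[1] * (-w(x))  # two toy paths sharing weights
loss = nn.functional.mse_loss(out, y)

w_opt.zero_grad()
a_opt.zero_grad()
loss.backward()  # one backward pass fills gradients for both w and alpha
w_opt.step()     # weight update
a_opt.step()     # alpha update reuses the same loss, no second forward pass
```

The trade-off is that both parameter groups are then trained on the same data split, which is exactly what bi-level schemes like DARTS try to avoid.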
- The original paper mentions in Section 3.2 that the architecture parameters are rescaled. Did you implement this?
- One more thing I'm not sure about: when updating the architecture parameters, the paper's authors mask the architecture parameters so that only two paths are selected. When updating the weight parameters, how many paths are selected, two paths or all paths? Thank you very much!
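On the rescaling question, my reading of Section 3.2 (a hedged sketch of the idea, not the repo's implementation) is: sample two paths from the softmax over the alphas, mask the rest, and after the alpha update shift the two sampled alphas by a common constant so that every unsampled path's probability is exactly what it was before the update:

```python
import torch

torch.manual_seed(0)
alpha = torch.tensor([1.0, 0.5, -0.2, 0.3])  # toy architecture parameters
probs = torch.softmax(alpha, dim=0)

# Sample two paths according to the current path probabilities; mask the rest.
idx = torch.multinomial(probs, num_samples=2, replacement=False)
mask = torch.zeros_like(probs, dtype=torch.bool)
mask[idx] = True
pair_mass = probs[idx].sum()  # probability mass of the sampled pair

# Pretend a gradient step changed only the two sampled alphas.
alpha[idx] += torch.tensor([0.3, -0.1])

# Rescale: shift the sampled alphas by a common constant c so the pair's
# softmax mass returns to pair_mass, which leaves every unsampled path's
# probability unchanged.
pair_unnorm = torch.exp(alpha[idx]).sum()
rest_unnorm = torch.exp(alpha[~mask]).sum()
c = torch.log(pair_mass / (1 - pair_mass) * rest_unnorm / pair_unnorm)
alpha[idx] += c

new_probs = torch.softmax(alpha, dim=0)
# new_probs[~mask] now equals probs[~mask]
```

Shifting both sampled alphas by the same constant only changes how the pair's mass splits against the rest; the derivation is just solving exp(c) * pair_unnorm / (exp(c) * pair_unnorm + rest_unnorm) = pair_mass for c.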
@JihaoLee
- DARTS and ProxylessNAS are both two-step searches: first update the weight parameters, then update the architecture parameters. I think the two steps use the same amount of GPU memory whether you update the weights first or the architecture parameters first.
- I have not implemented that function yet.
- Still confused about that one; I am asking the paper's authors. I lean toward the latter.
@xieydd Thank you! Could you give me a link to your implementation of ProxylessNAS?
Sorry, it's still a WIP.