ProxylessNAS-Pytorch
Some confusions
Thank you for your code. I still have a few questions:
- Your code differs from DARTS: you first update the weight parameters and then the architecture parameters, while DARTS does the opposite. Could you elaborate on that? Also, when you update the architecture parameters, you compute the loss on the training set again, which may use more GPU memory. I think that after updating the weight parameters, you could reuse that loss directly.
```python
# model weight update
optimizer.zero_grad()
logits = model(input)
loss = criterion(logits, target)
loss.backward()
nn.utils.clip_grad_norm_(model.parameters(), args.grad_clip)
optimizer.step()
# architecture parameter (alpha) update
architect.step(input, target, input_search, target_search, lr, optimizer, unrolled=args.unrolled)
```
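For what it's worth, the reuse being suggested can be sketched as follows: if both updates are allowed to use the same training batch, a single forward/backward pass can populate gradients for the weights and the alphas at once. This is a toy sketch with made-up shapes, not the repo's code, and it ignores that DARTS's `architect.step` also uses validation data:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy "supernet": one weight layer plus an architecture mixing parameter alpha.
w = nn.Linear(4, 1)
alpha = nn.Parameter(torch.zeros(2))  # two hypothetical candidate paths

w_opt = torch.optim.SGD(w.parameters(), lr=0.1)
a_opt = torch.optim.Adam([alpha], lr=0.01)

x = torch.randn(8, 4)
y = torch.randn(8, 1)

gate = torch.softmax(alpha, dim=0)
out = gate[0] * w(x) + gate[1] * (-w(x))  # two toy paths sharing weights
loss = nn.functional.mse_loss(out, y)

w_opt.zero_grad()
a_opt.zero_grad()
loss.backward()  # one backward pass fills gradients for both w and alpha
w_opt.step()     # weight update
a_opt.step()     # alpha update reuses the same loss, no second forward pass
```

The trade-off is that both parameter groups are then trained on the same data split, which is exactly what bi-level schemes like DARTS try to avoid.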
- The original paper mentions in Section 3.2 that the architecture parameters are rescaled. Did you implement this?
- One more thing I'm not sure about: when updating the architecture parameters, the paper's authors mask the architecture parameters so that only two paths are selected. When updating the weight parameters, how many paths are selected, two paths or all paths? Thank you very much!
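On the rescaling question, my reading of Section 3.2 (a hedged sketch of the idea, not the repo's implementation) is: sample two paths from the softmax over the alphas, mask the rest, and after the alpha update shift the two sampled alphas by a common constant so that every unsampled path's probability is exactly what it was before the update:

```python
import torch

torch.manual_seed(0)
alpha = torch.tensor([1.0, 0.5, -0.2, 0.3])  # toy architecture parameters
probs = torch.softmax(alpha, dim=0)

# Sample two paths according to the current path probabilities; mask the rest.
idx = torch.multinomial(probs, num_samples=2, replacement=False)
mask = torch.zeros_like(probs, dtype=torch.bool)
mask[idx] = True
pair_mass = probs[idx].sum()  # probability mass of the sampled pair

# Pretend a gradient step changed only the two sampled alphas.
alpha[idx] += torch.tensor([0.3, -0.1])

# Rescale: shift the sampled alphas by a common constant c so the pair's
# softmax mass returns to pair_mass, which leaves every unsampled path's
# probability unchanged.
pair_unnorm = torch.exp(alpha[idx]).sum()
rest_unnorm = torch.exp(alpha[~mask]).sum()
c = torch.log(pair_mass / (1 - pair_mass) * rest_unnorm / pair_unnorm)
alpha[idx] += c

new_probs = torch.softmax(alpha, dim=0)
# new_probs[~mask] now equals probs[~mask]
```

Shifting both sampled alphas by the same constant only changes how the pair's mass splits against the rest; the derivation is just solving exp(c) * pair_unnorm / (exp(c) * pair_unnorm + rest_unnorm) = pair_mass for c.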
@JihaoLee
- DARTS and ProxylessNAS are both two-step searches: first update the weight parameters, then update the architecture parameters. I think the two steps use the same amount of GPU memory whether you update the weights first or the architecture parameters first.
- I have not implemented that function yet.
- Still confused about that one; I am asking the paper's authors. I lean toward the latter.
@xieydd Thank you! Could you give me a link to your implementation of ProxylessNAS?
Sorry, it's still a WIP.