FBNet
Multi-GPU and ImageNet
Thanks for the great work! I was wondering if you have made any progress on multi-GPU and ImageNet training. Thanks!
It can be done by wrapping the model with model = nn.DataParallel(model). Just be aware that latency_to_accumulate has shape [1], which cannot be parallelized; it should be reshaped to [num_gpus, 1]. This bug took me a day, so I hope this helps.
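For reference, a minimal sketch of what that change might look like, assuming model and train_loader come from this repo's supernet training script and that the supernet's forward takes and returns latency_to_accumulate (the exact call site may differ):

```python
import torch
import torch.nn as nn

# Split each batch across all visible GPUs.
model = nn.DataParallel(model).cuda()
num_gpus = torch.cuda.device_count()

for images, labels in train_loader:
    images, labels = images.cuda(), labels.cuda()

    # DataParallel scatters every tensor argument along dim 0 and gathers
    # the outputs the same way, so a shape-[1] latency tensor cannot be
    # split over the GPUs (hence the broadcast-shape error). Give it one
    # row per GPU instead.
    latency_to_accumulate = torch.zeros(num_gpus, 1, device="cuda")

    outs, latency_to_accumulate = model(images, latency_to_accumulate)
    # ... compute the loss from outs, labels, and latency_to_accumulate ...
```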
Thanks for the tip! I am still trying to get close-to-SOTA results on CIFAR-10 with FBNets.
@chunhanl Hi, I changed the code to support multi-GPU; however, I hit the same error: output shape [] doesn't match the broadcast shape [1, 1]. Would you share how you reshaped latency_to_accumulate?
@ldd91 Hi, you need to modify the SupernetLoss function as well:
lat = torch.log(torch.mean(latency) ** self.beta)
You need to reduce the shape of the input latency (by averaging or summing) so it is compatible with the scalar CE loss.
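For concreteness, a minimal sketch of how the modified loss might look; the class shape, the alpha/beta attributes, and the constructor defaults are placeholders based on the FBNet-style CE-times-latency objective, so adapt it to the actual SupernetLoss in this repo:

```python
import torch
import torch.nn as nn

class SupernetLoss(nn.Module):
    def __init__(self, alpha=0.2, beta=0.6):  # placeholder values
        super().__init__()
        self.alpha = alpha
        self.beta = beta
        self.weight_criterion = nn.CrossEntropyLoss()

    def forward(self, outs, targets, latency):
        ce = self.weight_criterion(outs, targets)
        # latency arrives as shape [num_gpus, 1] after the DataParallel
        # gather; torch.mean collapses it to a scalar so the latency term
        # combines cleanly with the scalar cross-entropy loss.
        lat = torch.log(torch.mean(latency) ** self.beta)
        return self.alpha * ce * lat
```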
@latifisalar Thank you very much for your help, I will give it a try.
@latifisalar I've hit a new issue; the log shows: AssertionError: Gradients were computed more than backward_passes_per_step times before call to step(). Increase backward_passes_per_step to accumulate gradients locally.
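That assertion is raised by Horovod's DistributedOptimizer, so I assume you switched to Horovod for multi-GPU: it fires when backward() computes gradients for the wrapped parameters more times than backward_passes_per_step allows before optimizer.step(). A minimal sketch of the fix, assuming optimizer and model come from your training script (the value 2 is a placeholder; set it to the number of backward() calls that touch the wrapped parameters per step):

```python
import torch
import horovod.torch as hvd

hvd.init()
torch.cuda.set_device(hvd.local_rank())

# Horovod asserts if backward() is called more often per step() than
# backward_passes_per_step allows (default 1). The supernet training
# alternates weight and theta updates, so each backward() that touches
# the wrapped parameters counts against this budget.
optimizer = hvd.DistributedOptimizer(
    optimizer,
    named_parameters=model.named_parameters(),
    backward_passes_per_step=2,  # placeholder: match your backward() count
)
```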
@chunhanl @latifisalar Hi, have you tested this project on ImageNet? Can this method reach the results reported in the paper? I tested it on ImageNet but only got 20% accuracy with a loss of 5.112, and I'm confused about which step I got wrong.
Hello! How do you run this code on ImageNet? Could you please tell me some more details? Thanks!