Multi GPU and ImageNet

Open latifisalar opened this issue 5 years ago • 8 comments

Thanks for the great work! I was wondering if you have made any progress on multi-GPU and ImageNet training. Thanks!

latifisalar avatar Jun 05 '19 23:06 latifisalar

It can be done by wrapping the model with model = nn.DataParallel(model). Just be aware that latency_to_accumulate has shape [1], which cannot be split across GPUs. It should be reshaped to [# of GPUs, 1]. This bug took me a day. I hope this helps.
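A minimal sketch of the reshape described above, under stated assumptions: `ToySupernet` and `make_latency_seed` are hypothetical stand-ins rather than code from the FBNet repository, and the forward pass is assumed to take `latency_to_accumulate` as a second argument. `nn.DataParallel` splits tensor inputs along dim 0, so a seed with one row per GPU gives every replica its own `[1, 1]` slice. The sketch falls back to the plain module when no GPU is available.

```python
import torch
import torch.nn as nn


class ToySupernet(nn.Module):
    """Hypothetical stand-in for the supernet: returns logits and accumulated latency."""

    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(8, 4)

    def forward(self, x, latency_to_accumulate):
        # Under DataParallel each replica sees its own slice along dim 0, shape [1, 1].
        latency = latency_to_accumulate + 1.0  # pretend this layer costs 1.0
        return self.fc(x), latency


def make_latency_seed(num_gpus):
    # One row per GPU, so the scatter along dim 0 hands every replica one row.
    return torch.zeros(max(num_gpus, 1), 1)


num_gpus = torch.cuda.device_count()
model = ToySupernet()
if num_gpus > 0:
    model = nn.DataParallel(model.cuda())

x = torch.randn(4 * max(num_gpus, 1), 8)
latency_seed = make_latency_seed(num_gpus)
if num_gpus > 0:
    x, latency_seed = x.cuda(), latency_seed.cuda()

logits, latency = model(x, latency_seed)
print(logits.shape, latency.shape)
```

The gathered latency comes back with one row per GPU, so it still has to be reduced to a scalar before it is combined with the loss.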

chunhanl avatar Jun 12 '19 00:06 chunhanl

Thanks for the tip! I am still trying to get close to SOTA results on CIFAR-10 with FBNets.

latifisalar avatar Jun 12 '19 16:06 latifisalar

@chunhanl Hi, I changed the code to support multi-GPU, but I hit the same error: output shape [] doesn't match the broadcast shape [1,1]. Could you share how you reshaped latency_to_accumulate?

ldd91 avatar Sep 23 '19 06:09 ldd91

@ldd91 Hi, you need to modify the supernetloss function as well: lat = torch.log(torch.mean(latency) ** self.beta). You need to reduce the input latency to a scalar (by averaging or summing) so that its shape is compatible with the CE loss.
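A rough sketch of what that change could look like in context. The class name `SupernetLossSketch`, the multiplicative combination of the two terms, and the `alpha`/`beta` defaults are assumptions for illustration, not values confirmed from the repository. Only the `torch.mean(latency)` reduction comes from the comment above.

```python
import torch
import torch.nn as nn


class SupernetLossSketch(nn.Module):
    """Hypothetical latency-aware loss; alpha and beta are placeholder values."""

    def __init__(self, alpha=0.2, beta=0.6):
        super().__init__()
        self.alpha = alpha
        self.beta = beta
        self.criterion = nn.CrossEntropyLoss()

    def forward(self, outs, targets, latency):
        ce = self.criterion(outs, targets)  # scalar, shape []
        # latency arrives as [num_gpus, 1] after the gather; reduce it to a scalar
        # so it broadcasts cleanly against the scalar CE term.
        lat = torch.log(torch.mean(latency) ** self.beta)
        return self.alpha * ce * lat


outs = torch.randn(8, 10)                 # fake logits: batch of 8, 10 classes
targets = torch.randint(0, 10, (8,))      # fake labels
latency = torch.tensor([[12.0], [12.0]])  # e.g. gathered from 2 GPUs
loss = SupernetLossSketch()(outs, targets, latency)
print(loss.shape)
```

Averaging keeps the latency term independent of the number of GPUs, whereas summing would scale it with the GPU count.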

latifisalar avatar Sep 23 '19 14:09 latifisalar

@latifisalar Thank you very much for your help, I will give it a try.

ldd91 avatar Sep 24 '19 01:09 ldd91

@latifisalar I hit a new issue. The log shows: AssertionError: Gradients were computed more than backward_passes_per_step times before call to step(). Increase backward_passes_per_step to accumulate gradients locally.

ldd91 avatar Sep 24 '19 02:09 ldd91

@chunhanl @latifisalar Hi, have you tested this project on ImageNet? Can this method reach the results reported in the paper? I tested it on ImageNet but only got 20% accuracy with a loss of 5.112. I'm confused about which step I did wrong.

ldd91 avatar Nov 04 '19 03:11 ldd91

Hello! How do you run this code on ImageNet? Could you please tell me some more details? Thanks!

weihui98 avatar Jul 22 '20 13:07 weihui98