RichardsonLiao
Thanks for replying! Here is the pre-trained model, which we trained with mixed precision: ``` [[7269,1],1]: A high-performance Open MPI point-to-point messaging module was unable to find any relevant network...
Hi @borisgin, the two experiments below both use "Pre-trained model: mixed -> transfer learning configuration: mixed." 1. We set "loss_scaling" to 1000, and here is what we got: ``` [[25146,1],0]: A...
Hi @blisc, thanks for replying. This is the pre-trained model: ``` [[23012,1],0]: A high-performance Open MPI point-to-point messaging module was unable to find any relevant network interfaces: Module: OpenFabrics (openib)...
> I would like to confirm that you have pulled #333 to your testing branch?

Yes, I've tried this version, but the situation is still the same. Thanks for your help!
Hi @blisc, after trying all the tweaks that you mentioned, here is the result: Pre-trained model: mixed precision, with the SGD optimizer and "LARC" turned off. > 1. Can you try using...
Hi @borisgin, got it, we'll try this configuration. BTW, we have tested this issue on a single-GPU machine without Horovod, and the situation is the same. On the other...
> Thanks for creating an issue. Can I ask:
>
> * what is the device model?
> * are you on insiders dev or beta channel?
> * do...
> Thanks for the additional info. Does the AI Dev Gallery window show up at all and what do you see when it does?

Hi Nikola, The app window does...