Masaki Kozuki
Could you try installing apex with the `--global-option="--cuda_ext"` option as well? `fused_layer_norm_cuda` is built with `--cuda_ext`, not with `--fast_layer_norm`.
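For reference, the install command would look roughly like the one in the apex README (a sketch; run it from the root of your apex checkout, and adjust flags to your environment):

```shell
# Build apex with the C++/CUDA extensions; --cuda_ext is what
# builds fused_layer_norm_cuda among other CUDA kernels.
pip install -v --disable-pip-version-check --no-cache-dir \
    --global-option="--cpp_ext" --global-option="--cuda_ext" ./
```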
Just so you know, Windows support is experimental, as noted in https://github.com/NVIDIA/apex#experimental-windows
cc: @ChongyuNVIDIA @jpool-nv
> Note that [`torch.cuda.amp.GradScaler.unscale_`](https://pytorch.org/docs/stable/amp.html#torch.cuda.amp.GradScaler.unscale_) does not natively support the distributed optimizer, so I have to unscale by directly manipulating `DistributedFusedAdam._grad_scale`.

qq: would https://github.com/NVIDIA/apex/blob/master/apex/transformer/amp/grad_scaler.py be useful?
apex's grad scaler there was just ported (copied) from NeMo per Sangkug's suggestion, and NeMo's looks to be based on PyTorch's: https://github.com/NVIDIA/NeMo/blob/18940b3b32cff290cf70d4a251b0e2f7b08e1525/nemo/collections/nlp/parts/nlp_overrides.py#L395. Optimistically speaking, the implementation wouldn't have been updated since then...
Yes. Is there anything I need to do?
@pytorchmergebot rebase
With https://github.com/pytorch/pytorch/pull/84314 and separate files for the with- and without-amsgrad cases, CI looks green.
Hi, I really appreciate your sharing the trained weights. Could you also share VGG16 with BN?
Ideally, if the suggested parameters are ones that have already been suggested in an earlier trial, the trial should be skipped so that hyperparameter optimization runs faster, right? Reproduced output: https://gist.github.com/crcrpar/aba308a1350bb4986276a6c87cf256cb
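To make the idea concrete, here is a minimal, framework-agnostic sketch of the duplicate check I have in mind. The names `is_duplicate` and `history` are hypothetical, not part of any library's API; in practice `history` would be collected from the params of completed trials (e.g. `study.trials` in Optuna):

```python
def is_duplicate(params: dict, history: list[dict]) -> bool:
    """Return True if `params` exactly matches a previously suggested set."""
    return any(params == past for past in history)

# Params suggested in earlier trials.
history = [{"lr": 1e-3, "momentum": 0.9}]

print(is_duplicate({"lr": 1e-3, "momentum": 0.9}, history))  # True: skip this trial
print(is_duplicate({"lr": 1e-2, "momentum": 0.9}, history))  # False: run it
```

A trial whose suggested params trip this check could then be pruned or skipped instead of re-evaluating the same objective.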