Masaki Kozuki
Could you try installing apex with the `--global-option="--cuda_ext"` option as well? `fused_layer_norm_cuda` is built with `--cuda_ext`, not with `--fast_layer_norm`.
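For reference, the install command would look roughly like the one in the apex README (a sketch; run it from the root of your apex checkout, and adjust flags to your environment):

```shell
# Build apex with the C++/CUDA extensions; --cuda_ext is what
# builds fused_layer_norm_cuda among other CUDA kernels.
pip install -v --disable-pip-version-check --no-cache-dir \
    --global-option="--cpp_ext" --global-option="--cuda_ext" ./
```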
Just so you know, Windows support is experimental, as noted in https://github.com/NVIDIA/apex#experimental-windows
cc: @ChongyuNVIDIA @jpool-nv
> Note that [`torch.cuda.amp.GradScaler.unscale_`](https://pytorch.org/docs/stable/amp.html#torch.cuda.amp.GradScaler.unscale_) does not natively support the distributed optimizer, so I have to unscale by directly manipulating `DistributedFusedAdam._grad_scale`.

qq: would https://github.com/NVIDIA/apex/blob/master/apex/transformer/amp/grad_scaler.py be useful?
apex's grad scaler there was just ported (copied) from NeMo per Sangkug's suggestion, and NeMo's looks to be based on PyTorch's: https://github.com/NVIDIA/NeMo/blob/18940b3b32cff290cf70d4a251b0e2f7b08e1525/nemo/collections/nlp/parts/nlp_overrides.py#L395. Optimistically speaking, the implementation wouldn't have been updated since then...
Yes. Is there anything I need to do?
@pytorchmergebot rebase
With https://github.com/pytorch/pytorch/pull/84314 and separate files for the with- and without-amsgrad cases, CI looks green.
Hi, I really appreciate your sharing the trained weights. Could you also share VGG16 with BN?
Ideally, if the suggested parameters are ones that have already been suggested in an earlier trial, the trial should be skipped so that hyperparameter optimization runs faster, right? Reproduced output: https://gist.github.com/crcrpar/aba308a1350bb4986276a6c87cf256cb
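To make the idea concrete, here is a minimal, framework-agnostic sketch of the duplicate check I have in mind. The names `is_duplicate` and `history` are hypothetical, not part of any library's API; in practice `history` would be collected from the params of completed trials (e.g. `study.trials` in Optuna):

```python
def is_duplicate(params: dict, history: list[dict]) -> bool:
    """Return True if `params` exactly matches a previously suggested set."""
    return any(params == past for past in history)

# Params suggested in earlier trials.
history = [{"lr": 1e-3, "momentum": 0.9}]

print(is_duplicate({"lr": 1e-3, "momentum": 0.9}, history))  # True: skip this trial
print(is_duplicate({"lr": 1e-2, "momentum": 0.9}, history))  # False: run it
```

A trial whose suggested params trip this check could then be pruned or skipped instead of re-evaluating the same objective.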