Masaki Kozuki

42 issues by Masaki Kozuki

This PR implements an APEX-style FusedAdam in PyTorch. Unlike the APEX version, it is compatible with `torch.cuda.amp.GradScaler`: it sets `_step_supports_amp_scaling` to `True` and unscales...

oncall: distributed
triaged
open source
cla signed
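The FusedAdam PR body above is truncated, so the following is only a conceptual sketch, not the PR's code. It is a pure-Python Adam step (no PyTorch) that receives loss-scaled gradients plus a hypothetical `grad_scale` argument, unscales them inside the step, and skips the update on inf/NaN. That is the work `torch.cuda.amp.GradScaler` otherwise performs in a separate unscale pass before calling `optimizer.step()`. The function name and signature are assumptions for illustration.

```python
import math
from typing import List


def adam_step_with_unscale(
    params: List[float],
    grads: List[float],
    exp_avg: List[float],
    exp_avg_sq: List[float],
    step: int,
    *,
    grad_scale: float = 1.0,
    lr: float = 1e-3,
    beta1: float = 0.9,
    beta2: float = 0.999,
    eps: float = 1e-8,
) -> int:
    """Hypothetical sketch: one Adam update that unscales gradients inside the step.

    `grads` are assumed to be multiplied by `grad_scale` (loss scaling).
    If any unscaled gradient is inf/NaN, the update is skipped and all state
    is left unchanged, mirroring GradScaler's skip-on-overflow behavior.
    Returns the (possibly unchanged) step count.
    """
    unscaled = [g / grad_scale for g in grads]
    if not all(math.isfinite(g) for g in unscaled):
        return step  # overflow detected: skip this update entirely

    step += 1
    bias_correction1 = 1.0 - beta1 ** step
    bias_correction2 = 1.0 - beta2 ** step
    for i, g in enumerate(unscaled):
        # Update the biased first and second moment estimates in place.
        exp_avg[i] = beta1 * exp_avg[i] + (1.0 - beta1) * g
        exp_avg_sq[i] = beta2 * exp_avg_sq[i] + (1.0 - beta2) * g * g
        denom = math.sqrt(exp_avg_sq[i] / bias_correction2) + eps
        params[i] -= lr * (exp_avg[i] / bias_correction1) / denom
    return step
```

Folding the unscale into the update means the gradients need not be rewritten in a separate pass first; in a fused CUDA optimizer both would happen in the same kernel, while this Python loop only illustrates the arithmetic.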

Now that Chainer supports VGG19, how about the following two changes?
- add VGG19 to chainercv/links/model/vgg
- add the link to its caffemodel to examples/vgg/caffe2npz.py

feature
contributions welcome
feature request

https://github.com/NVIDIA/apex/blob/a0f5f3ac0f6bf39feee6e60eee66ec873dc299ab/apex/transformer/pipeline_parallel/p2p_communication.py#L271 could probably be removed after confirming https://github.com/pytorch/pytorch/pull/82450

Currently the apex.transformer tests assume the NCCL backend, as you can see in:
- https://github.com/NVIDIA/apex/blob/2b7d280ba53898f0b332b7ee02068e4f737d13c9/apex/transformer/testing/distributed_test_base.py#L11
- https://github.com/NVIDIA/apex/blob/2b7d280ba53898f0b332b7ee02068e4f737d13c9/apex/transformer/testing/distributed_test_base.py#L40-L51
By renaming `BACKEND_NCCL` to, e.g., `DIST_BACKEND` and replacing `DistributedTestBase.BACKEND_NCCL` with `self.DIST_BACKEND`, the test can be...
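A minimal sketch of the proposed refactor, assuming a simplified stand-in for apex's `DistributedTestBase` (this is not the apex class, and the method `init_process_group_kwargs` is hypothetical). The backend moves into an overridable class attribute `DIST_BACKEND`, so a subclass can switch backends without touching the shared setup code. It only builds the keyword arguments that would go to `torch.distributed.init_process_group`, so it runs without PyTorch; `"gloo"` is used purely as an illustrative override.

```python
from typing import Any, Dict


class DistributedTestBase:
    """Hypothetical stand-in: the backend is a class attribute, not a hard-coded constant."""

    DIST_BACKEND: str = "nccl"  # default keeps the current NCCL-only behavior

    def init_process_group_kwargs(
        self, rank: int, world_size: int, init_method: str
    ) -> Dict[str, Any]:
        # Keyword arguments one would pass to torch.distributed.init_process_group;
        # reading self.DIST_BACKEND lets subclasses pick the backend.
        return {
            "backend": self.DIST_BACKEND,
            "init_method": init_method,
            "world_size": world_size,
            "rank": rank,
        }


class GlooDistributedTestBase(DistributedTestBase):
    # Illustrative override only; any backend name could be plugged in here.
    DIST_BACKEND = "gloo"
```

Because the lookup goes through `self`, every test class inheriting from the subclass picks up its backend automatically.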

Because `pytest` collects it as a test case even though it is not one: https://github.com/NVIDIA/apex/blob/f9305e7561a967d15157234fa0934c40fa8bbc92/apex/contrib/test/layer_norm/test_fast_layer_norm.py#L128

Pros: `_all_gather_base` performs fewer device-to-device (DtoD) memory copies than `all_gather`, which does auxiliary DtoD memory copies in https://github.com/pytorch/pytorch/blob/653892e288b750217dcb7bf4f95ad6c63d3a487d/torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp#L1851-L1863.
Cons: `_all_gather_base` has been marked as experimental: https://github.com/pytorch/pytorch/blob/653892e288b750217dcb7bf4f95ad6c63d3a487d/torch/distributed/distributed_c10d.py#L2109-L2112.
Ref:
- `_all_gather_base`...
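A pure-Python model (no PyTorch; function names are hypothetical) of where the extra copies in the comparison above come from. As we read the linked ProcessGroupNCCL lines, `all_gather` runs the collective into a temporary flattened buffer and then copies each rank's slice into the caller's list of output tensors, whereas `_all_gather_base` writes directly into the caller's single flat output tensor. Python lists stand in for device tensors, and the extra copies are simply counted.

```python
from typing import List, Tuple

Shard = List[float]


def all_gather_into_flat(shards: List[Shard]) -> Tuple[List[float], int]:
    """Model of the flat-output call (`_all_gather_base`).

    Each rank's shard is written straight into one contiguous output buffer,
    so no auxiliary copies are needed. Returns (output, extra_copies).
    """
    output: List[float] = []
    for shard in shards:
        output.extend(shard)  # the collective writes here directly
    return output, 0


def all_gather_into_list(shards: List[Shard]) -> Tuple[List[Shard], int]:
    """Model of the list-output call (`all_gather`).

    The collective first fills a temporary flat buffer, then each rank's slice
    is copied into its own output tensor: one auxiliary copy per rank.
    """
    flat, _ = all_gather_into_flat(shards)
    n = len(shards[0])  # equal-sized shards, as all_gather requires
    outputs: List[Shard] = []
    extra_copies = 0
    for rank in range(len(shards)):
        outputs.append(flat[rank * n:(rank + 1) * n])  # auxiliary "DtoD" copy
        extra_copies += 1
    return outputs, extra_copies
```

In PyTorch, the flat-output function takes a single output tensor whose size is the world size times the input size, which is what lets it skip the per-rank copies.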

Ref: https://github.com/pytorch/pytorch/pull/73348

fixes https://github.com/NVIDIA/apex/issues/1238