Masaki Kozuki issues

Results: 42 issues of Masaki Kozuki

[mta] APEX style Fused Adam

This PR implements an APEX style FusedAdam in PyTorch. This is different from the APEX one in that this is compatible with `torch.cuda.amp.GradScaler` by setting `_step_supports_amp_scaling` to `True` and unscales...

oncall: distributed

triaged

open source

cla signed

request: VGG19

Chainer now supports VGG19, so how about the following two?
- adding VGG19 to chainercv/links/model/vgg
- adding its link to caffemodel to examples/vgg/caffe2npz.py

feature

contributions welcome

feature request

potential improvement in p2p communication

https://github.com/NVIDIA/apex/blob/a0f5f3ac0f6bf39feee6e60eee66ec873dc299ab/apex/transformer/pipeline_parallel/p2p_communication.py#L271 might be able to be removed after confirming https://github.com/pytorch/pytorch/pull/82450

remove `_` from `_reconfigure_microbatch_calculator`

Signed-off-by: Masaki Kozuki

[do not review] Change C++/CUDA Custom Extensions' Path

[rfc][transformer][test] Make test flexible

Currently, the apex.transformer tests assume the NCCL backend, as you can see in:
- https://github.com/NVIDIA/apex/blob/2b7d280ba53898f0b332b7ee02068e4f737d13c9/apex/transformer/testing/distributed_test_base.py#L11
- https://github.com/NVIDIA/apex/blob/2b7d280ba53898f0b332b7ee02068e4f737d13c9/apex/transformer/testing/distributed_test_base.py#L40-L51

By renaming `BACKEND_NCCL` to e.g. `DIST_BACKEND` and replacing `DistributedTestBase.BACKEND_NCCL` with `self.DIST_BACKEND`, the test can be...
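The renaming idea can be illustrated with a minimal sketch. It is not the apex code: the class below is a simplified, hypothetical stand-in for `DistributedTestBase`, and `backend_kwargs` and `GlooDistributedTest` are names invented here. It only shows why reading the attribute through `self` lets a subclass swap the backend.

```python
class DistributedTestBase:
    # Hypothetical, simplified stand-in for the apex test base class.
    # The real class also spawns processes and initializes the process group.
    DIST_BACKEND = "nccl"

    def backend_kwargs(self):
        # Look the backend up through `self` rather than through the base
        # class name, so that a subclass override is honored.
        return {"backend": self.DIST_BACKEND}


class GlooDistributedTest(DistributedTestBase):
    # Hypothetical subclass that reuses the same tests with another backend.
    DIST_BACKEND = "gloo"


print(DistributedTestBase().backend_kwargs())
print(GlooDistributedTest().backend_kwargs())
```

Had `backend_kwargs` referred to `DistributedTestBase.DIST_BACKEND` directly, the subclass override would be ignored, which is the inflexibility the proposal describes.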

Rename `test_` function of apex/contrib/test/layer_norm/test_fast_layer_norm.py

Because `pytest` treats it as a test case even though it is not one: https://github.com/NVIDIA/apex/blob/f9305e7561a967d15157234fa0934c40fa8bbc92/apex/contrib/test/layer_norm/test_fast_layer_norm.py#L128
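As a rough illustration of why the name matters: with its default settings, `pytest` collects module-level functions whose names start with `test`. The sketch below only approximates that name filter and is not pytest's implementation. The function names in the list are hypothetical, not the ones in the apex file.

```python
def collected_by_default(names, prefix="test"):
    # Approximation of pytest's default name-based function discovery:
    # keep only the names that start with the given prefix.
    return [name for name in names if name.startswith(prefix)]


# Hypothetical names: a real test, a helper that looks like a test,
# and the same helper after renaming.
names = ["test_forward", "test_helper", "_run_helper"]
print(collected_by_default(names))
```

Renaming the helper so it no longer starts with `test` keeps it out of collection without changing its behavior.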

[transformer] Use `torch.distributed._all_gather_base`

Pros: `_all_gather_base` has fewer device-to-device memory copies than `all_gather`. `all_gather` does auxiliary DtoD mem copies in https://github.com/pytorch/pytorch/blob/653892e288b750217dcb7bf4f95ad6c63d3a487d/torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp#L1851-L1863.

Cons: `_all_gather_base` has been marked as experimental: https://github.com/pytorch/pytorch/blob/653892e288b750217dcb7bf4f95ad6c63d3a487d/torch/distributed/distributed_c10d.py#L2109-L2112.

Ref:
- `_all_gather_base`...

Replace `torch.testing.assert_allclose` with `torch.testing.assert_close`.

Ref: https://github.com/pytorch/pytorch/pull/73348

check if `get_autocast_gpu_dtype` is available

fixes https://github.com/NVIDIA/apex/issues/1238
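An availability check of this kind can be sketched as below. This is not the code from the PR. The helper name `resolve_autocast_gpu_dtype` and the fallback value are assumptions, and `SimpleNamespace` objects stand in for older and newer `torch` modules so the example runs without PyTorch installed.

```python
from types import SimpleNamespace


def resolve_autocast_gpu_dtype(torch_like, fallback):
    # Hypothetical helper: call `get_autocast_gpu_dtype` when the module
    # provides it, otherwise return the caller-supplied fallback.
    getter = getattr(torch_like, "get_autocast_gpu_dtype", None)
    if callable(getter):
        return getter()
    return fallback


# Stand-ins for a build without and with the function (assumption: a real
# caller would pass the imported `torch` module instead).
older_torch = SimpleNamespace()
newer_torch = SimpleNamespace(get_autocast_gpu_dtype=lambda: "bfloat16")

print(resolve_autocast_gpu_dtype(older_torch, "float16"))
print(resolve_autocast_gpu_dtype(newer_torch, "float16"))
```

Using `getattr` with a default avoids an `AttributeError` on builds that lack the function.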