Masaki Kozuki comments

Results: 167 comments by Masaki Kozuki

Replace `torch.testing.assert_allclose` with `torch.testing.assert_close`.

- ./tests/L0/run_transformer/test_random.py
- ./tests/L0/run_fused_layer_norm/test_fused_layer_norm.py
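The swap is mostly mechanical. However, as I understand the two APIs, `assert_close` is stricter: by default it also checks dtype and device (`check_dtype=True`, `check_device=True`), while `assert_allclose` was more lenient about mismatched dtypes. Some tests may therefore need `check_dtype=False` or an explicit cast. Numerically, both compare elementwise against `atol + rtol * |expected|`. The sketch below is a hypothetical, PyTorch-free mimic of that rule on flat lists. `mismatches` and `assert_close_like` are illustrative names, not apex or PyTorch APIs.

```python
# The change in an apex test is a one-liner (needs PyTorch, so shown as comments):
#   torch.testing.assert_allclose(out, ref)  # deprecated
#   torch.testing.assert_close(out, ref)     # may need check_dtype=False


def mismatches(actual, expected, rtol, atol):
    """Indices where |actual - expected| exceeds atol + rtol * |expected|.

    Written as `not (diff <= tol)` so that NaN counts as a mismatch
    (any comparison with NaN is False).
    """
    if len(actual) != len(expected):
        raise AssertionError(f"length mismatch: {len(actual)} vs {len(expected)}")
    return [
        i
        for i, (a, e) in enumerate(zip(actual, expected))
        if not (abs(a - e) <= atol + rtol * abs(e))
    ]


def assert_close_like(actual, expected, rtol, atol):
    """Raise AssertionError with a short summary if any element is out of tolerance."""
    bad = mismatches(actual, expected, rtol, atol)
    if bad:
        raise AssertionError(
            f"{len(bad)} / {len(actual)} elements differ; first at index {bad[0]}"
        )
```

The `rtol` term scales with `|expected|`. A difference of 0.5 therefore passes at `rtol=1e-3` when the reference is around 1000, but would fail at the same setting for values near 1.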

Assertion error fired

Unfortunately, pyprof in apex doesn't seem to be well maintained. According to https://github.com/NVIDIA/PyProf#pyprof---pytorch-profiling-tool, I think it's better to use [dlprof](https://docs.nvidia.com/deeplearning/frameworks/dlprof-user-guide/).

potential improvement in p2p communication

In apex, checking the version against NGC PyTorch containers (e.g. 22.08) and released PyTorch (e.g. 1.12.1) would be sufficient.
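Both kinds of `torch.__version__` string can be handled by parsing only the leading `major.minor`. This is a minimal sketch, assuming a check needs nothing finer than `(major, minor)`. `parse_torch_version` and `torch_at_least` are hypothetical helpers. `"1.13.0a0+d321be6"` is an illustrative pre-release/local version string of the kind source builds (such as those in NGC containers) report, not the string from a specific container.

```python
import re


def parse_torch_version(version):
    """Return (major, minor) from strings like "1.12.1" (a release) or
    "1.13.0a0+d321be6" (a pre-release/local source build)."""
    m = re.match(r"(\d+)\.(\d+)", str(version))
    if m is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(m.group(1)), int(m.group(2))


def torch_at_least(version, major, minor):
    """True if `version` is at least major.minor, ignoring patch and suffixes."""
    return parse_torch_version(version) >= (major, minor)
```

In practice you would pass `torch.__version__`. Comparing only `(major, minor)` deliberately treats `1.13.0a0` as 1.13, which suits NGC builds that already contain the upstream change. A full PEP 440 comparison (e.g. `packaging.version`) would instead order `1.13.0a0` before `1.13.0`.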

build Apex latest version failed with pytorch 1.4.0 due to missing ATen/cuda/DeviceUtils.cuh

PyTorch recently removed THCDeviceUtils.cuh, so we needed the change you mentioned.
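One way to build against both old and new PyTorch would be to probe the include directories in `setup.py` and pass a define to the CUDA sources. This is a sketch of that idea, not what apex does. `device_utils_flags` and the `APEX_HAS_ATEN_DEVICE_UTILS` macro are hypothetical. In a real `setup.py`, the directories could come from `torch.utils.cpp_extension.include_paths()`.

```python
import os

# The CUDA sources would then pick the header that exists (C++, shown as comments):
#   #ifdef APEX_HAS_ATEN_DEVICE_UTILS
#   #include <ATen/cuda/DeviceUtils.cuh>
#   #else
#   #include <THC/THCDeviceUtils.cuh>
#   #endif


def device_utils_flags(include_dirs):
    """Return compiler flags telling the sources whether this PyTorch
    ships ATen/cuda/DeviceUtils.cuh (hypothetical macro name)."""
    for d in include_dirs:
        if os.path.exists(os.path.join(d, "ATen", "cuda", "DeviceUtils.cuh")):
            return ["-DAPEX_HAS_ATEN_DEVICE_UTILS"]
    return []  # older PyTorch: fall back to the THC header
```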

build Apex latest version failed with pytorch 1.4.0 due to missing ATen/cuda/DeviceUtils.cuh

Picking up a commit from before #1191 may work: https://github.com/NVIDIA/apex/commits/master

libucc_tl_cuda.so: undefined symbol: nvmlDeviceGetNvLinkRemoteDeviceType

We see the undefined symbol error when we run a container with CUDA 11.6 on a host that has an older driver.

libucc_tl_cuda.so: undefined symbol: nvmlDeviceGetNvLinkRemoteDeviceType

@bureddy what do you think about @zasdfgbnm's second question?

> Or at least detect the version and throw a kinder error message?
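A "kinder error" could come from a pre-flight check that the driver's NVML library actually exports the symbol before anything that needs it is loaded. This is a sketch using `ctypes`. `has_symbol` and `require_nvml_symbol` are hypothetical helpers. The library name `libnvidia-ml.so.1` is an assumption for Linux; how ucc itself resolves NVML is not established here.

```python
import ctypes


def has_symbol(lib_name, symbol):
    """True if the shared library loads and exports `symbol`.
    lib_name=None inspects symbols already loaded into this process."""
    try:
        lib = ctypes.CDLL(lib_name)
    except OSError:
        return False  # library not found or not loadable
    return hasattr(lib, symbol)  # ctypes raises AttributeError for missing symbols


def require_nvml_symbol(symbol="nvmlDeviceGetNvLinkRemoteDeviceType",
                        lib_name="libnvidia-ml.so.1"):
    """Raise a readable error instead of an undefined-symbol failure later."""
    if not has_symbol(lib_name, symbol):
        raise RuntimeError(
            f"{lib_name} does not provide {symbol}. The host NVIDIA driver is "
            "probably older than the CUDA version this container was built "
            "for; upgrade the driver or use a container that matches it."
        )
```

Checking the symbol rather than comparing version numbers avoids maintaining a driver/CUDA compatibility table. The check only helps if it runs before the library that needs the symbol is loaded.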

CUDA Illegal memory access on CrossEntropyLoss with large batch size, cu113, torch 1.12.1

Thank you for the reproducible script. I reproduced the error with a recent source build and got the following with `TORCH_CPP_SHOW_STACKTRACES=1`. I'm looking into `nll_loss_forward_out_cuda_template`.

```console
File "/opt/conda/lib/python3.8/site-packages/torch/nn/functional.py", line 3020, in cross_entropy...
```

Use `int64_t` for nll_loss with cuda inputs

> The implementation of this monster of a function is also located at `NLLLoss2d.cu`. Could you apply similar changes to that branch, and add the relevant tests to `test_nn.py`?

I...

Use `int64_t` for nll_loss with cuda inputs

> Also, even better it may be to simply use `index_t` everywhere, as it should be the right size I think.

I don't think it'll work unless we start refactoring...
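Here is a likely mechanism for why `int64_t` matters. This is my assumption, consistent with the title of the fix rather than confirmed by the truncated trace. Once `batch_size * num_classes` exceeds `INT32_MAX`, a 32-bit flattened offset into the input wraps to a negative value, and the kernel then reads out of bounds. The pure-Python sketch below emulates that wraparound. `flat_offset`, `needs_int64`, and the sizes are illustrative.

```python
INT32_MAX = 2**31 - 1


def wrap_int32(x):
    """Two's-complement wraparound of a signed 32-bit value (what typically
    happens in practice; strictly, signed overflow is UB in C++)."""
    return (x + 2**31) % 2**32 - 2**31


def flat_offset(row, col, num_classes, use_int64):
    """Offset of (row, col) in a row-major [batch, num_classes] buffer,
    computed in 64-bit or emulated 32-bit signed arithmetic."""
    offset = row * num_classes + col
    return offset if use_int64 else wrap_int32(offset)


def needs_int64(batch_size, num_classes):
    """True if the largest flattened offset does not fit in int32."""
    return batch_size * num_classes - 1 > INT32_MAX


if __name__ == "__main__":
    batch, classes = 50_000, 50_000  # 2.5e9 elements, illustrative sizes
    print(needs_int64(batch, classes))                         # True
    print(flat_offset(batch - 1, 0, classes, use_int64=True))   # 2499950000
    print(flat_offset(batch - 1, 0, classes, use_int64=False))  # -1795017296
```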