Masaki Kozuki

Results 167 comments of Masaki Kozuki

- ./tests/L0/run_transformer/test_random.py
- ./tests/L0/run_fused_layer_norm/test_fused_layer_norm.py

Unfortunately, pyprof in apex doesn't seem to be well maintained, and according to https://github.com/NVIDIA/PyProf#pyprof---pytorch-profiling-tool, I guess it's better to use [dlprof](https://docs.nvidia.com/deeplearning/frameworks/dlprof-user-guide/).

In apex, a version check against the NGC PyTorch containers (e.g. 22.08) and the released PyTorch (e.g. 1.12.1) would be sufficient.
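A minimal sketch of what such a version check could look like, assuming plain string comparison on the `major.minor.patch` prefix. The helper names `parse_release` and `is_at_least` are hypothetical and are not apex's actual implementation. The sample strings are illustrative of a released build with a local suffix and a pre-release build with a commit hash.

```python
import re
from typing import Tuple

# Matches the leading "major.minor[.patch]" part of a version string.
_RELEASE_RE = re.compile(r"^(\d+)\.(\d+)(?:\.(\d+))?")


def parse_release(version: str) -> Tuple[int, int, int]:
    """Return (major, minor, patch), ignoring any suffix such as 'a0+<hash>' or '+cu116'."""
    match = _RELEASE_RE.match(version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = match.groups()
    # A missing patch component (e.g. "1.12") is treated as 0.
    return int(major), int(minor), int(patch or 0)


def is_at_least(version: str, minimum: str) -> bool:
    """True if `version` is at least `minimum`, comparing release numbers only."""
    return parse_release(version) >= parse_release(minimum)


if __name__ == "__main__":
    # Illustrative strings only; in practice the first argument
    # would be something like torch.__version__.
    print(is_at_least("1.12.1+cu116", "1.12.1"))
    print(is_at_least("1.11.0", "1.12.1"))
```

Note that this sketch deliberately ignores pre-release markers, so a string like `1.13.0a0+<hash>` is treated the same as `1.13.0`. Whether that is acceptable depends on which features the check is guarding.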

PyTorch recently removed `THCDeviceUtils.cuh`, so we needed the change you mentioned.

picking up a commit before #1191 may work -- https://github.com/NVIDIA/apex/commits/master

We're seeing the undefined symbol message when we run a container which has CUDA 11.6 on a host with an older driver
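For a driver/runtime mismatch like this, a friendlier failure could be sketched as below. This is only an illustration: the function names are hypothetical, and the two integers are assumed to have been obtained elsewhere (for example from the CUDA runtime API's `cudaDriverGetVersion` and `cudaRuntimeGetVersion`), using CUDA's `1000 * major + 10 * minor` encoding. The strict comparison is a simplification; CUDA's minor-version compatibility may allow some older drivers to work, so a real check might need to be more lenient.

```python
from typing import Tuple


def decode_cuda_version(encoded: int) -> Tuple[int, int]:
    """Decode CUDA's integer encoding (1000 * major + 10 * minor), e.g. 11060 -> (11, 6)."""
    return encoded // 1000, (encoded % 1000) // 10


def check_driver_supports_runtime(driver_version: int, runtime_version: int) -> None:
    """Raise a readable error if the driver supports an older CUDA than the runtime.

    Assumption: a strict policy where the driver's supported CUDA version
    must be at least the runtime's version.
    """
    if driver_version < runtime_version:
        drv_major, drv_minor = decode_cuda_version(driver_version)
        rt_major, rt_minor = decode_cuda_version(runtime_version)
        raise RuntimeError(
            f"The installed driver supports CUDA {drv_major}.{drv_minor}, "
            f"but this build uses CUDA {rt_major}.{rt_minor}. "
            "Please upgrade the host driver or use an image built for an older CUDA."
        )


if __name__ == "__main__":
    try:
        # Hypothetical values: a CUDA 11.6 runtime on a driver that supports up to 11.4.
        check_driver_supports_runtime(driver_version=11040, runtime_version=11060)
    except RuntimeError as err:
        print(err)
```

Running such a check early, before any extension is loaded, would surface the mismatch as a readable message instead of an undefined symbol error at import time.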

@bureddy what do you think about @zasdfgbnm's 2nd question?

> Or at least detect the version and throw a kinder error message?

thank you for the reproducible script. I repro'd with a recent source build and got the following with `TORCH_CPP_SHOW_STACKTRACES=1`. I'm looking into `nll_loss_forward_out_cuda_template`. ```console File "/opt/conda/lib/python3.8/site-packages/torch/nn/functional.py", line 3020, in cross_entropy...

> The implementation of this monster of a function is also located at `NLLLoss2d.cu`. Could you apply similar changes to that branch, and add the relevant tests to `test_nn.py`?

I...

> Also, even better it may be to simply use `index_t` everywhere, as it should be the right size I think.

I don't think it'll work unless we start refactoring...