ModuleNotFoundError: No module named 'fused_layer_norm_cuda', ubuntu 22.04, Successfully installed apex-0.1

Open gusevmaksim opened this issue 2 years ago • 2 comments

Describe the Bug
ModuleNotFoundError: No module named 'fused_layer_norm_cuda', although apex was installed with:
pip3 install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--fast_layer_norm" ./

To get it to compile, I had to add includes to these files:
apex/contrib/csrc/layer_norm/ln.h -> add #include
apex/contrib/csrc/layer_norm/ln_utils.cuh -> add #include <stdio.h> #include <stdlib.h>

Successfully installed apex-0.1

Minimal Steps/Code to Reproduce the Bug
from transformers import T5ForConditionalGeneration
model = T5ForConditionalGeneration.from_pretrained("t5-small")

Expected Behavior
The model loads without raising ModuleNotFoundError: No module named 'fused_layer_norm_cuda'.

Environment
Ubuntu 22.04
CUDA Version: 11.7
pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu116
apex-0.1

python3 -m torch.utils.collect_env
Collecting environment information...
PyTorch version: 1.12.1+cu116
Is debug build: False
CUDA used to build PyTorch: 11.6
ROCM used to build PyTorch: N/A

OS: Ubuntu 22.04.1 LTS (x86_64)
GCC version: (Ubuntu 11.2.0-19ubuntu1) 11.2.0
Clang version: Could not collect
CMake version: version 3.22.1
Libc version: glibc-2.35

Python version: 3.10.4 (main, Jun 29 2022, 12:14:53) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.15.0-47-generic-x86_64-with-glibc2.35
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 3090
Nvidia driver version: 515.65.01
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.5.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.5.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.5.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.5.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.5.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.5.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.5.0
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] numpy==1.23.3
[pip3] torch==1.12.1+cu116
[pip3] torchaudio==0.12.1+cu116
[pip3] torchvision==0.13.1+cu116
[conda] Could not collect

gusevmaksim, Sep 15 '22 15:09
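
A quick way to confirm whether the compiled extension is actually present, independent of transformers, is to ask Python's import machinery directly. The sketch below is illustrative only: the helper name `missing_modules` is hypothetical and not part of apex, and it assumes top-level module names such as the one in the error message.

```python
import importlib.util


def missing_modules(names):
    """Return the subset of top-level module names that cannot be located.

    Uses importlib.util.find_spec, which looks a module up without importing it.
    Hypothetical helper for illustration; not part of apex.
    """
    missing = []
    for name in names:
        # find_spec returns None when no finder can locate a top-level module.
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # The module named in the error message of this issue.
    print(missing_modules(["fused_layer_norm_cuda"]))
```

An empty list means the extension can be found on the current Python path; a non-empty list names what was not built or not installed into this environment.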

Could you try installing apex with the --global-option="--cuda_ext" option as well? fused_layer_norm_cuda is built by "--cuda_ext", not by "--fast_layer_norm".

crcrpar, Sep 15 '22 18:09
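
The suggestion above amounts to passing --cuda_ext alongside the flags from the original command. A minimal sketch of assembling that command in Python follows. `build_apex_install_cmd` is a hypothetical helper, the flags are the ones quoted in this thread, and the sketch does not check whether a given pip or apex version still accepts them.

```python
import shlex
import sys


def build_apex_install_cmd(build_flags, source_dir="./"):
    """Assemble a pip install command that forwards each build flag
    through --global-option, as in the command quoted in this issue.

    Hypothetical helper for illustration; it only builds the argument list.
    """
    cmd = [
        sys.executable, "-m", "pip", "install", "-v",
        "--disable-pip-version-check", "--no-cache-dir",
    ]
    for flag in build_flags:
        cmd.append(f"--global-option={flag}")
    cmd.append(source_dir)
    return cmd


if __name__ == "__main__":
    # Original flags plus the --cuda_ext flag suggested in the reply.
    flags = ["--cpp_ext", "--cuda_ext", "--fast_layer_norm"]
    print(shlex.join(build_apex_install_cmd(flags)))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from inside the apex source directory; the sketch only prints it so that nothing is installed by accident.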

Thanks, that helped. Previously I always installed without this option and it worked. I also had to comment out the line in the script that complains about the version.

gusevmaksim, Sep 16 '22 05:09
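
The thread does not say which check was commented out. One plausible reading, given that the environment above reports CUDA 11.7 on the system and a PyTorch build for CUDA 11.6, is a CUDA version comparison. The sketch below illustrates such a comparison under that assumption; the function names are hypothetical and this is not apex's code.

```python
from typing import Tuple


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Return (major, minor) from a version string such as '11.6' or '11.7.99'."""
    parts = version.strip().split(".")
    return int(parts[0]), int(parts[1])


def cuda_versions_match(torch_cuda: str, toolkit_cuda: str) -> bool:
    """Compare only the major and minor components of two CUDA versions.

    Hypothetical illustration of a build-time version check.
    """
    return parse_major_minor(torch_cuda) == parse_major_minor(toolkit_cuda)


if __name__ == "__main__":
    # Values reported in this issue: PyTorch built with 11.6, system CUDA 11.7.
    print(cuda_versions_match("11.6", "11.7"))  # False
```

In a real environment the first argument could come from `torch.version.cuda` and the second from the installed toolkit (for example, the output of `nvcc --version`).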

Any chance to get this included in master?

salanki, Oct 29 '22 16:10