Failing to compile from master on EC2 instances
### Description
Building FasterTransformer from the master branch fails locally with a missing-NCCL linker error. This happens on EC2 instances, both P3 and P4d.
### Environment
PyTorch version: 1.11.0+cu113
Is debug build: False
CUDA used to build PyTorch: 11.3
ROCM used to build PyTorch: N/A
OS: Ubuntu 18.04.6 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: Could not collect
CMake version: version 3.22.4
Libc version: glibc-2.27
Python version: 3.8.13 | packaged by conda-forge | (default, Mar 25 2022, 06:04:18) [GCC 10.3.0] (64-bit runtime)
Python platform: Linux-5.4.0-1072-aws-x86_64-with-glibc2.10
Is CUDA available: True
CUDA runtime version: 11.3.58
GPU models and configuration:
GPU 0: NVIDIA A100-SXM4-40GB
GPU 1: NVIDIA A100-SXM4-40GB
GPU 2: NVIDIA A100-SXM4-40GB
GPU 3: NVIDIA A100-SXM4-40GB
GPU 4: NVIDIA A100-SXM4-40GB
GPU 5: NVIDIA A100-SXM4-40GB
GPU 6: NVIDIA A100-SXM4-40GB
GPU 7: NVIDIA A100-SXM4-40GB
Nvidia driver version: 510.47.03
cuDNN version: Probably one of the following:
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn.so.8.0.5
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8.0.5
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_adv_train.so.8.0.5
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8.0.5
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8.0.5
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8.0.5
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_ops_train.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_adv_train.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_ops_train.so.8.0.5
/usr/local/cuda-11.2/targets/x86_64-linux/lib/libcudnn.so.8.1.1
/usr/local/cuda-11.2/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8.1.1
/usr/local/cuda-11.2/targets/x86_64-linux/lib/libcudnn_adv_train.so.8.1.1
/usr/local/cuda-11.2/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8.1.1
/usr/local/cuda-11.2/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8.1.1
/usr/local/cuda-11.2/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8.1.1
/usr/local/cuda-11.2/targets/x86_64-linux/lib/libcudnn_ops_train.so.8.1.1
HIP runtime version: N/A
MIOpen runtime version: N/A
Versions of relevant libraries:
[pip3] numpy==1.22.4
[pip3] torch==1.11.0+cu113
[pip3] torchaudio==0.11.0+cu113
[pip3] torchvision==0.12.0+cu113
[conda] numpy 1.22.4 pypi_0 pypi
[conda] torch 1.11.0+cu113 pypi_0 pypi
[conda] torchaudio 0.11.0+cu113 pypi_0 pypi
[conda] torchvision 0.12.0+cu113 pypi_0 pypi
### Error
/usr/bin/ld: cannot find -lnccl
collect2: error: ld returned 1 exit status
src/fastertransformer/th_op/gpt/CMakeFiles/th_gpt.dir/build.make:160: recipe for target 'lib/libth_gpt.so' failed
make[2]: *** [lib/libth_gpt.so] Error 1
CMakeFiles/Makefile2:5082: recipe for target 'src/fastertransformer/th_op/gpt/CMakeFiles/th_gpt.dir/all' failed
make[1]: *** [src/fastertransformer/th_op/gpt/CMakeFiles/th_gpt.dir/all] Error 2
Makefile:90: recipe for target 'all' failed
make: *** [all] Error 2
### NCCL version check
(fresh_env) ubuntu@ip-172-31-48-37:~/FT_clean/FasterTransformer/build$ python -c "import torch;print(torch.cuda.nccl.version())"
(2, 10, 3)
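Note that this check can be misleading: the NCCL reported by `torch.cuda.nccl.version()` is the copy bundled with the PyTorch pip wheel, which `/usr/bin/ld` does not search. A quick sketch to see what the system linker can actually find:

```bash
# The NCCL inside the PyTorch wheel is not on the system linker's path,
# so check whether a system-wide libnccl is installed at all.
ldconfig -p | grep -i nccl                         # libraries registered with the dynamic linker
find /usr/local /usr/lib -name 'libnccl*' 2>/dev/null
```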
I wonder if I am missing a step here.
### Steps to reproduce
1. conda create -n my-env python=3.8
2. conda activate my-env
3. pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu113
4. git clone https://github.com/NVIDIA/FasterTransformer.git
5. Add the following to ~/.bashrc:
   export CUDA_HOME=/usr/local/cuda
   export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64
   export PATH=$PATH:$CUDA_HOME/bin
   export CUDACXX=/usr/local/cuda-11.3/bin/nvcc
6. cd FasterTransformer
7. mkdir -p build
8. cd build
9. cmake -DSM=80 -DCMAKE_BUILD_TYPE=Release -DBUILD_PYT=ON ..
10. make
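If `-lnccl` fails because no system-wide NCCL is installed, one possible fix is to install it before re-running `cmake` and `make`. This is a sketch, assuming NVIDIA's apt repository is already configured (it usually is on the AWS Deep Learning AMIs):

```bash
# Install the NCCL runtime and development headers so that `-lnccl`
# resolves; package names assume NVIDIA's apt repository is set up.
sudo apt-get update
sudo apt-get install -y libnccl2 libnccl-dev
ldconfig -p | grep nccl   # verify libnccl.so is now visible to the linker
```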
Can you try the Docker image we suggest in the documentation?
Besides, are you using the main branch? I remember we no longer need NCCL for the GPT module.
> Can you try the Docker image we suggest in the documentation?

I am using the Docker image as a workaround, but I am looking into building locally as well.

> Besides, are you using the main branch? I remember we no longer need NCCL for the GPT module.

Yes, I am using the main branch.
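For the record, a quick way to confirm the branch and the exact commit being built (the repository path is taken from the shell prompt above):

```bash
# Confirm the checked-out branch and pin down the exact commit.
git -C ~/FT_clean/FasterTransformer rev-parse --abbrev-ref HEAD
git -C ~/FT_clean/FasterTransformer log -1 --oneline
```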
I wonder if the Dockerfile is accessible as open source?
The Docker images we use in the documentation are openly available on NGC.
Can you please point me to the repo where I can access the Dockerfile? It would be helpful for mimicking the environment.
We don't have a Dockerfile. We directly use the NGC Docker images, such as nvcr.io/nvidia/pytorch:22.03-py3.
Yes, thanks for the clarification. I am using the Docker image; I just thought the Dockerfile might be available as open source as well.
I would appreciate any suggestions for debugging.
The Docker image is public on NGC; you can pull it directly.
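For completeness, a minimal pull-and-run sketch for that image; the mount path is illustrative:

```bash
# Pull the NGC PyTorch image mentioned above and start an interactive
# container with all GPUs visible. The volume mount is illustrative.
docker pull nvcr.io/nvidia/pytorch:22.03-py3
docker run --gpus all -it --rm \
    -v "$HOME/FasterTransformer:/workspace/FasterTransformer" \
    nvcr.io/nvidia/pytorch:22.03-py3
```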
Closing this bug because it is inactive. Feel free to re-open it if you still have any problems.