alpa
Problem in building Alpa-modified Jaxlib.
Please describe the bug
Please describe the expected behavior
System information and environment
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04, docker):
- Python version: 3.9
- CUDA version: 11.3
- NCCL version: 8.2.0.53
- cupy version: cupy-cuda11x 12.2.0
- GPU: GeForce RTX 3090
- Alpa version: 0.2.3
- JAX version: 0.3.22
To Reproduce
Steps to reproduce the behavior:
When I install Alpa from source and run
python3 build/build.py --enable_cuda --dev_install --bazel_options=--override_repository=org_tensorflow=$(pwd)/../third_party/tensorflow-alpa
some warnings appear.
I don't know whether they are related to the error shown in the second screenshot.
Screenshots
If applicable, add screenshots to help explain your problem.
Code snippet to reproduce the problem
Additional information Add any other context about the problem here or include any logs that would be helpful to diagnose the problem.
This bug is caused by the wrong version of libnccl. I solved it by reinstalling the correct libnccl version and creating a new Python environment based on it.
May I ask which exact versions of Python and libnccl you used? Thanks.
Sure: python == 3.8.13, gcc == 7.5.0, nccl == libnccl.so.2.8.4
Hi, I am running into the same issue when building from source. I don't understand how the libnccl version affects the file-not-found error. Is there any other solution?
The mirror URL is written in one of the workspace files. The file-not-found problem does not seem to be the reason for the error; the incorrect libnccl version is the main cause.
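Since the fix above depends on which libnccl actually gets picked up, here is a minimal sketch for checking it before rebuilding. It is not from the thread: the helper names `decode_nccl_version` and `loaded_nccl_version` are made up for illustration, and the fallback library name `libnccl.so.2` is an assumption about a typical Linux install. It calls NCCL's `ncclGetVersion` through `ctypes` and decodes the integer it returns.

```python
import ctypes
import ctypes.util


def decode_nccl_version(code):
    """Turn the integer from ncclGetVersion into (major, minor, patch)."""
    # NCCL up to 2.8.x encodes the version as major*1000 + minor*100 + patch;
    # 2.9 and later use major*10000 + minor*100 + patch.
    if code < 10000:
        return code // 1000, (code % 1000) // 100, code % 100
    return code // 10000, (code % 10000) // 100, code % 100


def loaded_nccl_version(lib_name="libnccl.so.2"):
    """Return the version of the NCCL library the loader finds, or None."""
    # lib_name is an assumed fallback; adjust it to match your system.
    path = ctypes.util.find_library("nccl") or lib_name
    try:
        lib = ctypes.CDLL(path)
        version = ctypes.c_int(0)
        # ncclGetVersion returns 0 (ncclSuccess) when it succeeds.
        if lib.ncclGetVersion(ctypes.byref(version)) != 0:
            return None
    except (OSError, AttributeError):
        # Library not found, or it does not export ncclGetVersion.
        return None
    return decode_nccl_version(version.value)


if __name__ == "__main__":
    print("NCCL version:", loaded_nccl_version())
```

If this prints something other than the version you expect (2.8.4 in the setup reported above), the environment is probably still resolving a different libnccl than the one you installed.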