ColossalAI
[BUG]: PyTorch version mismatch? Should I update PyTorch or CUDA?
🐛 Describe the bug
(ColossalAI-Chat) tt@visiondev-SYS-4029GP-TRT:/data3/samba_css/chatgpt/ColossalAI/applications/Chat/examples$ colossalai check -i
/home/tt/anaconda3/envs/ColossalAI-Chat/lib/python3.10/site-packages/torch/library.py:130: UserWarning: Overriding a previously registered kernel for the same operator and the same dispatch key
  operator: aten::eye.m_out(int n, int m, *, Tensor(a!) out) -> Tensor(a!)
    registered at /opt/conda/conda-bld/pytorch_1670525552843/work/build/aten/src/ATen/RegisterSchema.cpp:6
  dispatch key: Meta
  previous kernel: registered at /opt/conda/conda-bld/pytorch_1670525552843/work/build/aten/src/ATen/RegisterCPU.cpp:30798
       new kernel: registered at /dev/null:241
  (Triggered internally at /opt/conda/conda-bld/pytorch_1670525552843/work/aten/src/ATen/core/dispatch/OperatorEntry.cpp:150.)
  self.m.impl(name, dispatch_key, fn)
/home/tt/anaconda3/envs/ColossalAI-Chat/lib/python3.10/site-packages/torch/library.py:130: UserWarning: Overriding a previously registered kernel for the same operator and the same dispatch key
  operator: aten::index.Tensor(Tensor self, Tensor?[] indices) -> Tensor
    registered at /opt/conda/conda-bld/pytorch_1670525552843/work/build/aten/src/ATen/RegisterSchema.cpp:6
  dispatch key: Meta
  previous kernel: registered at /opt/conda/conda-bld/pytorch_1670525552843/work/aten/src/ATen/functorch/BatchRulesScatterOps.cpp:1053
       new kernel: registered at /dev/null:241
  (Triggered internally at /opt/conda/conda-bld/pytorch_1670525552843/work/aten/src/ATen/core/dispatch/OperatorEntry.cpp:150.)
  self.m.impl(name, dispatch_key, fn)
Installation Report
------------ Environment ------------
Colossal-AI version: 0.2.8
PyTorch version: 1.13.1
System CUDA version: 11.4
CUDA version required by PyTorch: 11.6
Note:
- The table above checks the versions of the libraries/tools in the current environment
- If the System CUDA version is N/A, you can set the CUDA_HOME environment variable to locate it
- If the CUDA version required by PyTorch is N/A, you probably did not install a CUDA-compatible PyTorch. This value is given by torch.version.cuda, and you can go to https://pytorch.org/get-started/locally/ to download the correct version.
------------ CUDA Extensions AOT Compilation ------------
Found AOT CUDA Extension: x
PyTorch version used for AOT compilation: N/A
CUDA version used for AOT compilation: N/A
Note:
- AOT (ahead-of-time) compilation of the CUDA kernels occurs during installation when the environment variable CUDA_EXT=1 is set
- If AOT compilation is not enabled, stay calm as the CUDA kernels can still be built during runtime
------------ Compatibility ------------
PyTorch version match: N/A
System and PyTorch CUDA version match: x
System and Colossal-AI CUDA version match: N/A
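Following up on the CUDA_EXT=1 note in the report above, here is a minimal sketch of how the AOT path could be exercised once the toolkit is visible. The CUDA_HOME path below is only an illustrative placeholder, not my actual layout:

# Point CUDA_HOME at the local toolkit if the report shows the system CUDA version as N/A
# (placeholder path; adjust to wherever CUDA is actually installed).
export CUDA_HOME=/usr/local/cuda-11.6
export PATH=$CUDA_HOME/bin:$PATH

# Reinstall with AOT compilation of the CUDA kernels enabled, per the note above.
CUDA_EXT=1 pip install --no-cache-dir colossalai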
Environment
Colossal-AI version: 0.2.8
PyTorch version: 1.13.1
System CUDA version: 11.4
CUDA version required by PyTorch: 11.6
Hi @chensisi0730,
nvcc -V
and
python -c "import torch; print(torch.version.cuda)"
should report consistent versions. In your case, installing CUDA 11.6 inside your own virtual environment, e.g. via https://anaconda.org/nvidia/cuda-toolkit, may fix the problem.
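As a rough sketch of the suggestion above (the exact conda channel label is an assumption on my side; check https://anaconda.org/nvidia/cuda-toolkit for the label that matches your PyTorch build):

# Compare the system compiler's CUDA version with the one PyTorch was built against.
nvcc -V
python -c "import torch; print(torch.version.cuda)"

# If they differ, install a matching toolkit inside the conda environment,
# e.g. CUDA 11.6 for a PyTorch 1.13.1 cu116 build (channel label is an assumption).
conda install -c "nvidia/label/cuda-11.6.2" cuda-toolkit

# Re-run the installation report to confirm the versions now match.
colossalai check -i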
Here is my colossalai check -i output. Is there anything wrong? I encounter "ModuleNotFoundError: No module named 'colossalai.kernel.op_builder'" when I run the example code.
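Not an answer, but a couple of purely diagnostic commands that may help narrow this down (assuming a pip/conda install of colossalai):

# Show which colossalai is actually being imported, and its version.
python -c "import colossalai; print(colossalai.__version__, colossalai.__file__)"

# Check whether the op_builder subpackage ships with that installation.
python -c "import importlib.util; print(importlib.util.find_spec('colossalai.kernel.op_builder'))"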