[BUG]: PyTorch version mismatch? Should I update PyTorch or CUDA?

Open chensisi0730 opened this issue 1 year ago • 1 comments

🐛 Describe the bug

(ColossalAI-Chat) tt@visiondev-SYS-4029GP-TRT:/data3/samba_css/chatgpt/ColossalAI/applications/Chat/examples$ colossalai check -i
/home/tt/anaconda3/envs/ColossalAI-Chat/lib/python3.10/site-packages/torch/library.py:130: UserWarning: Overriding a previously registered kernel for the same operator and the same dispatch key
  operator: aten::eye.m_out(int n, int m, *, Tensor(a!) out) -> Tensor(a!)
    registered at /opt/conda/conda-bld/pytorch_1670525552843/work/build/aten/src/ATen/RegisterSchema.cpp:6
  dispatch key: Meta
  previous kernel: registered at /opt/conda/conda-bld/pytorch_1670525552843/work/build/aten/src/ATen/RegisterCPU.cpp:30798
  new kernel: registered at /dev/null:241 (Triggered internally at /opt/conda/conda-bld/pytorch_1670525552843/work/aten/src/ATen/core/dispatch/OperatorEntry.cpp:150.)
  self.m.impl(name, dispatch_key, fn)
/home/tt/anaconda3/envs/ColossalAI-Chat/lib/python3.10/site-packages/torch/library.py:130: UserWarning: Overriding a previously registered kernel for the same operator and the same dispatch key
  operator: aten::index.Tensor(Tensor self, Tensor?[] indices) -> Tensor
    registered at /opt/conda/conda-bld/pytorch_1670525552843/work/build/aten/src/ATen/RegisterSchema.cpp:6
  dispatch key: Meta
  previous kernel: registered at /opt/conda/conda-bld/pytorch_1670525552843/work/aten/src/ATen/functorch/BatchRulesScatterOps.cpp:1053
  new kernel: registered at /dev/null:241 (Triggered internally at /opt/conda/conda-bld/pytorch_1670525552843/work/aten/src/ATen/core/dispatch/OperatorEntry.cpp:150.)
  self.m.impl(name, dispatch_key, fn)

Installation Report

------------ Environment ------------
Colossal-AI version: 0.2.8
PyTorch version: 1.13.1
System CUDA version: 11.4
CUDA version required by PyTorch: 11.6

Note:

  1. The table above checks the versions of the libraries/tools in the current environment
  2. If the System CUDA version is N/A, you can set the CUDA_HOME environment variable to locate it
  3. If the CUDA version required by PyTorch is N/A, you probably did not install a CUDA-compatible PyTorch. This value is given by torch.version.cuda and you can go to https://pytorch.org/get-started/locally/ to download the correct version.

------------ CUDA Extensions AOT Compilation ------------
Found AOT CUDA Extension: x
PyTorch version used for AOT compilation: N/A
CUDA version used for AOT compilation: N/A

Note:

  1. AOT (ahead-of-time) compilation of the CUDA kernels occurs during installation when the environment variable CUDA_EXT=1 is set
  2. If AOT compilation is not enabled, stay calm as the CUDA kernels can still be built during runtime

------------ Compatibility ------------
PyTorch version match: N/A
System and PyTorch CUDA version match: x
System and Colossal-AI CUDA version match: N/A

Environment

Colossal-AI version: 0.2.8
PyTorch version: 1.13.1
System CUDA version: 11.4
CUDA version required by PyTorch: 11.6

chensisi0730 avatar Apr 24 '23 06:04 chensisi0730

Hi @chensisi0730 , nvcc -V and python -c "import torch; print(torch.version.cuda)" should give a consistent version. In your case, installing cuda 11.6 in your own virtual environment like https://anaconda.org/nvidia/cuda-toolkit may fix the problem.
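A minimal sketch of that consistency check in Python, assuming `nvcc` is on `PATH` and prints a `release X.Y` line. The helper names (`parse_nvcc_release`, `system_cuda_version`, `torch_cuda_version`, `versions_match`) are made up for illustration and are not part of Colossal-AI or PyTorch; only `torch.version.cuda` is the real attribute mentioned above.

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_nvcc_release(nvcc_output: str) -> Optional[str]:
    """Extract 'major.minor' from `nvcc -V` text such as '... release 11.4, V11.4.120'."""
    match = re.search(r"release\s+(\d+)\.(\d+)", nvcc_output)
    return f"{match.group(1)}.{match.group(2)}" if match else None


def system_cuda_version() -> Optional[str]:
    """Return the toolkit version reported by nvcc, or None if nvcc is not found."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    result = subprocess.run([nvcc, "-V"], capture_output=True, text=True, check=False)
    return parse_nvcc_release(result.stdout)


def torch_cuda_version() -> Optional[str]:
    """Return torch.version.cuda, or None if torch is missing or is a CPU-only build."""
    try:
        import torch
    except ImportError:
        return None
    return torch.version.cuda


def versions_match(system: Optional[str], torch_cuda: Optional[str]) -> Optional[bool]:
    """Compare major.minor; return None when either side is unknown."""
    if system is None or torch_cuda is None:
        return None
    return system.split(".")[:2] == torch_cuda.split(".")[:2]


if __name__ == "__main__":
    sys_ver = system_cuda_version()
    torch_ver = torch_cuda_version()
    print(f"System CUDA (nvcc): {sys_ver}")
    print(f"CUDA required by PyTorch: {torch_ver}")
    print(f"Match: {versions_match(sys_ver, torch_ver)}")
```

With the versions reported in this issue (system 11.4, PyTorch built for 11.6), `versions_match("11.4", "11.6")` returns `False`, which corresponds to the `x` on the "System and PyTorch CUDA version match" line.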

kurisusnowdeng avatar Apr 24 '23 09:04 kurisusnowdeng

Hi @chensisi0730 , nvcc -V and python -c "import torch; print(torch.version.cuda)" should give a consistent version. In your case, installing cuda 11.6 in your own virtual environment like https://anaconda.org/nvidia/cuda-toolkit may fix the problem.

Here is my colossalai check -i output. Is there anything wrong? I encounter "ModuleNotFoundError: No module named 'colossalai.kernel.op_builder'" when I run the example code. [image]

Ke51n avatar May 06 '23 09:05 Ke51n