BUG: Installation broken on torchao
Bug on installation:
(tginart-0001) aginart@ip-10-1-89-181:~/dev/fun_projects/generation_projects$ pip install torchao
Requirement already satisfied: torchao in /fsx/home/aginart/miniconda3/envs/tginart-0001/lib/python3.11/site-packages (0.6.1)
(tginart-0001) aginart@ip-10-1-89-181:~/dev/fun_projects/generation_projects$ tune --help
Traceback (most recent call last):
File "/fsx/home/aginart/miniconda3/envs/tginart-0001/lib/python3.11/site-packages/torchtune/__init__.py", line 16, in <module>
import torchao # noqa
^^^^^^^^^^^^^^
File "/fsx/home/aginart/miniconda3/envs/tginart-0001/lib/python3.11/site-packages/torchao/__init__.py", line 1, in <module>
import torch
File "/fsx/home/aginart/miniconda3/envs/tginart-0001/lib/python3.11/site-packages/torch/__init__.py", line 367, in <module>
from torch._C import * # noqa: F403
^^^^^^^^^^^^^^^^^^^^^^
ImportError: /fsx/home/aginart/miniconda3/envs/tginart-0001/lib/python3.11/site-packages/torch/lib/../../nvidia/cusparse/lib/libcusparse.so.12: undefined symbol: __nvJitLinkComplete_12_4, version libnvJitLink.so.12
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/fsx/home/aginart/miniconda3/envs/tginart-0001/bin/tune", line 5, in <module>
from torchtune._cli.tune import main
File "/fsx/home/aginart/miniconda3/envs/tginart-0001/lib/python3.11/site-packages/torchtune/__init__.py", line 18, in <module>
raise ImportError(
ImportError:
torchao not installed.
Please follow the instructions at https://pytorch.org/torchtune/main/install.html#pre-requisites
to install torchao.
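Note that the final "torchao not installed" message hides the real failure: pip reports torchao 0.6.1 as installed, and the first traceback shows the import breaking inside torch while loading libcusparse.so.12. A minimal sketch for surfacing the underlying error directly, without going through the tune CLI, is below. The helper name import_root_cause is hypothetical and not part of torchtune or torchao.

```python
import importlib


def import_root_cause(module_name):
    """Try to import module_name; return the innermost exception, or None on success."""
    try:
        importlib.import_module(module_name)
    except Exception as exc:  # ImportError, OSError from a bad shared library, ...
        # Walk explicit (`raise ... from`) and implicit exception chains
        # back to the first failure, which is usually the informative one.
        while exc.__cause__ is not None or exc.__context__ is not None:
            exc = exc.__cause__ if exc.__cause__ is not None else exc.__context__
        return exc
    return None


if __name__ == "__main__":
    # Check the packages in dependency order: torch first, then torchao, then torchtune.
    for name in ("torch", "torchao", "torchtune"):
        cause = import_root_cause(name)
        print(f"{name}: {'ok' if cause is None else repr(cause)}")
```

If torch itself fails here, reinstalling torchao alone is unlikely to help, since torchao imports torch on its first line.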
You can try installing torchao with pip install torchao. Do you still see the same error message afterwards?
Yes, but it goes away if I install PyTorch version 2.3.
torch 2.3 is a bit old. It is possible that pip is using a cached version when you install your libraries. I recommend creating a fresh environment and running:
pip install torch torchao torchvision
You may have better performance and access to newer features if you install nightlies. You can find instructions here: https://github.com/pytorch/torchtune#install-nightly-release
Based on the other post, I believe this one is solved. If it isn't, please reopen the issue.
Same thing here. I followed the installation instructions at: https://pytorch.org/torchtune/main/install.html#pre-requisites
(tune) (base)$ tune
Traceback (most recent call last):
File "/home/ec2-user/SageMaker/finetuning/2/sagemaker-distributed-training-workshop/tune/lib/python3.12/site-packages/torchtune/__init__.py", line 16, in <module>
import torchao # noqa
^^^^^^^^^^^^^^
File "/home/ec2-user/SageMaker/finetuning/2/sagemaker-distributed-training-workshop/tune/lib/python3.12/site-packages/torchao/__init__.py", line 1, in <module>
import torch
File "/home/ec2-user/SageMaker/finetuning/2/sagemaker-distributed-training-workshop/tune/lib/python3.12/site-packages/torch/__init__.py", line 367, in <module>
from torch._C import * # noqa: F403
^^^^^^^^^^^^^^^^^^^^^^
ImportError: /home/ec2-user/SageMaker/finetuning/2/sagemaker-distributed-training-workshop/tune/lib/python3.12/site-packages/torch/lib/../../nvidia/cusparse/lib/libcusparse.so.12: symbol __nvJitLinkComplete_12_4, version libnvJitLink.so.12 not defined in file libnvJitLink.so.12 with link time reference
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/ec2-user/SageMaker/finetuning/2/sagemaker-distributed-training-workshop/tune/bin/tune", line 5, in <module>
from torchtune._cli.tune import main
File "/home/ec2-user/SageMaker/finetuning/2/sagemaker-distributed-training-workshop/tune/lib/python3.12/site-packages/torchtune/__init__.py", line 18, in <module>
raise ImportError(
ImportError:
torchao not installed.
Please follow the instructions at https://pytorch.org/torchtune/main/install.html#pre-requisites
to install torchao.
export LD_LIBRARY_PATH=/opt/conda/lib/python3.10/site-packages/nvidia/nvjitlink/lib:$LD_LIBRARY_PATH
This solved the issue for me. Check this comment to get the path: https://github.com/pytorch/pytorch/issues/111469#issuecomment-2080399764
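The path in the export line above is specific to that environment (/opt/conda, Python 3.10). A minimal sketch for locating the equivalent directory in your own environment, assuming the pip-installed NVIDIA wheels use the same nvidia/nvjitlink/lib layout under site-packages that appears in the path above. It deliberately avoids importing torch, since that import is what fails. The function name find_nvjitlink_dirs is hypothetical.

```python
import site
import sysconfig
from pathlib import Path


def find_nvjitlink_dirs(site_dirs=None):
    """Return nvidia/nvjitlink/lib directories that contain a libnvJitLink shared library."""
    if site_dirs is None:
        # Assumption: the NVIDIA wheels live in the active interpreter's site-packages.
        site_dirs = {sysconfig.get_paths()["purelib"], *site.getsitepackages()}
    found = []
    for base in sorted(str(p) for p in site_dirs):
        candidate = Path(base) / "nvidia" / "nvjitlink" / "lib"
        if candidate.is_dir() and any(candidate.glob("libnvJitLink.so*")):
            found.append(candidate)
    return found


if __name__ == "__main__":
    # Print a ready-to-paste export line for each directory found.
    for lib_dir in find_nvjitlink_dirs():
        print(f'export LD_LIBRARY_PATH="{lib_dir}:$LD_LIBRARY_PATH"')
```

Run it with the same Python interpreter that the failing environment uses, otherwise it will search a different site-packages directory. If it prints nothing, the nvjitlink wheel is probably not installed in that environment.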
But when I reopen the terminal, it still occurs. Do we need to add it to .bashrc?
I am getting a similar error after installing into an NGC container with
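An export only applies to the shell session it is run in, so the variable is gone in a new terminal unless the line is placed in a shell startup file such as ~/.bashrc. A minimal sketch of adding it once, without duplicating the line on repeated runs, is below. The helper name ensure_line is hypothetical, and the path in the usage comment is the one from the comment above, which will differ per environment.

```python
from pathlib import Path


def ensure_line(rc_path, line):
    """Append line to rc_path unless an identical line is already present.

    Returns True if the file was modified, False if the line was already there.
    """
    rc_path = Path(rc_path)
    text = rc_path.read_text() if rc_path.exists() else ""
    if line in text.splitlines():
        return False
    # Start on a fresh line in case the file does not end with a newline.
    prefix = "" if text == "" or text.endswith("\n") else "\n"
    with rc_path.open("a") as fh:
        fh.write(f"{prefix}{line}\n")
    return True


# Example usage (substitute the nvjitlink path for your own environment):
# ensure_line(
#     Path.home() / ".bashrc",
#     "export LD_LIBRARY_PATH=/opt/conda/lib/python3.10/site-packages/nvidia/nvjitlink/lib:$LD_LIBRARY_PATH",
# )
```

After editing the file, open a new terminal or run source ~/.bashrc for the change to take effect. Setting the variable in a conda or virtualenv activation script instead would scope it to that one environment.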
torch=2.4.0a0+f70bd71a48.nv24.6 torchao=0.11.0 torchtune==0.6.1
thes@nid008232:/pscratch/sd/t/thes/jared/torchtune$ tune --help
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/torchtune/__init__.py", line 16, in <module>
import torchao # noqa
File "/usr/local/lib/python3.10/dist-packages/torchao/__init__.py", line 41, in <module>
from torchao.quantization import (
File "/usr/local/lib/python3.10/dist-packages/torchao/quantization/__init__.py", line 1, in <module>
from torchao.kernel import (
File "/usr/local/lib/python3.10/dist-packages/torchao/kernel/__init__.py", line 1, in <module>
from torchao.kernel.bsr_triton_ops import bsr_dense_addmm
File "/usr/local/lib/python3.10/dist-packages/torchao/kernel/bsr_triton_ops.py", line 16, in <module>
from torch._dynamo.utils import warn_once
ImportError: cannot import name 'warn_once' from 'torch._dynamo.utils' (/usr/local/lib/python3.10/dist-packages/torch/_dynamo/utils.py)
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/bin/tune", line 5, in <module>
from torchtune._cli.tune import main
File "/usr/local/lib/python3.10/dist-packages/torchtune/__init__.py", line 18, in <module>
raise ImportError(
ImportError:
torchao not installed.
Please follow the instructions at https://pytorch.org/torchtune/main/install.html#pre-requisites
to install torchao.
Editing LD_LIBRARY_PATH and unsetting it do not fix the issue.
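This traceback differs from the earlier ones in the thread: the failure is a Python-level ImportError (cannot import name 'warn_once' from torch._dynamo.utils), not an undefined symbol in a shared library, which would explain why changing LD_LIBRARY_PATH has no effect. It more likely points to torchao 0.11.0 expecting a newer torch than the 2.4.0a0 build in the container, though that is an inference from the traceback rather than something confirmed here. A minimal sketch for recording the installed versions without importing the packages (the function name installed_versions is hypothetical):

```python
from importlib import metadata


def installed_versions(packages=("torch", "torchao", "torchtune")):
    """Map each package name to its installed version string, or None if absent."""
    versions = {}
    for name in packages:
        try:
            # Reads package metadata only; does not import the package itself.
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

Including this output in a bug report makes it easier to check the combination against the compatibility notes for each torchao and torchtune release.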