
BUG: Installation broken on torchao

Open · tginart opened this issue 1 year ago · 8 comments

Bug on installation:

(tginart-0001) aginart@ip-10-1-89-181:~/dev/fun_projects/generation_projects$ pip install torchao
Requirement already satisfied: torchao in /fsx/home/aginart/miniconda3/envs/tginart-0001/lib/python3.11/site-packages (0.6.1)
(tginart-0001) aginart@ip-10-1-89-181:~/dev/fun_projects/generation_projects$ tune --help
Traceback (most recent call last):
  File "/fsx/home/aginart/miniconda3/envs/tginart-0001/lib/python3.11/site-packages/torchtune/__init__.py", line 16, in <module>
    import torchao  # noqa
    ^^^^^^^^^^^^^^
  File "/fsx/home/aginart/miniconda3/envs/tginart-0001/lib/python3.11/site-packages/torchao/__init__.py", line 1, in <module>
    import torch
  File "/fsx/home/aginart/miniconda3/envs/tginart-0001/lib/python3.11/site-packages/torch/__init__.py", line 367, in <module>
    from torch._C import *  # noqa: F403
    ^^^^^^^^^^^^^^^^^^^^^^
ImportError: /fsx/home/aginart/miniconda3/envs/tginart-0001/lib/python3.11/site-packages/torch/lib/../../nvidia/cusparse/lib/libcusparse.so.12: undefined symbol: __nvJitLinkComplete_12_4, version libnvJitLink.so.12

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/fsx/home/aginart/miniconda3/envs/tginart-0001/bin/tune", line 5, in <module>
    from torchtune._cli.tune import main
  File "/fsx/home/aginart/miniconda3/envs/tginart-0001/lib/python3.11/site-packages/torchtune/__init__.py", line 18, in <module>
    raise ImportError(
ImportError: 
        torchao not installed.
        Please follow the instructions at https://pytorch.org/torchtune/main/install.html#pre-requisites
        to install torchao.

tginart, Nov 12 '24
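Both chained tracebacks in this report have the same shape. The real failure is the first exception (`import torch` hitting an undefined libnvJitLink symbol). torchtune's `__init__.py` (line 18) then re-raises a generic "torchao not installed" ImportError, and the line "The above exception was the direct cause" shows that the original error is attached as its cause. The message at the bottom is therefore misleading, because torchao is installed. The sketch below reproduces that wrapping pattern with a stand-in error. It is not torchtune's actual source, and `import_optional_dependency` and `root_cause` are hypothetical names. It shows how to walk `__cause__` back to the original exception.

```python
# Hypothetical reproduction of the wrapping pattern visible in the traceback;
# this is NOT torchtune's source code.


def import_optional_dependency():
    try:
        # Stand-in for `import torchao`, which failed because `import torch` raised.
        raise ImportError(
            "libcusparse.so.12: undefined symbol: __nvJitLinkComplete_12_4"
        )
    except ImportError as err:
        # `raise ... from err` is what prints
        # "The above exception was the direct cause of the following exception".
        raise ImportError("torchao not installed.") from err


def root_cause(exc):
    """Follow the __cause__ chain down to the original exception."""
    while exc.__cause__ is not None:
        exc = exc.__cause__
    return exc


try:
    import_optional_dependency()
except ImportError as wrapped:
    print("wrapper :", wrapped)
    print("original:", root_cause(wrapped))
```

In a broken environment, you can wrap `import torchtune` in the same try/except and print `root_cause(err)` to get the underlying error instead of the generic message.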

You can try installing torchao with pip install torchao. Do you still see the same error message afterward?

RdoubleA, Nov 12 '24

Yes, I still see it, but it goes away if I install PyTorch 2.3.

tginart, Nov 12 '24
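Whether the error appears depends on which torch build is installed, so it helps to report exact versions. The sketch below reads them from package metadata with importlib.metadata. That means it works even when `import torch` itself fails, because nothing is imported. The package list is an assumption. The nvidia-cusparse-cu12 and nvidia-nvjitlink-cu12 entries are the pip-distributed CUDA 12 libraries whose directories (nvidia/cusparse, nvidia/nvjitlink) appear in this thread's paths, and they may be absent on other setups.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names to report; the nvidia-* entries assume CUDA 12 pip wheels.
PACKAGES = [
    "torch",
    "torchao",
    "torchvision",
    "torchtune",
    "nvidia-cusparse-cu12",
    "nvidia-nvjitlink-cu12",
]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent.

    Only dist-info metadata is read, so a broken `import torch` does not matter.
    """
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions(PACKAGES).items():
        print(f"{name:24} {ver or 'not installed'}")
```

Pasting this output into an issue gives maintainers the exact torch/torchao pairing. That pairing is what separates a broken install from the version mismatches discussed below.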

torch 2.3 is a bit old. When you install your libraries, pip may be reusing a cached version. I recommend creating a fresh environment and running:

pip install torch torchao torchvision

You may get better performance and access to newer features if you install the nightlies. You can find instructions here: https://github.com/pytorch/torchtune#install-nightly-release

felipemello1, Nov 12 '24
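You can script the fresh-environment advice with the standard-library venv module. This is a sketch. The directory name is up to you, and it should be a new directory. The helper names are hypothetical. --no-cache-dir is a standard pip flag that stops pip from reusing previously downloaded wheels, which addresses the "cached version" concern. The maintainers did not suggest that flag specifically.

```python
import sys
import venv
from pathlib import Path


def create_fresh_env(env_dir, with_pip=True):
    """Create a new virtual environment and return the path to its interpreter."""
    # Symlinks mirror the default behaviour of `python -m venv` on POSIX systems.
    venv.create(env_dir, with_pip=with_pip, symlinks=(sys.platform != "win32"))
    if sys.platform == "win32":
        return Path(env_dir) / "Scripts" / "python.exe"
    return Path(env_dir) / "bin" / "python"


def pip_install_command(python, packages):
    """Build a pip install command that bypasses pip's download/wheel cache."""
    return [str(python), "-m", "pip", "install", "--no-cache-dir", *packages]
```

For example, you can build the command with `cmd = pip_install_command(create_fresh_env("torchtune-env"), ["torch", "torchao", "torchvision"])` and execute it with `subprocess.run(cmd, check=True)`. Activate the new environment before running tune so that the tune entry point comes from that environment.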

Based on the other post, I believe this one is solved. If it isn't, please reopen the issue.

felipemello1, Nov 13 '24

Same thing here. I followed the installation instructions at https://pytorch.org/torchtune/main/install.html#pre-requisites

(tune) (base)$ tune 
Traceback (most recent call last):
  File "/home/ec2-user/SageMaker/finetuning/2/sagemaker-distributed-training-workshop/tune/lib/python3.12/site-packages/torchtune/__init__.py", line 16, in <module>
    import torchao  # noqa
    ^^^^^^^^^^^^^^
  File "/home/ec2-user/SageMaker/finetuning/2/sagemaker-distributed-training-workshop/tune/lib/python3.12/site-packages/torchao/__init__.py", line 1, in <module>
    import torch
  File "/home/ec2-user/SageMaker/finetuning/2/sagemaker-distributed-training-workshop/tune/lib/python3.12/site-packages/torch/__init__.py", line 367, in <module>
    from torch._C import *  # noqa: F403
    ^^^^^^^^^^^^^^^^^^^^^^
ImportError: /home/ec2-user/SageMaker/finetuning/2/sagemaker-distributed-training-workshop/tune/lib/python3.12/site-packages/torch/lib/../../nvidia/cusparse/lib/libcusparse.so.12: symbol __nvJitLinkComplete_12_4, version libnvJitLink.so.12 not defined in file libnvJitLink.so.12 with link time reference

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/ec2-user/SageMaker/finetuning/2/sagemaker-distributed-training-workshop/tune/bin/tune", line 5, in <module>
    from torchtune._cli.tune import main
  File "/home/ec2-user/SageMaker/finetuning/2/sagemaker-distributed-training-workshop/tune/lib/python3.12/site-packages/torchtune/__init__.py", line 18, in <module>
    raise ImportError(
ImportError: 
        torchao not installed.
        Please follow the instructions at https://pytorch.org/torchtune/main/install.html#pre-requisites
        to install torchao.

ssivanov-aws, Dec 12 '24
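This second report fails in the same place with slightly different wording. The libnvJitLink.so.12 that the dynamic loader picked up does not define the versioned symbol __nvJitLinkComplete_12_4, which the pip-installed libcusparse.so.12 expects. One plausible cause is an older libnvJitLink.so.12 appearing earlier on LD_LIBRARY_PATH, for example one from a system CUDA install. This is an assumption and the thread does not confirm it. The sketch below lists which LD_LIBRARY_PATH entries contain a given library file, in search order. `dirs_providing` is a hypothetical helper. It only inspects the filesystem and does not model the loader's other search locations (RPATH/RUNPATH, ld.so.cache).

```python
import os
from pathlib import Path


def dirs_providing(lib_name, search_path=None):
    """Return the LD_LIBRARY_PATH entries that contain lib_name, in search order.

    Only LD_LIBRARY_PATH is inspected; RPATH/RUNPATH and ld.so.cache are ignored.
    """
    if search_path is None:
        search_path = os.environ.get("LD_LIBRARY_PATH", "")
    hits = []
    for entry in search_path.split(":"):
        # Skip empty components (e.g. from a leading or trailing colon).
        if entry and (Path(entry) / lib_name).exists():
            hits.append(entry)
    return hits


if __name__ == "__main__":
    for position, directory in enumerate(dirs_providing("libnvJitLink.so.12")):
        print(position, directory)
```

If the first hit is not the site-packages/nvidia/nvjitlink/lib directory of the environment that runs tune, that would be consistent with the shadowing explanation above.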

export LD_LIBRARY_PATH=/opt/conda/lib/python3.10/site-packages/nvidia/nvjitlink/lib:$LD_LIBRARY_PATH

This solved the issue for me. Check this comment to get the path: https://github.com/pytorch/pytorch/issues/111469#issuecomment-2080399764

tanyashourya, Dec 13 '24
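The path in the workaround above is specific to one machine (/opt/conda, python3.10). The sketch below derives the nvidia/nvjitlink/lib directory for the current interpreter instead. It assumes the pip wheels place that directory under site-packages, as the workaround's path shows. The helper names are hypothetical. Run it with the same Python that runs tune.

```python
import sysconfig
from pathlib import Path


def nvjitlink_lib_dirs(site_dirs=None):
    """Return existing <site-packages>/nvidia/nvjitlink/lib directories, deduplicated."""
    if site_dirs is None:
        paths = sysconfig.get_paths()
        # purelib and platlib usually point at the same site-packages directory.
        site_dirs = [paths["purelib"], paths["platlib"]]
    found = []
    for site_dir in site_dirs:
        candidate = Path(site_dir) / "nvidia" / "nvjitlink" / "lib"
        if candidate.is_dir() and str(candidate) not in found:
            found.append(str(candidate))
    return found


def export_line(lib_dir):
    """Shell line that prepends lib_dir to LD_LIBRARY_PATH."""
    return f'export LD_LIBRARY_PATH="{lib_dir}:$LD_LIBRARY_PATH"'


if __name__ == "__main__":
    for lib_dir in nvjitlink_lib_dirs():
        print(export_line(lib_dir))
```

Because the directory comes from sysconfig, the printed line points at the active environment's site-packages rather than a hard-coded conda path.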


But when I reopen the terminal, the error still occurs. Do we need to add it to .bashrc?

LukeLIN-web, Dec 13 '24
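On the .bashrc question: export only changes the environment of the current shell and the processes it starts, so a new terminal begins without it. To make the setting persistent, append the line to a startup file such as ~/.bashrc, which interactive bash shells read on most setups. The minimal sketch below appends the line only once, so rerunning it does not pile up duplicates. `append_once` is a hypothetical helper.

```python
from pathlib import Path


def append_once(rc_file, line):
    """Append line to rc_file unless an identical line is already present.

    Returns True if the file was modified, False if the line was already there.
    """
    path = Path(rc_file).expanduser()
    text = path.read_text() if path.exists() else ""
    if line in text.splitlines():
        return False
    # Start on a fresh line if the file does not already end with a newline.
    prefix = "\n" if text and not text.endswith("\n") else ""
    with path.open("a") as handle:
        handle.write(prefix + line + "\n")
    return True
```

For example, call `append_once("~/.bashrc", 'export LD_LIBRARY_PATH="/path/to/site-packages/nvidia/nvjitlink/lib:$LD_LIBRARY_PATH"')` with your own path in place of the placeholder. Then open a new terminal, or run source ~/.bashrc.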

I am getting a similar error after installing into an NGC container with:

torch==2.4.0a0+f70bd71a48.nv24.6 torchao==0.11.0 torchtune==0.6.1


thes@nid008232:/pscratch/sd/t/thes/jared/torchtune$ tune --help
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/torchtune/__init__.py", line 16, in <module>
    import torchao  # noqa
  File "/usr/local/lib/python3.10/dist-packages/torchao/__init__.py", line 41, in <module>
    from torchao.quantization import (
  File "/usr/local/lib/python3.10/dist-packages/torchao/quantization/__init__.py", line 1, in <module>
    from torchao.kernel import (
  File "/usr/local/lib/python3.10/dist-packages/torchao/kernel/__init__.py", line 1, in <module>
    from torchao.kernel.bsr_triton_ops import bsr_dense_addmm
  File "/usr/local/lib/python3.10/dist-packages/torchao/kernel/bsr_triton_ops.py", line 16, in <module>
    from torch._dynamo.utils import warn_once
ImportError: cannot import name 'warn_once' from 'torch._dynamo.utils' (/usr/local/lib/python3.10/dist-packages/torch/_dynamo/utils.py)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/bin/tune", line 5, in <module>
    from torchtune._cli.tune import main
  File "/usr/local/lib/python3.10/dist-packages/torchtune/__init__.py", line 18, in <module>
    raise ImportError(
ImportError:
        torchao not installed.
        Please follow the instructions at https://pytorch.org/torchtune/main/install.html#pre-requisites
        to install torchao.

Neither editing LD_LIBRARY_PATH nor unsetting it fixes the issue.

jdwillard19, Jun 17 '25
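This last traceback is a different failure from the nvJitLink one. Here `import torch` succeeds. torchao's bsr_triton_ops.py then tries to import warn_once from torch._dynamo.utils, and the NGC pre-release build (2.4.0a0) does not define it. That points to a torch/torchao version mismatch rather than a library-path problem, which would explain why changing LD_LIBRARY_PATH has no effect. The sketch below checks whether a module imports cleanly and defines a given name. `has_symbol` is a hypothetical helper.

```python
import importlib


def has_symbol(module_name, attr):
    """Return True if module_name imports cleanly and defines attr."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:  # also covers ModuleNotFoundError
        return False
    return hasattr(module, attr)


if __name__ == "__main__":
    # In the reporter's container, this would be expected to print False,
    # matching the ImportError in the traceback.
    print(has_symbol("torch._dynamo.utils", "warn_once"))
```

If the check prints False, the installed torchao expects a newer torch than the container provides. You would then need a torchao release that matches that torch build, or a newer torch.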