Undefined symbol imports on torch 1.12

Open pickles-bread-and-butter opened this issue 2 years ago • 7 comments

Overview:

I added AIMET to a new Python-based training environment that includes torch 1.12. AIMET fails to import because of an undefined symbol:

import aimet_common.AimetTensorQuantizer as AimetTensorQuantizer

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: /usr/local/lib/python3.8/site-packages/aimet_common/AimetTensorQuantizer.cpython-38-x86_64-linux-gnu.so: undefined symbol: _ZNK2at6Tensor8data_ptrIfEEPT_v

Context:

Previously we had users training on torch 1.9 and AIMET 1.22.0. I've been moving pieces of their workflows into an environment with torch 1.12, and while moving the AIMET pieces into the upgraded containers the symbol definitions broke.

Requirements being used:

  • CUDA 11.0
  • Ubuntu 18.04
  • Torch 1.12
  • AIMET 1.22
  • torchvision 0.13

AIMET installation in container

We use --no-deps here because pip complained about some mismatched dependencies that did not actually break anything. Packaging constraints left us no choice but to install with --no-deps.

# Install AIMET
RUN python3 -m pip install --no-deps "https://github.com/quic/aimet/releases/download/1.23.0/AimetCommon-torch_gpu_1.23.0-cp38-cp38-linux_x86_64.whl"
RUN python3 -m pip install --no-deps "https://github.com/quic/aimet/releases/download/1.23.0/AimetTorch-torch_gpu_1.23.0-cp38-cp38-linux_x86_64.whl"
RUN python3 -m pip install "https://github.com/quic/aimet/releases/download/1.23.0/Aimet-torch_gpu_1.23.0-cp38-cp38-linux_x86_64.whl"

RUN cat /usr/local/lib/python3.8/site-packages/aimet_common/bin/reqs_deb_common.txt | xargs apt-get --assume-yes install
RUN cat /usr/local/lib/python3.8/site-packages/aimet_torch/bin/reqs_deb_torch_gpu.txt | xargs apt-get --assume-yes install

RUN ln -s /usr/lib/x86_64-linux-gnu/libjpeg.so /usr/lib
RUN ln -s /usr/local/cuda-11.4 /usr/local/cuda
RUN ln -s /usr/local/cuda-11.0 /usr/local/cuda

ENV PYTHONPATH=/usr/local/lib/python3.8/site-packages/aimet_torch:$PYTHONPATH
ENV LD_LIBRARY_PATH=/usr/local/lib/python3.8/site-packages/aimet_common:$LD_LIBRARY_PATH
RUN source /usr/local/lib/python3.8/site-packages/aimet_common/bin/envsetup.sh

# Install AIMET required libs not included in install
RUN apt-get install -y liblapacke.so.3

Debug Results:

  1. I regressed all environments back to the previous state and the issue was resolved.
  2. I then upgraded just torch from 1.9 to 1.12, with the same CUDA and other libs, and the import error returned.

It looks like there is a reliance on a symbol that was removed in one of the newer torch versions.
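
One way to check this hypothesis is to look for the mangled name from the traceback inside the shared libraries that ship with the installed torch. The symbol appears to demangle to float* at::Tensor::data_ptr<float>() const. The following is a minimal sketch and not part of AIMET: the helper names libs_mentioning_symbol and torch_lib_dir are hypothetical, and a raw byte search only shows that a library mentions the name, not that it exports it (a tool such as nm -D would be needed to confirm that). It also reads each library fully into memory, which can be slow for large CUDA builds.

```python
from pathlib import Path

# Mangled name copied from the ImportError in the traceback.
SYMBOL = "_ZNK2at6Tensor8data_ptrIfEEPT_v"


def libs_mentioning_symbol(lib_dir, symbol=SYMBOL):
    """Return names of .so files in lib_dir whose bytes contain symbol.

    Hypothetical helper: a byte search is a rough check only. It cannot
    tell an exported definition from an undefined reference.
    """
    needle = symbol.encode("ascii")
    hits = []
    for path in sorted(Path(lib_dir).glob("*.so*")):
        if path.is_file() and needle in path.read_bytes():
            hits.append(path.name)
    return hits


def torch_lib_dir():
    """Locate torch/lib without importing torch, or None if torch is absent."""
    import importlib.util

    spec = importlib.util.find_spec("torch")
    if spec is None or spec.origin is None:
        return None
    return Path(spec.origin).parent / "lib"


if __name__ == "__main__":
    lib_dir = torch_lib_dir()
    if lib_dir is None:
        print("torch is not installed in this environment")
    else:
        print(lib_dir, libs_mentioning_symbol(lib_dir))
```

Running this once in the torch 1.9 container and once in the torch 1.12 container would show whether the name is present in one and absent in the other.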

For internal reasons I cannot regress my torch deps back to 1.9 to fix this, so I would prefer a solution from your side.

@quic-bharathr could you help answer this? Thanks.

quic-mangal avatar Mar 23 '23 23:03 quic-mangal

Hello @isaak-willett, sorry for the delayed response. You need exactly torch 1.9.1: AimetTensorQuantizer is tightly coupled to the torch version because it dynamically loads signatures from libtorch 1.9.1. We are, however, in the process of upgrading the torch version to 1.13.1, and a release with the right installation instructions will be created soon. Meanwhile, if you are able to build AIMET, you can use the tip to test AIMET with PyTorch 1.12 and give us feedback.
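
Given that coupling, a container can fail early with a clear message instead of an undefined-symbol error. The sketch below is an illustration only: the function names are hypothetical, the required version 1.9.1 is taken from the comment above, and comparing only the release part of the version string (dropping a local suffix such as +cu111) is an assumption about what counts as a match.

```python
from importlib import metadata

REQUIRED_TORCH = "1.9.1"  # version named in the comment above


def release_part(version):
    """Drop a local build suffix, e.g. '1.9.1+cu111' -> '1.9.1'."""
    return version.split("+", 1)[0]


def torch_matches(installed, required=REQUIRED_TORCH):
    """True when the installed torch release equals the required one."""
    return release_part(installed) == release_part(required)


def check_installed_torch(required=REQUIRED_TORCH):
    """Raise RuntimeError if torch is missing or differs from required."""
    try:
        installed = metadata.version("torch")
    except metadata.PackageNotFoundError:
        raise RuntimeError("torch is not installed") from None
    if not torch_matches(installed, required):
        raise RuntimeError(
            f"found torch {installed}, but this AIMET build expects {required}"
        )
    return installed
```

Calling check_installed_torch() before importing aimet_common would surface the version mismatch directly.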

quic-hitameht avatar Mar 29 '23 08:03 quic-hitameht

When will torch 1.13 be supported?

666DZY666 avatar May 01 '23 10:05 666DZY666

Hi @666DZY666 it should be supported hopefully soon, but we do not have an ETA yet.

quic-bharathr avatar May 01 '23 22:05 quic-bharathr

Hi @666DZY666 it should be supported hopefully soon, but we do not have an ETA yet.

Thanks, looking forward to the support. 1.9.1 is prone to incompatibility with task code.

666DZY666 avatar May 02 '23 09:05 666DZY666

Hi @666DZY666 it should be supported hopefully soon, but we do not have an ETA yet.

We are in desperate need of an upgrade to torch >= 1.10.

WithFoxSquirrel avatar Jun 15 '23 08:06 WithFoxSquirrel

I have the same error. May I ask when AIMET 1.25.0 will support torch >= 1.10?

Tom-plus avatar Dec 13 '23 07:12 Tom-plus