aimet
Undefined symbol imports on torch 1.12
Overview:
I added AIMET to a new Python-based training environment that includes torch 1.12. AIMET now fails to import with an undefined symbol error:
import aimet_common.AimetTensorQuantizer as AimetTensorQuantizer
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: /usr/local/lib/python3.8/site-packages/aimet_common/AimetTensorQuantizer.cpython-38-x86_64-linux-gnu.so: undefined symbol: _ZNK2at6Tensor8data_ptrIfEEPT_v
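To see which C++ function the loader could not find, the mangled name can be pulled out of the ImportError message and passed to a demangler. The following is a minimal sketch, not part of AIMET: the helper names `extract_undefined_symbol` and `demangle` are hypothetical, and it assumes the binutils tool `c++filt` may or may not be on the PATH (it returns `None` when it is not).

```python
import re
import shutil
import subprocess
from typing import Optional

# Matches the "undefined symbol: <name>" tail that the dynamic loader appends.
_UNDEFINED_SYMBOL = re.compile(r"undefined symbol: (\S+)")


def extract_undefined_symbol(message: str) -> Optional[str]:
    """Return the mangled symbol named in an ImportError message, if any."""
    match = _UNDEFINED_SYMBOL.search(message)
    return match.group(1) if match else None


def demangle(symbol: str) -> Optional[str]:
    """Demangle a C++ symbol with c++filt; return None if the tool is missing."""
    cxxfilt = shutil.which("c++filt")
    if cxxfilt is None:
        return None
    result = subprocess.run(
        [cxxfilt, symbol], capture_output=True, text=True, check=False
    )
    return result.stdout.strip() or None


# The message reported in this issue.
message = (
    "/usr/local/lib/python3.8/site-packages/aimet_common/"
    "AimetTensorQuantizer.cpython-38-x86_64-linux-gnu.so: "
    "undefined symbol: _ZNK2at6Tensor8data_ptrIfEEPT_v"
)
symbol = extract_undefined_symbol(message)
print(symbol)
print(demangle(symbol))
```

Where `c++filt` is available, the symbol should demangle to a `data_ptr<float>` member of `at::Tensor`, which suggests the extension was compiled against a libtorch that exported that template instantiation while the installed one does not.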
Context:
Previously our users trained on torch 1.9 and AIMET 1.22.0. I have been moving pieces of their workflows into an environment with torch 1.12, and when I moved the AIMET pieces into the upgraded containers, the symbol lookup broke.
Requirements being used:
- CUDA 11.0
- Ubuntu 18.04
- Torch 1.12
- AIMET 1.22
- torchvision 0.13
AIMET installation in container
We use --no-deps here because pip rejected some other mismatched dependencies that did not actually break anything. Because of packaging constraints, we had to install those wheels with --no-deps.
# Install AIMET
RUN python3 -m pip install --no-deps "https://github.com/quic/aimet/releases/download/1.23.0/AimetCommon-torch_gpu_1.23.0-cp38-cp38-linux_x86_64.whl"
RUN python3 -m pip install --no-deps "https://github.com/quic/aimet/releases/download/1.23.0/AimetTorch-torch_gpu_1.23.0-cp38-cp38-linux_x86_64.whl"
RUN python3 -m pip install "https://github.com/quic/aimet/releases/download/1.23.0/Aimet-torch_gpu_1.23.0-cp38-cp38-linux_x86_64.whl"
RUN cat /usr/local/lib/python3.8/site-packages/aimet_common/bin/reqs_deb_common.txt | xargs apt-get --assume-yes install
RUN cat /usr/local/lib/python3.8/site-packages/aimet_torch/bin/reqs_deb_torch_gpu.txt | xargs apt-get --assume-yes install
RUN ln -s /usr/lib/x86_64-linux-gnu/libjpeg.so /usr/lib
RUN ln -s /usr/local/cuda-11.4 /usr/local/cuda
RUN ln -s /usr/local/cuda-11.0 /usr/local/cuda
ENV PYTHONPATH=/usr/local/lib/python3.8/site-packages/aimet_torch:$PYTHONPATH
ENV LD_LIBRARY_PATH=/usr/local/lib/python3.8/site-packages/aimet_common:$LD_LIBRARY_PATH
RUN source /usr/local/lib/python3.8/site-packages/aimet_common/bin/envsetup.sh
# Install AIMET required libs not included in install
RUN apt-get install -y liblapacke.so.3
Debug Results:
- I rolled all environments back to the previous state, and the issue went away.
- I then upgraded only torch, from 1.9 to 1.12, keeping the same CUDA and other libraries.
- The issue came back: an undefined symbol in AimetTensorQuantizer.
It looks like AIMET relies on a symbol that was removed in one of the newer torch versions.
For internal reasons I cannot roll my torch dependency back to 1.9 to fix this, so I would prefer a solution on your side.
@quic-bharathr could you help answer this. Thanks
Hello @isaak-willett, sorry for the delayed response. You need exactly torch 1.9.1: AimetTensorQuantizer is tightly coupled to the torch version because it dynamically loads signatures from libtorch 1.9.1. We are in the process of upgrading the torch version to 1.13.1, and a release with the right installation instructions will be created soon. Meanwhile, if you are able to build AIMET yourself, you can use the tip to test AIMET with PyTorch 1.12 and give us feedback.
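Since the prebuilt wheels need an exact torch version, a small guard that runs before the AIMET import can turn the cryptic loader error into a readable one. This is a rough sketch only: the function names are hypothetical, and the pinned value `1.9.1` is taken from the reply above and would need to change for other AIMET builds.

```python
# Version the prebuilt AIMET wheels were compiled against, per the reply above.
REQUIRED_TORCH = "1.9.1"


def base_version(version: str) -> str:
    """Drop a local build suffix such as '+cu111' from a version string."""
    return version.split("+", 1)[0]


def torch_matches(installed: str, required: str = REQUIRED_TORCH) -> bool:
    """True when the installed torch release equals the required one exactly."""
    return base_version(installed) == required


def ensure_compatible_torch(installed: str, required: str = REQUIRED_TORCH) -> None:
    """Raise a descriptive error instead of failing later on an undefined symbol."""
    if not torch_matches(installed, required):
        raise RuntimeError(
            f"AIMET binaries were built against torch {required}, "
            f"but torch {installed} is installed."
        )


# Intended use, before any AIMET import:
#   import torch
#   ensure_compatible_torch(torch.__version__)
#   import aimet_common.AimetTensorQuantizer as AimetTensorQuantizer
```

An exact match is used rather than a minimum version because the failure in this thread comes from binary coupling, where a newer torch is just as incompatible as an older one.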
When will torch 1.13 be supported?
Hi @666DZY666 it should be supported hopefully soon, but we do not have an ETA yet.
Thanks, looking forward to the support. Torch 1.9.1 is prone to incompatibility with our task code.
We are in desperate need of an upgrade to torch >= 1.10.
I have the same error. May I ask when AIMET 1.25.0 will support torch >= 1.10?