
Loading of libamdhip64.so.7 fails when release/2.7 branch is built by TheRock using ROCm 7.0

Open lamikr opened this issue 5 months ago • 1 comment

🐛 Describe the bug

When we do the CI build of PyTorch and then install it in order to build PyTorch vision and audio, importing torch fails because libamdhip64.so.7 cannot be loaded.

A fix is available in the main branch from PR https://github.com/pytorch/pytorch/pull/158889, and I have tested that backporting it to the release/2.7 branch fixes the following build error:


2025-07-24T19:49:47.6864980Z Successfully installed torch-2.7.1+rocm7.0.0.dev0.515115ea2cb85a0b71b5507ce56a627d14c7ae73
2025-07-24T19:49:48.0371194Z Traceback (most recent call last):
2025-07-24T19:49:48.0372026Z File "/__w/TheRock/TheRock/external-builds/pytorch/pytorch_audio/setup.py", line 9, in <module>
2025-07-24T19:49:48.0373089Z import torch
2025-07-24T19:49:48.0373755Z File "/opt/python/cp311-cp311/lib/python3.11/site-packages/torch/__init__.py", line 424, in <module>
2025-07-24T19:49:48.0374545Z from torch._C import * # noqa: F403
2025-07-24T19:49:48.0374948Z ^^^^^^^^^^^^^^^^^^^^^^
2025-07-24T19:49:48.0375552Z ImportError: libamdhip64.so.7: cannot open shared object file: No such file or directory
2025-07-24T19:49:48.0822449Z Traceback (most recent call last):
2025-07-24T19:49:48.0823527Z ++ Exec [/__w/TheRock/TheRock]$ /opt/python/cp311-cp311/bin/python -m pip cache remove rocm_sdk --cache-dir /tmp/pipcache
2025-07-24T19:49:48.0824910Z File "/__w/TheRock/TheRock/./external-builds/pytorch/build_prod_wheels.py", line 794, in <module>
2025-07-24T19:49:48.0825793Z main(sys.argv[1:])
2025-07-24T19:49:48.0826534Z File "/__w/TheRock/TheRock/./external-builds/pytorch/build_prod_wheels.py", line 790, in main
2025-07-24T19:49:48.0828941Z ++ Exec [/__w/TheRock/TheRock]$ /opt/python/cp311-cp311/bin/python -m pip install --force-reinstall --pre --index-url https://d25kgig7rdsyks.cloudfront.net/v2/gfx94X-dcgpu/ --cache-dir /tmp/pipcache --cache-dir /tmp/pipcache 'rocm[libraries,devel]==7.0.0.dev0+515115ea2cb85a0b71b5507ce56a627d14c7ae73'
2025-07-24T19:49:48.0831487Z Installed version: 7.0.0.dev0+515115ea2cb85a0b71b5507ce56a627d14c7ae73

The only thing I needed to drop from the original patch was the sha256 checksum change for aotriton 0.9.
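
The ImportError in the log means the dynamic loader could not resolve libamdhip64.so.7 when `torch._C` was imported. As a rough diagnostic, the same lookup can be probed without importing torch at all. The sketch below is only an illustration and is not part of either PR; the helper name `can_load` is made up here, and it assumes a Linux environment where `ctypes.CDLL` uses the same loader search path as the failing import.

```python
import ctypes


def can_load(soname: str) -> bool:
    """Return True if the dynamic loader can find and open `soname`.

    Hypothetical helper for illustration; not part of PyTorch or TheRock.
    """
    try:
        # CDLL asks the system loader (dlopen on Linux) to open the library,
        # so it honours LD_LIBRARY_PATH and the loader cache.
        ctypes.CDLL(soname)
    except OSError:
        # Raised when the library cannot be found or opened.
        return False
    return True


if __name__ == "__main__":
    # The soname is taken from the ImportError in the build log.
    name = "libamdhip64.so.7"
    print(name, "is loadable" if can_load(name) else "is NOT loadable")
```

Running this in the same Python environment that builds vision and audio should show whether the failure is a plain loader lookup problem or something specific to how torch locates its ROCm libraries.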

Versions

Not relevant; the error happens on the CI machine.

lamikr avatar Jul 25 '25 02:07 lamikr

PR https://github.com/ROCm/pytorch/pull/2412 fixes the build problems of PyTorch vision and audio, which require the PyTorch that was built first to be installed.

I tested the same changes as a patch on a test build: https://github.com/ROCm/TheRock/actions/runs/16510870730/job/46692459422

lamikr avatar Jul 25 '25 02:07 lamikr