HIP "Undefined __global__ function" in PyTorch

I'm experimenting with building PyTorch against ROCm, which in turn is built with ebuilds from here on Gentoo. I realize that this is a completely unsupported configuration, but could you please suggest where to look for the inconsistency in the build that causes the following problem? Every call to pretty much any function I tried in torch.nn.functional fails with RuntimeError: Undefined __global__ function. A minimal reproducing example is just:

import torch
import torch.nn.functional as F

x = torch.tensor([1, 2], device="cuda")
v = F.relu(x)
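
To see whether the failure is specific to torch.nn.functional or affects other GPU operations too, the repro can be widened to a handful of calls. This is only a minimal sketch: probe is a hypothetical helper, and the choice of operations is an assumption rather than anything taken from the PyTorch test suite.

```python
def probe(ops, x):
    """Run each named callable on x and record whether it raised RuntimeError."""
    results = {}
    for name, op in ops.items():
        try:
            op(x)
            results[name] = "ok"
        except RuntimeError as err:
            results[name] = f"RuntimeError: {err}"
    return results


if __name__ == "__main__":
    try:
        import torch
        import torch.nn.functional as F
    except ImportError:
        print("torch is not installed in this environment")
    else:
        if not torch.cuda.is_available():
            print("no GPU visible to torch")
        else:
            x = torch.tensor([1.0, 2.0], device="cuda")
            # Assumed selection of operations; .cpu() forces each result
            # to be computed and copied back before the next call.
            ops = {
                "add": lambda t: (t + 1).cpu(),
                "sum": lambda t: t.sum().cpu(),
                "relu": lambda t: F.relu(t).cpu(),  # the call from the report
                "copy only": lambda t: t.cpu(),     # no compute kernel involved
            }
            for name, outcome in probe(ops, x).items():
                print(f"{name}: {outcome}")
```

If plain tensor arithmetic fails the same way, the problem is probably not limited to torch.nn.functional.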

I've found similar error output in a rocFFT issue, but can't really tell whether it's connected to rocFFT or not.

Thanks in advance for any help or information!

Jan 28 '20 22:01 aclex

This would be better filed in ROCmSoftwarePlatform/pytorch.

Can you comment on how exactly you built PyTorch? It is indeed unsupported on Gentoo, and the PyTorch build infrastructure is rather complex, so I suspect some issue there.

Jan 30 '20 23:01 iotamudelta

@iotamudelta thanks for your attention and suggestions! Yes, I'll file a question there as well, with a reference to this issue.

As for the build, I pretty much replicate the AMD build process from the .jenkins directory, but compile the C++ part first with CMake and then the Python part on top of it, following this ebuild. So I'm just trying to approach the problem and at least find a way to compile the kernel manually.

I also tend to think the problem is in my PyTorch build rather than in the builds of the ROCm components, since, for example, this test example works fine.

Anyway, thank you very much for your help.

Jan 30 '20 23:01 aclex

OK, that's a bit hard for me to map to the way we do things, so some general questions. You seem to supply USE_ROCM=1 to the CMake parts, which is correct. Do you also make sure to invoke hipification beforehand? Do you supply USE_ROCM=1 to setup.py?

Jan 31 '20 00:01 iotamudelta

Yes, for the ROCm build I both set USE_ROCM=1 and run the tools/amd_build/build_amd.py script to do the hipification. It builds fine, but I'm not sure about passing USE_ROCM=1 to setup.py afterwards; I will double-check it to be sure. The build itself finishes successfully; the problem appears only at runtime.
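
One way to check from the Python side whether the ROCm setting reached the Python part of the build is to look at the build metadata of the installed package. A minimal sketch, assuming that torch.version.hip is set to a version string in ROCm builds and is None (or absent) otherwise; describe_build is a hypothetical helper name.

```python
def describe_build(hip_version, gpu_available):
    """Turn two facts about the installed torch package into a short summary."""
    if hip_version is None:
        return "not a ROCm build: USE_ROCM may not have reached setup.py"
    if not gpu_available:
        return f"ROCm build (HIP {hip_version}), but no GPU is visible at runtime"
    return f"ROCm build (HIP {hip_version}) with a visible GPU"


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("torch is not installed in this environment")
    else:
        # Assumption: ROCm builds expose a HIP version string here;
        # getattr covers versions where the attribute does not exist.
        hip_version = getattr(torch.version, "hip", None)
        print(describe_build(hip_version, torch.cuda.is_available()))
```

A "not a ROCm build" result would point at the setup.py step rather than at the ROCm components themselves.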

Jan 31 '20 06:01 aclex

@aclex, do you still see the issue with the latest ROCm?

Jun 29 '22 09:06 SarbojitAMD

Can't confirm it for 5.0, haven't built PyTorch against it yet, unfortunately. Feel free to close it for now, I'll reopen if it's still there.

Jun 29 '22 15:06 aclex

Closing. Please re-open if it occurs with the latest ROCm 6.0.2 (HIP 6.0.32831).

Mar 18 '24 19:03 ppanchad-amd
