pocl segmentation fault in clGetDeviceIDs on arm64 (Jetson AGX)

Hi,

After successfully compiling PoCL on the Jetson AGX platform using LLVM 15, I encountered a segmentation fault when calling the clGetDeviceIDs function. The compilation succeeded only after applying the patch mentioned in https://github.com/pocl/pocl/issues/1196

While trying to debug the issue, I could trace it to here:

Apr 13 '23 15:04 buni-rock

Hi. The first step in debugging a segfault is usually to get a backtrace from a debug build. Does that show anything more useful?

Apr 14 '23 06:04 pjaaskel

This seems to be a different issue from #1196. Do you have a backtrace? Screenshots aren't too useful, but I see one problem: dev->dispatch is 0x1, which is not a valid pointer. Are you using an ICD loader (libOpenCL.so)? Is it Nvidia's? I would not bet that Nvidia's ICD loader works with anything but Nvidia's own OpenCL runtime. Can you try the open-source ocl-icd or Khronos' ICD loader, or build PoCL without ICD (-DENABLE_ICD=0)?

Apr 14 '23 06:04 franz

Hi again,

ICD:

As stated in the documentation, OpenCL 3.0 requires ocl-icd (2.3.x), which does not seem to be available on my system.

Furthermore, clinfo reports the following

The crash I reported earlier happens under the HOST_DEVICE_CL_VERSION_MAJOR >= 3 condition, which would make sense given the message reported by clinfo.

I have checked where HOST_DEVICE_CL_VERSION_MAJOR is set, and it looks like it is hardcoded depending on the compiler version: https://github.com/pocl/pocl/blob/release_3_1/CMakeLists.txt#L1187

On my side, I have LLVM 14 and 15 installed. Is it possible to compile PoCL with these compilers without using OpenCL 3.0?

I have built PoCL with -DENABLE_ICD=0, and afterwards clinfo outputs "Number of platforms 0".

May 11 '23 08:05 buni-rock

I don't think so with the current code; the version is indeed hardcoded. However, if you remove the if(LLVM_VERSION VERSION_GREATER_EQUAL 14.0) check and just hardcode the version to 1.2, it should be possible to build.

May 30 '23 13:05 franz

Hi @buni-rock ,

Are you able to run PoCL (OpenCL version 3.0) on the Jetson AGX? I am considering purchasing a Jetson Orin Nano to run PoCL, but I am not sure whether it is viable. I would appreciate your advice.

Jul 06 '23 02:07 heweiwill

If you are okay with using binaries, you can try the instructions at https://github.com/pocl/pocl#pocl-with-cuda-driver

Jul 06 '23 03:07 isuruf

Has this binary been verified to work on any Jetson devices?

Jul 06 '23 03:07 heweiwill

I've tried it on an aarch64 server, but not a Jetson in particular. If it does not work there, I'll be happy to fix it.

Jul 06 '23 03:07 isuruf

Hi,

On our side we had two Jetsons. The one I tried to use failed to run PoCL out of the box. The other one, which my colleague used, could run PoCL without much effort; on that device we could build PoCL from source and install it successfully. In principle you can run PoCL on a Jetson, but the question is how much effort you have to put in to get there.

PS: regarding this issue, I am afraid I can't proceed further due to lack of time.

Jul 06 '23 05:07 buni-rock

@isuruf I followed the instructions at https://github.com/pocl/pocl#pocl-with-cuda-driver and installed pocl-cuda via mamba, but clinfo still shows nothing. Do you have any idea how I can make use of the installed PoCL? BTW, I am still using an old Jetson Nano, not an Orin Nano.

Jul 06 '23 06:07 heweiwill

What are the differences between these two Jetsons? Why does one work while the other fails?

Jul 06 '23 06:07 heweiwill

You will find a pocl.icd file at ~/mambaforge/etc/OpenCL/vendors/pocl.icd. Copy it to /etc/OpenCL/vendors/pocl.icd. If you have the ICD loader from Khronos or ocl-icd, you can set OCL_ICD_VENDORS=~/mambaforge/etc/OpenCL/vendors/pocl.icd instead of copying.

Jul 06 '23 06:07 isuruf

Hi @isuruf, yes, I now get clinfo output, but it reports an error. Do you have any idea what causes it? (Screenshot from 2023-07-06 14-45-45)

Jul 06 '23 06:07 heweiwill

Maybe I should raise a new issue?

Jul 06 '23 06:07 heweiwill

Can you try with export POCL_DEVICES=cuda?

Jul 06 '23 18:07 isuruf

It still reports the same error. (Screenshot from 2023-07-07 09-48-31)

Jul 07 '23 01:07 heweiwill

@isuruf Do you have any idea about this -33 error code?

Jul 09 '23 07:07 heweiwill

According to this handy table, -33 is CL_INVALID_DEVICE, which would point towards something going wrong in device initialization. Try running with POCL_DEBUG=error,cuda and see if that spits out anything useful.

Jul 09 '23 14:07 jansol

Hi @jansol, thanks very much for your reply. I added the debug environment variable and got the log below: (Screenshot from 2023-07-10 11-08-55)

It seems related to the hardware architecture. I am using the Jetson Nano devkit; the spec can be found at https://developer.nvidia.com/embedded/jetson-nano-developer-kit

Jul 10 '23 03:07 heweiwill

Can you try setting export POCL_CUDA_GPU_ARCH=sm_53?

Jul 10 '23 14:07 isuruf

@isuruf It still fails at initialization; the log is below: (Screenshot from 2023-07-11 09-55-53)

Jul 11 '23 01:07 heweiwill

A good sanity check is to install nvidia-cuda-toolkit and see if you can pick up the GPU with nvidia-smi. I've found the Nvidia drivers for aarch64 are a bit picky. Not that this will necessarily solve your issue, but I ended up re-installing my Nvidia drivers on an AVA devkit.

I'm looking forward to seeing your progress on this issue.

Jul 11 '23 09:07 onfe

I guess the following works?

#include <stdio.h>
#include <stdlib.h>
#include <cuda.h>

/* Abort with a readable message if a CUDA driver API call fails. */
#define CUDA_CHECK(ans) { gpuAssert((ans), __FILE__, __LINE__); }
void gpuAssert(CUresult code, const char *file, int line)
{
   if (code != CUDA_SUCCESS)
   {
      const char *msg = NULL;
      /* cuGetErrorString leaves msg NULL for unrecognized error codes. */
      cuGetErrorString(code, &msg);
      fprintf(stderr,"GPUassert: %s %s %d\n", msg ? msg : "unknown error", file, line);
      exit(code);
   }
}

int main()
{
  CUDA_CHECK(cuInit(0));
  CUdevice device;
  CUDA_CHECK(cuDeviceGet(&device, 0));
  CUcontext ctx;
  CUDA_CHECK(cuCtxCreate(&ctx, 0, device));
  /* Query the compute capability, e.g. 5 and 3 for sm_53. */
  int sm_maj, sm_min;
  CUDA_CHECK(cuDeviceGetAttribute(&sm_maj, CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR, device));
  CUDA_CHECK(cuDeviceGetAttribute(&sm_min, CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR, device));
  printf("sm_%d%d\n", sm_maj, sm_min);
  CUDA_CHECK(cuCtxDestroy(ctx));
  return 0;
}

Jul 11 '23 15:07 isuruf

Thanks Edward for the suggestion. Oddly, I cannot install the NVIDIA toolkit on my Jetson Nano. I will try to figure it out later.

Jul 12 '23 02:07 heweiwill

The device query code provided by NVIDIA outputs the information below: (Screenshot from 2023-07-12 10-07-26)

Jul 12 '23 02:07 heweiwill

From the output of this example code, the Jetson Nano's compute capability truly is sm_53: (Screenshot from 2023-07-12 11-16-00)

Jul 12 '23 03:07 heweiwill

Hi @isuruf, any other ideas?
It now seems that CUDA is able to report the correct compute architecture, but PoCL fails to read it and cannot pass initialization, even when the architecture is set manually. Maybe some logic inside PoCL could be improved?

Jul 14 '23 06:07 heweiwill

Can you try https://github.com/pocl/pocl/issues/1206#issuecomment-1631029796 ?

Jul 14 '23 15:07 isuruf

@isuruf Do you mean the code below that you mentioned? I tried the example code SimpleTextureDrv, which includes the same function calls you provided, and it is able to read the SM major and minor values via the CUDA API. The value is 53, the same as the one we previously set manually in the environment variable.

#include <stdio.h>
#include <stdlib.h>
#include <cuda.h>

/* Abort with a readable message if a CUDA driver API call fails. */
#define CUDA_CHECK(ans) { gpuAssert((ans), __FILE__, __LINE__); }
void gpuAssert(CUresult code, const char *file, int line)
{
   if (code != CUDA_SUCCESS)
   {
      const char *msg = NULL;
      /* cuGetErrorString leaves msg NULL for unrecognized error codes. */
      cuGetErrorString(code, &msg);
      fprintf(stderr,"GPUassert: %s %s %d\n", msg ? msg : "unknown error", file, line);
      exit(code);
   }
}

int main()
{
  CUDA_CHECK(cuInit(0));
  CUdevice device;
  CUDA_CHECK(cuDeviceGet(&device, 0));
  CUcontext ctx;
  CUDA_CHECK(cuCtxCreate(&ctx, 0, device));
  /* Query the compute capability, e.g. 5 and 3 for sm_53. */
  int sm_maj, sm_min;
  CUDA_CHECK(cuDeviceGetAttribute(&sm_maj, CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR, device));
  CUDA_CHECK(cuDeviceGetAttribute(&sm_min, CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR, device));
  printf("sm_%d%d\n", sm_maj, sm_min);
  CUDA_CHECK(cuCtxDestroy(ctx));
  return 0;
}

Jul 17 '23 03:07 heweiwill

Ah, you have CUDA 10.2. Can you update to CUDA 11.1+?

Jul 17 '23 11:07 isuruf