
[BUG]: Version 1.5.1 cannot adapt to 5070 graphics card and reports an exception

Open xxl-cc opened this issue 10 months ago • 3 comments

Describe the bug

Running the SimpleKernel sample project throws ILGPU.Runtime.Cuda.CudaException: 'a PTX JIT compilation failed'.

Environment

  • ILGPU version: [e.g., 1.5.1]
  • .NET version: [e.g., .NET 8]
  • Operating system: [e.g., Windows 11]
  • Hardware (if GPU-related): [e.g., NVIDIA GeForce RTX 5070]

Steps to reproduce

    static void MyKernel(
        Index1D index,              // The global thread index (1D in this case)
        ArrayView<int> dataView,    // A view to a chunk of memory (1D in this case)
        int constant)               // A sample uniform constant
    {
        dataView[index] = index + constant;
    }

    var kernel = accelerator.LoadAutoGroupedStreamKernel<Index1D, ArrayView<int>, int>(MyKernel);

Expected behavior

Normal execution

Additional context

No response

xxl-cc avatar Jun 06 '25 13:06 xxl-cc

With version 1.5.2 and a 5070 graphics card, the code below reports 'too many resources requested for launch'. The same code runs normally with version 1.5.1 and a 4070 graphics card.

class Program
{
    internal struct CustomDataType
    {
        public int First;
        public ArrayView<Float3> coordinate_dev;
    }

    static void MyKernel(
        Index1D index,
        CustomDataType dataView)
    {
        //data[index] = dataView1.First;
    }

    static void Main()
    {
        using var context = Context.CreateDefault();

        foreach (var device in context)
        {
            // Create accelerator for the given device
            using var accelerator = device.CreateAccelerator(context);
            Console.WriteLine($"Performing operations on {accelerator}");

            var param = new CustomDataType();

            var kernel = accelerator.LoadAutoGroupedStreamKernel<Index1D, CustomDataType>(MyKernel); 
            kernel(1, param); 
            accelerator.Synchronize();
        }
    } 
}

xxl-cc avatar Jun 06 '25 13:06 xxl-cc

We are seeing the same failure on the RTX 5090 Laptop, RTX 5070 Laptop and RTX 5060 Ti. Because of #1323 we cannot use 1.5.2 to get more information.

afmg-aherzog avatar Jun 11 '25 12:06 afmg-aherzog

@MoFtZ I asked you about this problem on Discord. I think it would be better to continue the discussion here.

I will reproduce your extensive explanation here:

Unfortunately, ILGPU normally stops working when Nvidia releases a new GPU series.

In theory, it should all be compatible, and trivial to add support.

ILGPU generates PTX code, which is passed to the Cuda driver to convert into GPU machine code. PTX is supposed to allow compatibility with new devices, as per Cuda documentation.

In practice, a new GPU series introduces a new instruction set architecture (ISA) version, which must be supplied to the Cuda driver for it to work. Unfortunately, ILGPU can query the SM version of the device, but the matching ISA version cannot be queried from the device at runtime. It is documented in the PTX release notes: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#release-notes

The ILGPU build pipeline has an automated task to read the PTX Release Notes, parse the information, and update the internal ILGPU tables. This then requires a new version of ILGPU to be deployed on Nuget.
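To illustrate why a table update is needed, here is a minimal sketch of an SM-to-ISA lookup that fails for an unknown architecture. It is not ILGPU's actual code: `PtxIsaTable`, `TryGetIsa` and the table layout are hypothetical names made up for this example, and the single entry (SM 8.9 mapped to PTX ISA 7.8) as well as the assumption that the 50xx series reports SM 12.0 should be checked against the PTX release notes linked above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a library's built-in architecture table.
// The names and the entry below are illustrative, not ILGPU's real data.
static class PtxIsaTable
{
    // Maps an SM version (major, minor) to the PTX ISA version that targets it.
    private static readonly Dictionary<(int Major, int Minor), string> IsaBySm =
        new Dictionary<(int Major, int Minor), string>
        {
            [(8, 9)] = "7.8", // assumed value for the RTX 40 series
        };

    // Returns false when the SM version is newer than the shipped table.
    public static bool TryGetIsa(int major, int minor, out string isa) =>
        IsaBySm.TryGetValue((major, minor), out isa);
}

static class Program
{
    static void Main()
    {
        // (12, 0) is assumed to be what a 50xx card reports.
        foreach (var (major, minor) in new[] { (8, 9), (12, 0) })
        {
            if (PtxIsaTable.TryGetIsa(major, minor, out var isa))
                Console.WriteLine($"SM {major}.{minor}: emit PTX for ISA {isa}");
            else
                Console.WriteLine($"SM {major}.{minor}: not in table, a library update is required");
        }
    }
}
```

Since the table is compiled into the library, a device that is newer than the table can only be supported by shipping a new package version.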

Can we get 1.5.3 out of the door to fix this problem? I would be happy to help if I can.

afmg-aherzog avatar Jun 17 '25 11:06 afmg-aherzog

@afmg-aherzog thanks a lot for mentioning this. We discussed the issue offline and are preparing a patch. Test pipelines are ready and we expect a new release supporting 50xx GPUs very soon!

m4rs-mt avatar Jun 19 '25 20:06 m4rs-mt

This issue has been fixed in v1.5.3.

m4rs-mt avatar Jul 12 '25 17:07 m4rs-mt