[BUG]: Infinite compilation with local arrays inside kernels!

Open delverOne25 opened this issue 5 months ago • 1 comments

Describe the bug

When a kernel defines a local array such as int[] arr = new int[1_000_000], ILGPU generates PTX code that is about 1,000,000 lines long. Compilation takes more than 10 minutes and consumes over 10 GB of RAM. The resulting PTX is megabytes in size because it contains a million lines that set each array element to zero individually, instead of using a loop for initialization.

    public static void kernel0(Index1D index, ArrayView1D<int, Stride1D.Dense> arr)
    {
        int n = _n_;
        int[] t = new int[n];
        t[0] = arr[index];
        for (int i = 0; i < n; i++)
        {
            arr[index] += t[i];
        }
    }

    //var ptxKernel=(PTXCompiledKernel)k.GetCompiledKernel();
    //Console.WriteLine(ptxKernel.Name);
    //Console.WriteLine(ptxKernel.Info);
    //Console.WriteLine(ptxKernel.PTXAssembly);

........ ........ n lines

    add.u64 %rd9971, %rd4, %rd9972;
    st.local.b32    [%rd9971], 0;
    mul.wide.u32    %rd9973, 9968, 4;
    add.u64 %rd9972, %rd4, %rd9973;
    st.local.b32    [%rd9972], 0;
    mul.wide.u32    %rd9974, 9969, 4;
    add.u64 %rd9973, %rd4, %rd9974;
    st.local.b32    [%rd9973], 0;
    mul.wide.u32    %rd9975, 9970, 4;
    add.u64 %rd9974, %rd4, %rd9975;
    st.local.b32    [%rd9974], 0;
    mul.wide.u32    %rd9976, 9971, 4;
    add.u64 %rd9975, %rd4, %rd9976;
    st.local.b32    [%rd9975], 0;
    mul.wide.u32    %rd9977, 9972, 4;
    add.u64 %rd9976, %rd4, %rd9977;
    st.local.b32    [%rd9976], 0;
    mul.wide.u32    %rd9978, 9973, 4;
    add.u64 %rd9977, %rd4, %rd9978;
    st.local.b32    [%rd9977], 0;
    mul.wide.u32    %rd9979, 9974, 4;
    add.u64 %rd9978, %rd4, %rd9979;
    st.local.b32    [%rd9978], 0;

....... .......

Environment

  • ILGPU version: [e.g., 1.5.1]
  • .NET version: [e.g., .NET 8]
  • Operating system: [e.g., Windows 10]
  • Hardware (if GPU-related): [e.g., NVIDIA GeForce GTX 1080]

Steps to reproduce

111

Expected behavior

111

Additional context

No response

delverOne25 • Oct 24 '25 02:10

hi @delverOne25.

Local arrays within a kernel are not well supported, particularly if you are trying to create 1_000_000 elements. This applies to ILGPU and also to native CUDA.

GPU programming is different from CPU programming, and you need to handle memory allocation differently.
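
One common way to handle the allocation differently, sketched here under assumptions and not confirmed in this thread as the recommended fix, is to allocate the scratch space on the host and pass it to the kernel as a view, so that each thread works on its own slice and zeroes it with a loop. The class name, `NumThreads`, and `ScratchPerThread` are hypothetical and kept deliberately small; the kernel body mirrors kernel0 from the report.

```csharp
using System;
using ILGPU;
using ILGPU.Runtime;

public static class ScratchBufferSketch
{
    // Hypothetical sizes, chosen only for illustration.
    const int NumThreads = 256;
    const int ScratchPerThread = 1024;

    // Same logic as kernel0, but the per-thread array lives in a
    // preallocated global-memory buffer instead of "new int[n]".
    static void Kernel(Index1D index, ArrayView<int> arr, ArrayView<int> scratch, int n)
    {
        int tid = index;
        // Each thread takes its own n-element slice of the scratch buffer.
        var t = scratch.SubView((long)tid * n, n);

        // Zero the slice with a loop (the buffer contents are not assumed to be zero).
        for (int i = 0; i < n; i++)
            t[i] = 0;

        t[0] = arr[index];
        int sum = 0;
        for (int i = 0; i < n; i++)
            sum += t[i];
        arr[index] += sum;
    }

    public static void Main()
    {
        using var context = Context.CreateDefault();
        using var accelerator = context
            .GetPreferredDevice(preferCPU: false)
            .CreateAccelerator(context);

        // Input/output data, initialized from a zeroed host array.
        using var arr = accelerator.Allocate1D(new int[NumThreads]);
        // One scratch slice per thread, allocated once on the host side.
        using var scratch = accelerator.Allocate1D<int>((long)NumThreads * ScratchPerThread);

        var kernel = accelerator.LoadAutoGroupedStreamKernel<
            Index1D, ArrayView<int>, ArrayView<int>, int>(Kernel);
        kernel((int)arr.Length, arr.View, scratch.View, ScratchPerThread);
        accelerator.Synchronize();

        int[] result = arr.GetAsArray1D();
        Console.WriteLine(result.Length);
    }
}
```

Because the scratch size is a runtime argument and not a compile-time array length, the generated code has no per-element initialization to unroll. The trade-off is that the total scratch memory grows with the number of threads, so very large per-thread sizes such as 1_000_000 elements may still not fit on the device.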

MoFtZ • Nov 12 '25 11:11