ILGPU [BUG]: Implicit conversion of float argument to double in XMath.Pow(double, double) causes illegal memory access

Describe the bug

Please see the failure case below; the faulting statement is marked with a comment.

using System;
using ILGPU;
using ILGPU.Algorithms;
using ILGPU.Runtime;

namespace SimpleKernel;
public class CsGpu 
{
	public static void Main(string[] args)
	{
		const int BufferSize = 1024;
		using var context = Context.Create(x => x.Default().EnableAlgorithms());

		foreach (var device in context /*.GetCPUDevices()*/)
		{
			using var accelerator = device.CreateAccelerator(context);
			Console.WriteLine($"Performing operations on {accelerator}");

			var kernel = accelerator.LoadAutoGroupedStreamKernel<Index1D, ArrayView<float>, ArrayView<float>, ArrayView<float>, SpecializedValue<int>>(MyKernel);
			using var buffer1 = accelerator.Allocate1D<float>(BufferSize + 1); // Padding for [i + 1] in TrackingError
			using var buffer2 = accelerator.Allocate1D<float>(BufferSize + 1);
			using var buffer3 = accelerator.Allocate1D<float>(BufferSize + 1);

			kernel((int)buffer1.Length, buffer1.View, buffer2.View, buffer3.View, SpecializedValue.New(BufferSize));

			accelerator.Synchronize();

			var result = buffer3.GetAsArray1D();
			Console.WriteLine($"Result: {string.Join(", ", result)}");
		}
	}

	// Diagnostic kernel to reproduce the memory fault when a float is converted to double in an XMath routine
	static void MyKernel(
				Index1D index,
				ArrayView<float> f1ArrayView,
				ArrayView<float> f2ArrayView,
				ArrayView<float> f3ArrayView,
				SpecializedValue<int> len)
	{
		f3ArrayView[index] = DiffSeriesSD(len, f1ArrayView, f2ArrayView);
	}

	private static float DiffSeriesSD(SpecializedValue<int> len, ArrayView<float> levels, ArrayView<float> bundle)
	{
		float diffAvg = 0;
		ArrayView<float> diffs = LocalMemory.Allocate<float>(len);
		diffs[0] = 0;
		bundle[0] = 2;
		levels[1] = 3;
		for (int i = 1; i < len; i++)
		{
			bundle[i] = i;
			levels[i + 1] = i + 1f;
			float f1 = (bundle[i] - bundle[i - 1]) / bundle[i - 1];
			float f2 = (levels[i + 1] - levels[i]) / levels[i];
			diffs[i] = (float)(f1 - f2);
			diffAvg += diffs[i];
		}
		diffAvg /= len;

		float sumSquareDiff = 0;
		for (int i = 0; i < len; i++)
		{
			sumSquareDiff += (float)XMath.Pow(diffs[i] - diffAvg, 2.0);  // <--- SITE OF ERROR; 2.0f EXPONENT SUCCEEDS
		}

		// Return the s.d.
		return XMath.Sqrt(sumSquareDiff / len);
	}
}
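The title attributes the fault to the implicit float-to-double conversion. The following plain C# program (no ILGPU) is a minimal sketch of why the `2.0` and `2.0f` exponents take different paths. `PowOverloads` is a hypothetical stand-in that only mimics the two `XMath.Pow` signatures; it is not ILGPU code. It reports which overload C# overload resolution selects for each call.

```csharp
using System;

// Hypothetical stand-ins for XMath.Pow(float, float) and XMath.Pow(double, double).
// They do no math; they only report which overload the compiler picked.
public static class PowOverloads
{
    public static string Pow(float x, float y) => "Pow(float, float)";
    public static string Pow(double x, double y) => "Pow(double, double)";
}

public static class Program
{
    public static void Main()
    {
        float diff = 0.25f;

        // 2.0 is a double literal. There is no implicit double -> float conversion,
        // so only Pow(double, double) is applicable and diff is widened to double.
        Console.WriteLine(PowOverloads.Pow(diff, 2.0));   // Pow(double, double)

        // 2.0f is a float literal. Both overloads are applicable, and
        // Pow(float, float) wins because its conversions are identity conversions.
        Console.WriteLine(PowOverloads.Pow(diff, 2.0f));  // Pow(float, float)
    }
}
```

In the repro, the `2.0` exponent therefore routes the computation through the double-precision overload.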

Fault indication:

Performing operations on CPUAccelerator [Type: CPU, WarpSize: 4, MaxNumThreadsPerGroup: 16, MemorySize: 9223372036854775807]
Result: 0.017593753, 0.017593753, 0.017593753, 0.017593753, 0.017593753, 0.017593753, 0.017593753, 0.017593753, ...0.017593753, 0.017593753, 0.017593753
Performing operations on Quadro K620 [Type: Cuda, WarpSize: 32, MaxNumThreadsPerGroup: 1024, MemorySize: 2147352576]
Unhandled exception. ILGPU.Runtime.Cuda.CudaException: an illegal memory access was encountered
   at ILGPU.Runtime.Cuda.CudaAccelerator.SynchronizeInternal()
   at ILGPU.Runtime.Accelerator.Synchronize()
   at SimpleKernel.CsGpu.Main(String[] args) in L:\home\timwood\cs-gpu\CsGpu.cs:line 38

L:\home\timwood\cs-gpu\bin\Debug\net8.0\cs-gpu.exe (process 35696) exited with code -532462766 (0xe0434352).

Environment

ILGPU version: 1.5.2
ILGPU.Algorithms version: 1.5.2
.NET version: .NET 8
Operating system: Windows 10
Hardware (if GPU-related): NVIDIA Quadro K620 (Compute Capability 5.0)

Steps to reproduce

Save the code from the description in a .cs file, then compile and run it, with dependencies configured as listed under Environment.

Expected behavior

Expected: the program prints an array of BufferSize + 1 (1025) non-zero float values for each accelerator visible to ILGPU. Actual: it prints a valid array on the CPU accelerator but fails with the CudaException shown above on the Cuda accelerator.


May 22 '25 21:05 timwood0

@m4rs-mt I have managed to reduce the problem, but still do not understand the issue.

Any one of the following changes prevents the issue from occurring:
- Reducing the local memory allocation below 255 elements.
- Reducing the loop to fewer than 33 iterations.
- Removing the loop, or commenting out the (otherwise unused) writes to localValues.
- Removing the recursive call from Calculate.
- Changing Calculate to use float instead of double.

using ILGPU;
using ILGPU.Runtime;
using ILGPU.Runtime.Cuda;
using System;

namespace SimpleKernel
{
    public static class Program
    {
        public static void Main(string[] args)
        {
            using var context = Context.Create(x => x.Cuda());

            foreach (var device in context)
            {
                using var accelerator = device.CreateAccelerator(context);
                Console.WriteLine($"Performing operations on {accelerator}");

                var kernel = accelerator.LoadAutoGroupedStreamKernel<Index1D, ArrayView<float>>(MyKernel);
                using var outputBuffer = accelerator.Allocate1D<float>(1);

                kernel((int)outputBuffer.Length, outputBuffer.View);
                accelerator.Synchronize();
            }
        }

        static void MyKernel(Index1D index, ArrayView<float> outputView)
        {
            ArrayView<float> localValues = LocalMemory.Allocate<float>(255);
            for (int i = 0; i < 33; i++)
            {
                localValues[i] = 123.45f;
            }

            outputView[index] = (float)Calculate(1.0);
        }

        static double Calculate(double value)
        {
            if (value < 0)
                return Calculate(1.0);

            return 1.0;
        }
    }
}

May 26 '25 12:05 MoFtZ

Fixed by https://github.com/m4rs-mt/ILGPU/pull/1334

Jul 08 '25 10:07 MoFtZ