
[QUESTION]: How can I restore the program after a CUDA kernel error?

Open delverOne25 opened this issue 1 year ago • 1 comments

Question

    private static void test(Index1D d, ArrayView<int> view)
    {
        view[-d] = 3 / d; /// error
    }

    public static unsafe void Main(string[] args)
    {
        var ctx = Context.Create(c => c.AllAccelerators().EnableAlgorithms().Optimize(OptimizationLevel.O2).Inlining(InliningMode.Aggressive));
        var a = ctx.CreateCudaAccelerator(0);
        var ttt = a.LoadAutoGroupedKernel<Index1D, ArrayView<int>>(test);
        var ccc = a.Allocate1D<int>(10);
        try
        {
            ttt(a.DefaultStream, 1000, ccc.View);
            a.DefaultStream.Synchronize();
        }
        catch (AcceleratorException e)
        {
            CudaAPI.CurrentAPI.DestroyContext((a as CudaAccelerator).NativePtr);
            a = ctx.CreateCudaAccelerator(0); /// ILGPU.Runtime.Cuda.CudaException: "an illegal memory access was encountered"

            a.Dispose();
            return;
        }
    }

Environment

  • ILGPU version: [e.g., 1.5.1]
  • .NET version: [e.g., .NET 8]
  • Operating system: [e.g., Windows 10]
  • Hardware (if GPU-related): [e.g., NVIDIA GeForce GTX 1080]

Additional context

No response

delverOne25 avatar Aug 12 '24 13:08 delverOne25

I don't think it's possible. In short, CUDA devices don't support any form of exception handling. If an error occurs on the device side, there's no way to recover from it. The only solution is to prevent the error from happening in the first place, before the kernel runs on the device.
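
As a rough illustration of preventing the error up front, here is a minimal sketch of a guarded variant of the kernel from the question. It assumes the element type is `int` and that the intent was to write `3 / i` into slot `i`; the original intent isn't stated, so treat both as guesses. The names `GuardedExample` and `GuardedKernel` are hypothetical. The kernel skips any thread whose index would be out of range or whose divisor would be zero, and the host launches only as many threads as the buffer has elements rather than a fixed 1000.

```csharp
using System;
using ILGPU;
using ILGPU.Runtime;
using ILGPU.Runtime.Cuda;

public static class GuardedExample
{
    // Hypothetical guarded variant of the kernel from the question.
    // Assumption: the element type is int and slot i should receive 3 / i.
    private static void GuardedKernel(Index1D d, ArrayView<int> view)
    {
        int i = d;

        // Skip threads that would divide by zero (i == 0) or
        // access memory outside the buffer (i >= view.Length).
        if (i <= 0 || i >= view.Length)
            return;

        view[i] = 3 / i;
    }

    public static void Main()
    {
        using var ctx = Context.Create(c => c.AllAccelerators());
        using var a = ctx.CreateCudaAccelerator(0);
        using var buffer = a.Allocate1D<int>(10);

        // Start from a known state so skipped slots hold zero.
        buffer.MemSetToZero();

        var kernel = a.LoadAutoGroupedStreamKernel<Index1D, ArrayView<int>>(GuardedKernel);

        // Launch one thread per buffer element instead of a fixed 1000.
        kernel((int)buffer.Length, buffer.View);
        a.Synchronize();

        int[] result = buffer.GetAsArray1D();
        Console.WriteLine(string.Join(", ", result));
    }
}
```

The guard sits inside the kernel because that is the only place every thread's index is known. Sizing the launch to the buffer length on the host is a second line of defense, not a replacement for the check.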

hez2010 avatar Oct 16 '24 11:10 hez2010