llvm [SYCL] Avoid infinite loop when kernel fails to compile with memory error

The runtime tries to build a kernel again if compilation fails. But if UR returns a memory error the attempt counter was not compared against the maximum number of attempts, so the compiler was continuously called and eventually the loop counter would have overflowed.

Jun 10 '25 12:06 VerenaBeckham

@VerenaBeckham is it possible to add a test that would fail without your patch and work after your patch?

Jun 10 '25 14:06 maarquitos14

@VerenaBeckham is it possible to add a test that would fail without your patch and work after your patch?

I'm not sure. Even if you could write a test that allocates so much memory that it runs out, this test might still pass on a different architecture with more memory. Or do you know of a way to mock up this error?

Jun 11 '25 10:06 VerenaBeckham

@VerenaBeckham is it possible to add a test that would fail without your patch and work after your patch?

I'm not sure. Even if you could write a test that allocates so much memory that it runs out, this test might still pass on a different architecture with more memory. Or do you know of a way to mock up this error?

I think a unit test with UrMock should do the trick for that: https://github.com/intel/llvm/blob/b437083ea7b827ea6798e3fcfae1fe0f926d1d6d/sycl/doc/developer/ContributeToDPCPP.md#dpc-headers-and-runtime-tests

You can find examples if you grep UrMock within sycl/unittests. Basically, UrMock allows you to redefine the behavior of any UR function for a given test, so you could redefine the function that is triggering the memory error to always return a memory error without even trying to allocate.

Jun 11 '25 13:06 maarquitos14

@VerenaBeckham is it possible to add a test that would fail without your patch and work after your patch?

I'm not sure. Even if you could write a test that allocates so much memory that it runs out, this test might still pass on a different architecture with more memory. Or do you know of a way to mock up this error?

I think a unit test with UrMock should do the trick for that: https://github.com/intel/llvm/blob/b437083ea7b827ea6798e3fcfae1fe0f926d1d6d/sycl/doc/developer/ContributeToDPCPP.md#dpc-headers-and-runtime-tests

You can find examples if you grep UrMock within sycl/unittests. Basically, UrMock allows you to redefine the behavior of any UR function for a given test, so you could redefine the function that is triggering the memory error to always return a memory error without even trying to allocate.

Thanks for the hint! I have added a test.

Jun 11 '25 17:06 VerenaBeckham

If you're happy with this @maarquitos14 do you want to "approve the workflows" and kick off the testing?

Jun 19 '25 09:06 VerenaBeckham

I'm sorry, I forgot to push my clang format fixes. 🤦 Fixed now.

Jun 19 '25 09:06 VerenaBeckham