llvm [SYCL] Add invalid kernel handling

This commit adds support for handling invalid kernels.

Sometimes kernel cannot be generated for specific platform, when trying to append launch of such faulty kernel exception is now thrown - informing why used kernel is invalid.

Signed-off-by: Krystian Chmielewski [email protected]

Aug 29 '22 16:08 KrystianChmielewski

BTW. It would be nice to have a test.

Aug 30 '22 07:08 romanovvlad

This commit adds support for handling invalid kernels.

Sometimes kernel cannot be generated for specific platform, when trying to append launch of such faulty kernel exception is now thrown - informing why used kernel is invalid.

Signed-off-by: Krystian Chmielewski [email protected]

What is the motivating issue? Are you going to add an E2E test too?

Aug 30 '22 10:08 smaslov-intel

@smaslov-intel "What is the motivating issue? Are you going to add an E2E test too?"

Kernel's compiled from spirv using fp64 on platforms without fp64 support are poisoned (code is replaced) with printf. This is done to enable building module with fp64 kernels on platforms without fp64 support. This change aims to catch this and return appropriate error to user, when encountering poisoned kernel.

Aug 30 '22 11:08 KrystianChmielewski

Kernel's compiled from spirv using fp64 on platforms without fp64 support are poisoned (code is replaced) with printf. This is done to enable building module with fp64 kernels on platforms without fp64 support.

Why does a platform need the support fp64 to just build kernel, not to run it?

This change aims to catch this and return appropriate error to user, when encountering poisoned kernel.

Are you saying that the regexp you are trying to match is something that is printed with that printf? If so, it looks like a hack to me. Isn't there a more robust solution to the issue?

Aug 30 '22 11:08 smaslov-intel

Why does a platform need the support fp64 to just build kernel, not to run it?

That's how intel-graphics-compiler works. The problem is if user tries to run this kernel. It would be much better to get an error instead of executing said kernel and getting information from result of kernel's printf.

Are you saying that the regexp you are trying to match is something that is printed with that printf? If so, it looks like a hack to me. Isn't there a more robust solution to the issue?

No. Compiler appends attribute to the kernel "invalid_kernel(reason)". Regexp that I am matching is one of kernel's attributes

Aug 30 '22 11:08 KrystianChmielewski

That's how intel-graphics-compiler works. The problem is if user tries to run this kernel. It would be much better to get an error instead of executing said kernel and getting information from result of kernel's printf.

This is for AOT compilation, right? Wouldn't it be much more friendly to give an error at AOT compilation-time (and request JIT compilation) rather than deferring to run-time? Mis-compiling the kernel (which printf is essentially doing) and then trying to understand/report this at run-time looks weird at least.

Compiler appends attribute to the kernel "invalid_kernel(reason)". Regexp that I am matching is one of kernel's attributes

Is it possible for user to insert "attribute" in source code and then have a false positive diagnostic in case of a coincidental regexp match?

Aug 31 '22 13:08 smaslov-intel

Wouldn't it be much more friendly to give an error at AOT compilation-time (and request JIT compilation) rather than deferring to run-time?

This is a must to allow same build DL frameworks for PVC/ATS-M. The requirements for it to work are:

Build time: no failures should be reported. - do not give error during AOT compilation
Run time: execution on pvc/ats-m should report error in case application uses fp64. In AOT case 'assert' should be injected. This is done by passing "invalid_kernel" attribute.

Is it possible for user to insert "attribute" in source code and then have a false positive diagnostic in case of a coincidental regexp match?

It's impossible to have false positive. Regexp only matches attribute "invalid_kernel(reason)"

Aug 31 '22 14:08 KrystianChmielewski

Wouldn't it be much more friendly to give an error at AOT compilation-time (and request JIT compilation) rather than deferring to run-time?

This is a must to allow same build DL frameworks for PVC/ATS-M. The requirements for it to work are:

Build time: no failures should be reported. - do not give error during AOT compilation

Run time: execution on pvc/ats-m should report error in case application uses fp64. In AOT case 'assert' should be injected. This is done by passing "invalid_kernel" attribute.

What is the benefit of pretending we have the right kernel at build-time (where we already know it is not going to work) and report an error at run-time?

Is it possible for user to insert "attribute" in source code and then have a false positive diagnostic in case of a coincidental regexp match?

It's impossible to have false positive. Regexp only matches attribute "invalid_kernel(reason)"

What if something else (user) injects the matching attribute?

Aug 31 '22 15:08 smaslov-intel

What is the benefit of pretending we have the right kernel at build-time (where we already know it is not going to work) and report an error at run-time?

We can use single spirv with multiple kernels (including those using fp64) and compile it to native binary for every device.

What if something else (user) injects the matching attribute?

If user injects "invalid_kernel(reason)" attribute then PI_ERROR_PLUGIN_SPECIFIC_ERROR error with message description "invalid_kernel(reason)" will be returned as expected.

Sep 01 '22 08:09 KrystianChmielewski

We can use single spirv with multiple kernels (including those using fp64) and compile it to native binary for every device.

But you said just earlier that you can't compile fp64 code correctly if host devices doesn't support fp64. So, what is the purpose of not giving an error in compile time if the fp64 kernel is miscompiled? Maybe, you have a design doc explaining the approach in details?

If user injects "invalid_kernel(reason)" attribute then PI_ERROR_PLUGIN_SPECIFIC_ERROR error with message description "invalid_kernel(reason)" will be returned as expected.

Why is this "as expected"? I think users can expect completely something different, and may not be happy with RT reporting error in this case. There is no compiler reserved attributes that app can't use, are there?

Sep 01 '22 13:09 smaslov-intel

Maybe, you have a design doc explaining the approach in details?

Please look into jira XDEPS-4268.

Why is this "as expected"? I think users can expect completely something different, and may not be happy with RT reporting error in this case. There is no compiler reserved attributes that app can't use, are there?

If user adds attribute "reqd_work_group_size(X,Y,Z)" then it's expected for work group size to be (X, Y, Z) (any other work group size on setGroupSize call will result in an error). If user adds attribute "invalid_kernel(reason)" then it's expected to return an error on appendLaunchKernel

Sep 02 '22 08:09 KrystianChmielewski

If user adds attribute "reqd_work_group_size(X,Y,Z)" then it's expected for work group size to be (X, Y, Z) (any other work group size on setGroupSize call will result in an error). If user adds attribute "invalid_kernel(reason)" then it's expected to return an error on appendLaunchKernel

Where is this behavior documented so that users know that?

Sep 14 '22 16:09 smaslov-intel

If user adds attribute "reqd_work_group_size(X,Y,Z)" then it's expected for work group size to be (X, Y, Z) (any other work group size on setGroupSize call will result in an error). If user adds attribute "invalid_kernel(reason)" then it's expected to return an error on appendLaunchKernel

Where is this behavior documented so that users know that?

@KrystianChmielewski : can you help understand this?

Sep 27 '22 19:09 smaslov-intel

@smaslov-intel where should it be documented?

Sep 28 '22 08:09 KrystianChmielewski

llvm llvm copied to clipboard

[SYCL] Add invalid kernel handling

llvm
llvm copied to clipboard