llvm
llvm copied to clipboard
[SYCL] Add invalid kernel handling
This commit adds support for handling invalid kernels.
Sometimes kernel cannot be generated for specific platform, when trying to append launch of such faulty kernel exception is now thrown - informing why used kernel is invalid.
Signed-off-by: Krystian Chmielewski [email protected]
BTW. It would be nice to have a test.
This commit adds support for handling invalid kernels.
Sometimes kernel cannot be generated for specific platform, when trying to append launch of such faulty kernel exception is now thrown - informing why used kernel is invalid.
Signed-off-by: Krystian Chmielewski [email protected]
What is the motivating issue? Are you going to add an E2E test too?
@smaslov-intel "What is the motivating issue? Are you going to add an E2E test too?"
Kernel's compiled from spirv using fp64 on platforms without fp64 support are poisoned (code is replaced) with printf. This is done to enable building module with fp64 kernels on platforms without fp64 support. This change aims to catch this and return appropriate error to user, when encountering poisoned kernel.
Kernel's compiled from spirv using fp64 on platforms without fp64 support are poisoned (code is replaced) with printf. This is done to enable building module with fp64 kernels on platforms without fp64 support.
Why does a platform need the support fp64 to just build kernel, not to run it?
This change aims to catch this and return appropriate error to user, when encountering poisoned kernel.
Are you saying that the regexp you are trying to match is something that is printed with that printf? If so, it looks like a hack to me. Isn't there a more robust solution to the issue?
Why does a platform need the support fp64 to just build kernel, not to run it?
That's how intel-graphics-compiler works. The problem is if user tries to run this kernel. It would be much better to get an error instead of executing said kernel and getting information from result of kernel's printf.
Are you saying that the regexp you are trying to match is something that is printed with that printf? If so, it looks like a hack to me. Isn't there a more robust solution to the issue?
No. Compiler appends attribute to the kernel "invalid_kernel(reason)". Regexp that I am matching is one of kernel's attributes
That's how intel-graphics-compiler works. The problem is if user tries to run this kernel. It would be much better to get an error instead of executing said kernel and getting information from result of kernel's printf.
This is for AOT compilation, right? Wouldn't it be much more friendly to give an error at AOT compilation-time (and request JIT compilation) rather than deferring to run-time? Mis-compiling the kernel (which printf is essentially doing) and then trying to understand/report this at run-time looks weird at least.
Compiler appends attribute to the kernel "invalid_kernel(reason)". Regexp that I am matching is one of kernel's attributes
Is it possible for user to insert "attribute" in source code and then have a false positive diagnostic in case of a coincidental regexp match?
Wouldn't it be much more friendly to give an error at AOT compilation-time (and request JIT compilation) rather than deferring to run-time?
This is a must to allow same build DL frameworks for PVC/ATS-M. The requirements for it to work are:
- Build time: no failures should be reported. - do not give error during AOT compilation
- Run time: execution on pvc/ats-m should report error in case application uses fp64. In AOT case 'assert' should be injected. This is done by passing "invalid_kernel" attribute.
Is it possible for user to insert "attribute" in source code and then have a false positive diagnostic in case of a coincidental regexp match?
It's impossible to have false positive. Regexp only matches attribute "invalid_kernel(reason)"
Wouldn't it be much more friendly to give an error at AOT compilation-time (and request JIT compilation) rather than deferring to run-time?
This is a must to allow same build DL frameworks for PVC/ATS-M. The requirements for it to work are:
- Build time: no failures should be reported. - do not give error during AOT compilation
- Run time: execution on pvc/ats-m should report error in case application uses fp64. In AOT case 'assert' should be injected. This is done by passing "invalid_kernel" attribute.
What is the benefit of pretending we have the right kernel at build-time (where we already know it is not going to work) and report an error at run-time?
Is it possible for user to insert "attribute" in source code and then have a false positive diagnostic in case of a coincidental regexp match?
It's impossible to have false positive. Regexp only matches attribute "invalid_kernel(reason)"
What if something else (user) injects the matching attribute?
What is the benefit of pretending we have the right kernel at build-time (where we already know it is not going to work) and report an error at run-time?
We can use single spirv with multiple kernels (including those using fp64) and compile it to native binary for every device.
What if something else (user) injects the matching attribute?
If user injects "invalid_kernel(reason)" attribute then PI_ERROR_PLUGIN_SPECIFIC_ERROR error with message description "invalid_kernel(reason)" will be returned as expected.
We can use single spirv with multiple kernels (including those using fp64) and compile it to native binary for every device.
But you said just earlier that you can't compile fp64 code correctly if host devices doesn't support fp64. So, what is the purpose of not giving an error in compile time if the fp64 kernel is miscompiled? Maybe, you have a design doc explaining the approach in details?
If user injects "invalid_kernel(reason)" attribute then PI_ERROR_PLUGIN_SPECIFIC_ERROR error with message description "invalid_kernel(reason)" will be returned as expected.
Why is this "as expected"? I think users can expect completely something different, and may not be happy with RT reporting error in this case. There is no compiler reserved attributes that app can't use, are there?
Maybe, you have a design doc explaining the approach in details?
Please look into jira XDEPS-4268.
Why is this "as expected"? I think users can expect completely something different, and may not be happy with RT reporting error in this case. There is no compiler reserved attributes that app can't use, are there?
If user adds attribute "reqd_work_group_size(X,Y,Z)" then it's expected for work group size to be (X, Y, Z) (any other work group size on setGroupSize call will result in an error). If user adds attribute "invalid_kernel(reason)" then it's expected to return an error on appendLaunchKernel
If user adds attribute "reqd_work_group_size(X,Y,Z)" then it's expected for work group size to be (X, Y, Z) (any other work group size on setGroupSize call will result in an error). If user adds attribute "invalid_kernel(reason)" then it's expected to return an error on appendLaunchKernel
Where is this behavior documented so that users know that?
If user adds attribute "reqd_work_group_size(X,Y,Z)" then it's expected for work group size to be (X, Y, Z) (any other work group size on setGroupSize call will result in an error). If user adds attribute "invalid_kernel(reason)" then it's expected to return an error on appendLaunchKernel
Where is this behavior documented so that users know that?
@KrystianChmielewski : can you help understand this?
@smaslov-intel where should it be documented?