dpctl icon indicating copy to clipboard operation
dpctl copied to clipboard

Support compilation from SYCL source code

Open sommerlukas opened this issue 9 months ago • 22 comments

This adds support to create a program/executable kernel_bundle from SYCL source code.

It uses the DPC++ kernel_compiler extension. As this is only an extension and not all backends are supported, a query on the device was added to check for support for compilation.

  • [x] Have you provided a meaningful PR description?
  • [x] Have you added a test, reproducer or referred to an issue with a reproducer?
  • [x] Have you tested your changes locally for CPU and GPU devices?
  • [x] Have you made sure that new changes do not introduce compiler warnings?
  • [ ] Have you checked performance impact of proposed changes?
  • [x] Have you added documentation for your changes, if necessary?
  • [ ] Have you added your changes to the changelog?
  • [ ] If this PR is a work in progress, are you opening the PR as a draft?

sommerlukas avatar Apr 11 '25 10:04 sommerlukas

The kernel_compiler extension also supports SPIR-V and OpenCL as input languages, so it would be possible to replace the existing interop for SPIR-V and OpenCL with the same extension. I however did not do this as part of this PR for three reasons:

  • To keep the PR scope limited
  • The resulting programs wouldn't be interop anymore.
  • It should probably be part of a bigger refactor that also renames program to kernel_bundle then.

sommerlukas avatar Apr 11 '25 10:04 sommerlukas

Builds seem to be failing on the current compiler version, so this will probably need to be gated like the others behind utilities that check for the availability of the "kernel_compiler" extension

ndgrigorian avatar Apr 15 '25 05:04 ndgrigorian

On the plus side, it's passing with 2025.2, and OS builds work—tests won't run until rebased on/merged with master though

ndgrigorian avatar Apr 15 '25 05:04 ndgrigorian

SYCL runtime compilation should also be supported with 2025.1, but the registered_names property used to be called registered_kernel_names in that release. Also, 2025.0 might define the feature test macro, because it had support for OpenCL and SPIR-V in kernel_compiler (but no SYCL yet), but doesn't define all properties yet.

@ndgrigorian Do you know how to distinguish the different minor versions? I tried with __LIBSYCL_MAJOR_VERSION and __LIBSYCL_MINOR_VERSION, but I don't seem to get different versions.

sommerlukas avatar Apr 15 '25 14:04 sommerlukas

SYCL runtime compilation should also be supported with 2025.1, but the registered_names property used to be called registered_kernel_names in that release. Also, 2025.0 might define the feature test macro, because it had support for OpenCL and SPIR-V in kernel_compiler (but no SYCL yet), but doesn't define all properties yet.

@ndgrigorian Do you know how to distinguish the different minor versions? I tried with __LIBSYCL_MAJOR_VERSION and __LIBSYCL_MINOR_VERSION, but I don't seem to get different versions.

Compiler also defines __LIBSYCL_PATCH_VERSION , that's the only other way I would know immediately.

ndgrigorian avatar Apr 15 '25 18:04 ndgrigorian

I ended up using __SYCL_COMPILER_VERSION to distinguish the different DPC++ minor versions. Based on feedback in my other PR #2038, I've also switched to using opaque pointers here and removed std::vector and other STL types from the header.

sommerlukas avatar Apr 23 '25 16:04 sommerlukas

Coverage failure relates to two of the newly-added tests

[  FAILED  ] 2 tests, listed below:
[  FAILED  ] KernelBundleCreationFromSYCL/TestSYCLKernelBundleFromSource.CheckCreateFromSYCLSource/1, where GetParam() = "opencl:cpu"
[  FAILED  ] KernelBundleCreationFromSYCL/TestSYCLKernelBundleFromSource.CheckGetKernelSYCLSource/1, where GetParam() = "opencl:cpu"

ndgrigorian avatar Apr 24 '25 17:04 ndgrigorian

Coverage failure relates to two of the newly-added tests

[  FAILED  ] 2 tests, listed below:
[  FAILED  ] KernelBundleCreationFromSYCL/TestSYCLKernelBundleFromSource.CheckCreateFromSYCLSource/1, where GetParam() = "opencl:cpu"
[  FAILED  ] KernelBundleCreationFromSYCL/TestSYCLKernelBundleFromSource.CheckGetKernelSYCLSource/1, where GetParam() = "opencl:cpu"

Which version of the Intel OpenCL for CPU runtime is that setup using?

Locally, I had segfaults with these tests in libintelocl.so, in the step where the compiler reads in SPIR-V. Upgrading my Intel OpenCL for CPU runtime from 2023.x to 2025.19.x fixed the issue, so this may be a case of an older OpenCL CPU runtime not being able to correctly ingest the SPIR-V.

If you have control over the setup, we could try updating the driver. Alternatively, we could deactivate the tests (and RTC support) for OpenCL CPU for now.

sommerlukas avatar Apr 30 '25 14:04 sommerlukas

Which version of the Intel OpenCL for CPU runtime is that setup using?

This is probably the problem. It installs manually from a pretty old version, I'll go ahead and bump it in a separate PR and if that doesn't introduce any problems, we can rebase this again and see if it resolves it.

ndgrigorian avatar Apr 30 '25 18:04 ndgrigorian

@sommerlukas master now has change to OS workflow, we can see if picking up the change fixes failures

ndgrigorian avatar May 01 '25 20:05 ndgrigorian

@sommerlukas seems Linux builds are still building on 2025.0.4, which is actually why it's passing

failures in others (on 2025.1.1) are all the same:

/home/runner/work/dpctl/dpctl/libsyclinterface/source/dpctl_sycl_kernel_bundle_interface.cpp:854:45: error: no type named 'registered_names' in namespace 'sycl::ext::oneapi::experimental'
  854 | using registered_names_property_t = syclex::registered_names;
      |                                     ~~~~~~~~^
/home/runner/work/dpctl/dpctl/libsyclinterface/source/dpctl_sycl_kernel_bundle_interface.cpp:908:9: error: unknown type name 'registered_names_property_t'
  908 |         registered_names_property_t RegisteredNames;
      |         ^

ndgrigorian avatar May 06 '25 07:05 ndgrigorian

@sommerlukas seems Linux builds are still building on 2025.0.4, which is actually why it's passing

failures in others (on 2025.1.1) are all the same:

/home/runner/work/dpctl/dpctl/libsyclinterface/source/dpctl_sycl_kernel_bundle_interface.cpp:854:45: error: no type named 'registered_names' in namespace 'sycl::ext::oneapi::experimental'
  854 | using registered_names_property_t = syclex::registered_names;
      |                                     ~~~~~~~~^
/home/runner/work/dpctl/dpctl/libsyclinterface/source/dpctl_sycl_kernel_bundle_interface.cpp:908:9: error: unknown type name 'registered_names_property_t'
  908 |         registered_names_property_t RegisteredNames;
      |         ^

I used the version reported by 2025.1(.0), which is the release date, to distinguish between versions <= 2025.1, which define registered_kernel_names and >= 2025.2, which defines registered_names instead. The patch release 2025.1.1 probably still defines the old registered_kernel_names, but ofc has a version/release date higher than 2025.1.0. This is either a question of increasing the version number of the distinction or, if there is some overlap in release dates between 2025.2 and 2025.1.1, of finding an alternative approach to determining which property is defined by that release. I'll look into it.

sommerlukas avatar May 06 '25 07:05 sommerlukas

@ndgrigorian I fixed another issue that caused the coverage testing to fail in the previous CI run. On 2025.1, querying template kernels by template name is not yet supported, they can only be queried by mangled name. I've adapted the test to account for that. Coverage testing is running correctly for me locally, but the CI workflows need approval.

I'm not quite sure what's causing the Jenkins runs to fail.

sommerlukas avatar May 13 '25 07:05 sommerlukas

@ndgrigorian I fixed another issue that caused the coverage testing to fail in the previous CI run. On 2025.1, querying template kernels by template name is not yet supported, they can only be queried by mangled name. I've adapted the test to account for that. Coverage testing is running correctly for me locally, but the CI workflows need approval.

I'm not quite sure what's causing the Jenkins runs to fail.

Started them up.

And internal tests cite test_create_program_from_sycl_source as the source, with dpctl.program._program.SyclProgramCompilationError. I'll check the builds as well tomorrow.

ndgrigorian avatar May 13 '25 07:05 ndgrigorian

Coverage Status

coverage: 85.866% (-0.02%) from 85.89% when pulling de9866128cdacf96c4d21e04f0a0a8beb4548be4 on sommerlukas:sycl-source-compilation into e2789db9a1763cf5a23b4fb2484777cb20415d72 on IntelPython:master.

coveralls avatar May 13 '25 08:05 coveralls

Coverage Status

coverage: 85.335% (-1.0%) from 86.372% when pulling 153ec69 on sommerlukas:sycl-source-compilation into ae1e532 on IntelPython:master.

I think the coverage went down, because the Python test that I added runs on LevelZero and is skipped in the coverage testing. Does the coverage machine have LevelZero?

sommerlukas avatar May 13 '25 15:05 sommerlukas

I think the coverage went down, because the Python test that I added runs on LevelZero and is skipped in the coverage testing. Does the coverage machine have LevelZero?

You're correct, none of our runners have LevelZero. I don't mind the coverage going down in these cases, because there's really not much we can do about it.

ndgrigorian avatar May 14 '25 09:05 ndgrigorian

I think the coverage went down, because the Python test that I added runs on LevelZero and is skipped in the coverage testing. Does the coverage machine have LevelZero?

You're correct, none of our runners have LevelZero. I don't mind the coverage going down in these cases, because there's really not much we can do about it.

Does the runner have OpenCL? We could have the test run on OpenCL instead.

sommerlukas avatar May 14 '25 09:05 sommerlukas

Does the runner have OpenCL? We could have the test run on OpenCL instead.

Yes, tests all run on opencl:cpu, so that would be fine.

ndgrigorian avatar May 14 '25 17:05 ndgrigorian

I've changed the test to run on opencl now.

I've also added support for inspecting the build log in case compilation fails through the message in the SyclProgramCompilationError reported back.

sommerlukas avatar May 21 '25 16:05 sommerlukas

@ndgrigorian Is the Jenkins failure still related to the tests or just flaky? The rest of the CI is green now.

sommerlukas avatar May 26 '25 07:05 sommerlukas

@ndgrigorian Is the Jenkins failure still related to the tests or just flaky? The rest of the CI is green now.

The build failures are unrelated, but the test failures still seem to be related. On Windows there's a stack overflow in at least one of the Jenkins runners, and on Linux there's Device linking failed

ndgrigorian avatar May 27 '25 00:05 ndgrigorian

Superseded by gh-2206

ndgrigorian avatar Dec 03 '25 19:12 ndgrigorian