Summary

oneMKL can be difficult to build and use. The reasons for this are:

The build documentation is fragmented and at times outdated
The compiler ecosystem and compatibility is a little confusing

Problem statement

The current build documentation has the following problems:

It suggests oneMKL supports Conan. Conan is no longer supported since https://github.com/oneapi-src/oneMKL/discussions/267
- Consequently, you have scroll some way to find the CMake instructions.
In most cases, building oneMKL backends is a case of setting -DENABLE_XYZ_BACKEND=ON. This isn't clearly communicated by the documentation, that makes building look far more complicated.
The docs suggests setting TARGET_DOMAINS even thought it is set automatically.
At times the documentation is worded in a way that suggests that backends cannot be enabled simultaneously, even thought I think they can.
In most cases, building a backend with hipSYCL (which is now AdaptiveCpp) is similar to building with clang++ or icpx, but we have duplication of information instead.

We also support a range of compilers.

I believe that some backends build with ICPX and Codeplay's plugins for Nvidia and AMD GPUs, although this is undocumented. We should consider supporting and documenting these.
- Codeplay's oneAPI for Nvidia GPUs, Codeplay's oneAPI for AMD GPUs
In cases where a backend is dependent on a particular compiler, we should consider trying to generate useful CMake errors.

Preferred solution

Consider rewriting building_the_project.rst to address the above problems.
- Some aspects (e.g. additional options for building with AdaptiveCpp) could potentially move to new files.
Generate more errors in CMake to reflect compiler support for different backends.

Mar 13 '24 16:03 hjabird

Hi @hjabird We are looking for community support for this project. Are you (or any others) willing to pitch in and help improve the documentation?

Mar 20 '24 22:03 cdgarland

Hi Craig, yes we will try to help with some of these issues when time permits. It would be useful to have some feedback before we start spending time on it.

To continue the discussion from https://github.com/oneapi-src/oneMKL/pull/458#discussion_r1519544088 I was thinking it could make sense to document some configurations that are expected to work but not regularly tested. This could help users try some configurations and report issues. I think the combination of all the domains, platforms, compilers and OS lead to a number of configurations too high to be regularly tested. I'm curious to have @mmeterel's opinion on this.

Mar 22 '24 11:03 Rbiessy

Hi Craig, yes we will try to help with some of these issues when time permits. It would be useful to have some feedback before we start spending time on it.

To continue the discussion from #458 (comment) I was thinking it could make sense to document some configurations that are expected to work but not regularly tested. This could help users try some configurations and report issues. I think the combination of all the domains, platforms, compilers and OS lead to a number of configurations too high to be regularly tested. I'm curious to have @mmeterel's opinion on this.

@Rbiessy I support adding documentation for untested configurations as long as how to use them are clearly documented and reproducible by internal and external developers. Main concern with not testing is, as the repo keeps evolving, how will we ensure these untested configurations will continue working? What do you think for such cases? Another point about documenting which configs are tested and which are not. I support being transparent to users by providing this information (like our AMDs backend are not tested in CI yet), but we should make plans to close the gaps in CI in a timely manner. (For example adding AMD backends in CI is in progress)

Mar 22 '24 18:03 mmeterel

I don't have a good solution to ensure that untested configurations will continue working. I would say that if people are trying to use them we will eventually get an issue if something is broken. That is my understanding of a community-driven project.

Regarding the CI I have some doubts about how much details we should share. I believe that Alexey (@toxicscum) is working on closing the gap but I doubt everything in the README can be tested.

Mar 25 '24 16:03 Rbiessy

I don't have a good solution to ensure that untested configurations will continue working. I would say that if people are trying to use them we will eventually get an issue if something is broken. That is my understanding of a community-driven project.

Regarding the CI I have some doubts about how much details we should share. I believe that Alexey (@toxicscum) is working on closing the gap but I doubt everything in the README can be tested.

Relying on user input when/if something is broken degrades the quality of the project/repo imho. One option would be, guaranteeing all the mentioned configurations work in every release. Then, at least we have a time stamp where the user can get a stable repo.

Regarding details about CI: When we mention about a configuration and say it is not tested regularly, we are already sharing this information. Ideally, in a repo like this, all CI should be public IMHO.

Mar 25 '24 16:03 mmeterel

So we've had internal discussions about the release of oneMKL. We are planning to create releases of oneMKL interface and ship them with the oneAPI base toolkit. These will be tested by the oneAPI core team at Codeplay using icpx and the Codeplay Nvidia and AMD plugins. As I far as I know this is the only recommended way for users to get DPC++ running on all hardware. I am planning to start a separate discussion on the oneMKL interface release eventually.

I hope it makes more sense why we think we should document some configurations that are expected to work but may not be regularly tested yet.

Mar 25 '24 17:03 Rbiessy

So we've had internal discussions about the release of oneMKL. We are planning to create releases of oneMKL interface and ship them with the oneAPI base toolkit. These will be tested by the oneAPI core team at Codeplay using icpx and the Codeplay Nvidia and AMD plugins. As I far as I know this is the only recommended way for users to get DPC++ running on all hardware. I am planning to start a separate discussion on the oneMKL interface release eventually.

I hope it makes more sense why we think we should document some configurations that are expected to work but may not be regularly tested yet.

This is news to me (releasing oneMKL interfaces as part of base tool kit) Is this in replacement of the releases done in oneMKL interfaces repo or in addition to it? Tagging Irina and Maria for their information and inputs. @mkrainiuk @itopinsk

Mar 25 '24 17:03 mmeterel

I would prefer to keep the discussion on releases separate. I will try to start the discussion this week.

Mar 25 '24 18:03 Rbiessy

I agree that there should be better documentation for building and running tests in oneMKL.

It also seems that there isn't support (as far as I can tell) for building with non standard ROCm installation. If the build process was better documented, the documentation effort could potentially also act as a survey on what works and what doesn't. This would be valuable IMO

Mar 26 '24 10:03 hdelan

As part of this, the documentation for consuming oneMKL with CMake could also be improved. The documentation does not detail CMake targets. This was bought up in https://github.com/oneapi-src/oneMKL/issues/501.

May 28 '24 12:05 hjabird

Hi Team @Rbiessy @mmeterel @hjabird @mkrainiuk @hdelan, is there timeline in place for the same as was expecting we could meet Q2 timeline of June 28th for https://github.com/uxlfoundation/open-source-working-group/issues/38

Jun 10 '24 15:06 vbm23

HI @vbm23, there should be a PR in a few hours! Working branch here.

Jun 10 '24 15:06 hjabird

@hjabird Since #510 would partially help close this issue, would the other one be #501? @Rbiessy Feel free to comment

Jun 11 '24 15:06 vbm23

@vbm23 I think the other thing we can do is generate useful errors.

https://github.com/oneapi-src/oneMKL/pull/490 resolved errors that you'd get from HIP_PATH being wrong.
#510 improves the build documentation
- I also added documentation on consuming oneMKL with CMake as part of this PR.
Finally we can do more to generate more useful errors. For instance, using AdaptiveCpp with unsupported backends gives you compiler errors for features that not included in AdaptiveCpp. Ideally we'd generate an error along the lines of "AdaptiveCpp is not supported for this backend" instead.

I think we're getting there, but not quite there yet!

Jun 11 '24 15:06 hjabird

[Build][Docs] Improve build documentation

Summary

Problem statement

Preferred solution