compute-runtime Improve Linux CL/GL sharing support

This series of patch is aimed at drastically improving the support for the CL/GL sharing extension on linux. The current support is not really usable as it only supports a few texture format, and only on EGL. It is also pretty buggy since it requires the texture to be bound when placing the CL call to share it which is just plain wrong and will not work in many applications. It will also

Fixes #659 Fixes #667

The first few patches just comment out and remove a lot of dead code. It was all copied from the windows implementation and will never work "as-is". Worse some of it ends up calling nullptr function pointers when something is not supported causing crashes. And it makes things pretty hard to read so I just removed it.

Then I replace the current implementation that uses an EGL-specific way of getting a DMA-BUF handle for "some" textures with the "official" interop extension from MESA. Also add support for sharing buffer and not just texture using the same extension. Proper support for GLX and EGL is also added with dispatching to the correct one depending on the params given when creating the context.

And then another round of cleanup patches to remove part of the code that are no longer used once that EGL specific thing has been replaced.

This is still far from being a fully compliant / full featured version of the extension, but it's a big step forward in my opinion and allows to run some real applications. I've tested gr-fosphor (SDR spectrum display) and Davinci Resolve as examples. Both of theses don't work without theses improvements.

There could also be bugs / things done wrong, I won't hide that some of the code related to the GMM lib and such ... I barely understand what it does, I mostly based it off the current code and the windows equivalent, but someone with a good understanding should probably re-read this.

Note that this currently marked as draft because it depends on some mesa patches which are not merged yet. In particular the version 2 of the interop extension that is being used is still pending. It's most likely going to be merged in the coming days though and should be part of MESA 23.3

I'm opening the pull request now to gather feedback so that hopefully it's in shape for merge by the time mesa is ready.

Aug 25 '23 08:08 smunaut

I have tested this on my machine, and the app I wanted to work (DaVinci Resolve) - worked flawlessly on UHD620!

Aug 25 '23 10:08 k1gen

So any chance someone from intel can review / comment or just general advise on what is required to get this merged ?

Sep 06 '23 13:09 smunaut

Thanks for the review, I'll try to address these next week.

Sep 07 '23 15:09 smunaut

@JablonskiMateusz Ok, so it took me a while longer to finally get to it, but I have addressed the comments above, rebased over master (and also tested only the 23.30 release) and seems to all be fine.

Oct 02 '23 15:10 smunaut

@JablonskiMateusz Responding here for the libGL thing because for some reason, github doesn't me put a reply to that specific comment :shrug:

(1) Because for whatever reason the user is free to have loaded (through whatever mechanism) any GL library and possibly not the first one that would be returned by a normal libdl lookup of libGL.so.0. Then there is also the EGL/GLX thing where we have no idea a-priori which one the user will want to use and we don't want to force loading the wrong one in the same address space, so it's easier to lookup what the user chose by looking at already available symbols.

(3) In the initGLFunctions it's not an issue because this is called when creating the CL context but at that point the user must have given a reference to the current GL context in the parameter list which means that GL must have been loaded already.

In the isEnabled case (to decide wether we show the extension or not), it's less clear. All the functionality of the extensions depend on a GL context being present first so to me it make sense to advertise it only if GL is loaded. But we could just as well just return true all the time and leave it up to the user to actually make sure it gets a valid GL context in time before trying to actually make use of it.

Oct 09 '23 08:10 smunaut

@JablonskiMateusz Responding here for the libGL thing because for some reason, github doesn't me put a reply to that specific comment 🤷

(1) Because for whatever reason the user is free to have loaded (through whatever mechanism) any GL library and possibly not the first one that would be returned by a normal libdl lookup of libGL.so.0. Then there is also the EGL/GLX thing where we have no idea a-priori which one the user will want to use and we don't want to force loading the wrong one in the same address space, so it's easier to lookup what the user chose by looking at already available symbols.

(3) In the initGLFunctions it's not an issue because this is called when creating the CL context but at that point the user must have given a reference to the current GL context in the parameter list which means that GL must have been loaded already.

In the isEnabled case (to decide wether we show the extension or not), it's less clear. All the functionality of the extensions depend on a GL context being present first so to me it make sense to advertise it only if GL is loaded. But we could just as well just return true all the time and leave it up to the user to actually make sure it gets a valid GL context in time before trying to actually make use of it.

We cannot assume that GL is already loaded. Where does that requirement come from? We need to try to open gl library in order to expose the extension - that's our baseline. If GL library was already loaded then we will just increase ref count instead of loading or reloading it. It will be used in the same form as was already loaded.

Oct 11 '23 08:10 JablonskiMateusz

But we don't know which GL library ... it's not our choice.

It's be 100% valid for a user to do dlopen("/path/to/some/custom/libGL.so.1234", RTLD_GLOBAL); to load a GL library that's not the system default one ... So we can't hard code any names or path anywhere.

If you don't want to assume GL is already loaded to return the extension, then I would return it all the time. Which from reading the extension spec is perfectly valid since all that's needed is that there is some way/combination of GL context/device for which we could support it.

Oct 11 '23 08:10 smunaut

Btw. @smunaut have you looked (e.g. latrace'd) what Nvidia and AMD OpenCL drivers do?

Oct 11 '23 08:10 eero-t

@eero-t

AMD looksup symbols in the global scope without loading anything it self, you can see it here :

https://github.com/ROCm-Developer-Tools/clr/blob/5914ac3c6e9b3848023a7fa25e19e560b1c38541/rocclr/device/rocm/rocglinterop.cpp#L71C38-L71C69

(Now, here they lookup MesaGLInteropGLXExportObject directly which is "the old way" of doing it because until recently that was the only way since glxGetProcAddress wouldn't support that symbol but it was shown that this direct method didn't work when using EGL + GLVND so mesa added those interop symbols to the normal GetProcAddress ... but it's the same idea, they search in the global scope what's already there and don't load anything themselves).

The AMD implementation always advertises the extension, it's statically present in the list of supported extension and it doesn't do any checks before it returns it AFAICT. ( https://github.com/ROCm-Developer-Tools/clr/blob/5914ac3c6e9b3848023a7fa25e19e560b1c38541/rocclr/device/device.hpp#L204 )

In Mesa's rusticl they also just search the global scope : https://gitlab.freedesktop.org/mesa/mesa/-/commit/02f7d3640021f23d54022c6fbdf1304c44672033#cb331a47f57adfd7e2565e461a25af9ec00ff8e6_0_32 They call dlsym with a NULL first argument to just lookup in what's already loaded rather than any form of explicit linking. However contrary to AMD's they do not return the extension if GL is not loaded already.

As for NVIDIA, I'm not sure, no sources I can look at ... I'd need to try it.

Oct 11 '23 09:10 smunaut

I think we may self load in this case. Extension can be exposed always but to work with sharing GL library needs to be loaded.

Oct 20 '23 06:10 JablonskiMateusz

Rebased to master
Removed the isGLSharingEnabled check and always advertise extension
Adapted to the final version of the mesa interop interface that is now part of git master on the mesa repo.

Nov 22 '23 10:11 smunaut

@smunaut please adjust commit msg to https://github.com/intel/compute-runtime/blob/master/CONTRIBUTING.md#commit-message-guidelines I think squash to single commit would be needed

Nov 23 '23 08:11 JablonskiMateusz

Is squashing really needed ? It makes the diff really hard to read and also means I can't put explanation of why a particular change was done that way in the commit message since it's all globbed up into a giant unreadable diff.

All the commits would belong in feature/ocl AFAICT.

Nov 23 '23 08:11 smunaut

it will be squashed in the end but we can do this on our side. Let's keep it here without squashing. However, I got some errors when applying the PR:

../../../opencl/source/sharings/gl/linux/gl_buffer_linux.cpp:225:51: error: no matching constructor for initialization of 'NEO::Gmm' graphicsAllocation->setDefaultGmm(new Gmm(helper, ^ ~~~~~~~ ../../../shared/source/gmm_helper/gmm.h:45:5: note: candidate constructor not viable: requires 7 arguments, but 8 were provided Gmm(GmmHelper *gmmHelper, const void *alignedPtr, size_t alignedSize, size_t alignment, ^ ../../../shared/source/gmm_helper/gmm.h:44:5: note: candidate constructor not viable: requires 4 arguments, but 8 were provided Gmm(GmmHelper *gmmHelper, ImageInfo &inputOutputImgInfo, const StorageInfo &storageInfo, bool preferCompressed); ^ ../../../shared/source/gmm_helper/gmm.h:48:5: note: candidate constructor not viable: requires 3 arguments, but 8 were provided Gmm(GmmHelper *gmmHelper, GMM_RESOURCE_INFO *inputGmm, bool openingHandle); ^ ../../../shared/source/gmm_helper/gmm.h:47:5: note: candidate constructor not viable: requires 2 arguments, but 8 were provided Gmm(GmmHelper *gmmHelper, GMM_RESOURCE_INFO *inputGmm); ^ ../../../shared/source/gmm_helper/gmm.h:40:7: note: candidate constructor (the implicit copy constructor) not viable: requires 1 argument, but 8 were provided class Gmm { ^ ../../../shared/source/gmm_helper/gmm.h:43:5: note: candidate constructor not viable: requires 0 arguments, but 8 were provided Gmm() = delete; ^ 1 error generated.

Also, when fixed the issue locally I got test failures:

97x TEST_F(GlSharingTextureTests, givenMockGlWhen2dGlTextureIsCreatedThenMemObjectHasGlHandler) { 98x cl_int retVal = CL_INVALID_VALUE; 99x 100x glSharing->uploadDataToTextureInfo(textureId); 101x 102t> auto glTexture = GlTexture::createSharedGlTexture(clContext.get(), (cl_mem_flags)0, GL_TEXTURE_2D, 0, textureId, &retVal); 103x ASSERT_NE(nullptr, glTexture);

46x texIn.version = 2; 47x texIn.target = getBaseTargetType(target); 48x texIn.obj = texture; 49x texIn.miplevel = miplevel; 50x 51x switch (flags) { 52x case CL_MEM_READ_ONLY: 53x texIn.access = MESA_GLINTEROP_ACCESS_READ_ONLY; 54x break; 55x case CL_MEM_WRITE_ONLY: 56x texIn.access = MESA_GLINTEROP_ACCESS_WRITE_ONLY; 57x break; 58x case CL_MEM_READ_WRITE: 59x texIn.access = MESA_GLINTEROP_ACCESS_READ_WRITE; 60x break; 61x default: 62t> errorCode.set(CL_INVALID_VALUE); 63x return nullptr; 64x }

Could you please ensure that this code is working after rebase?

Nov 23 '23 15:11 JablonskiMateusz

I'm doing what I can to test but again, I'm trying to rebase to master and build here but I just can't ... master just doesn't build and it ends with ocloc segfaulting currently, so there isn't much testing I can do.

I can build 23.48.27912.11 so currently that's the most recent thing I could rebase and test onto. Both 23.52.28202.13 and master don't build for me.

Jan 24 '24 20:01 smunaut

Ok, I managed to build master on my machine (seems there is a requirement for some specific version of dependencies that are not checked by the build system).

So I rebased and checked it build. I added a couple of small fixes and removed non-applicable broken tests.

Running make run_adlp_0_ocl_tests passes without error. I can't run all the tests because some of them fail on my machine even without this patch and the testing stops at the first failure ... But this specific test target should check everything that is affected by this patch set.

I didn't change the commit message since if you squash on your side you'll have to make a new one anyway so at that point you can use the feature/ocl prefix for it at that time.

Jan 26 '24 09:01 smunaut

What were the specific dependency versions that build system did not check?

(I normally take latest tagged releases of GMMlib, SPIRV-*, vc-intrinsics, IGC, L0, compute-runtime, and build them with Ubuntu using system LLVM 14 packages: clang, libopencl-clang, llvm-spirv, libllvmspirvlib, and haven't had any problems. As I'm just driver user, not developer, I haven't tried building master though...)

Jan 26 '24 10:01 eero-t

I was trying to build both master and 23.52.28202.13 while having :

intel-graphics-compiler-1.0.15368.3
intel-vc-intrinsics-0.13.0

And cmake accepted those fine and it started building but it ended with ocloc segfaulting.

After I upgraded to :

intel-graphics-compiler-1.0.15770.7
dev-libs/intel-vc-intrinsics-0.16.0

then it worked and I was able to build.

Jan 26 '24 10:01 smunaut

Just rebased on top of master again. Checked that it builds, works in a coupld of applications, and that make run_adlp_0_ocl_tests works too.

Feb 16 '24 10:02 smunaut

:wave:

Feb 29 '24 23:02 smunaut

Hi @smunaut I'll try merge it but I need additional input from your side. Please update PR title and PR description to meet our commit msg restrictions https://github.com/intel/compute-runtime/blob/master/CONTRIBUTING.md#commit-message-guidelines

Mar 01 '24 07:03 JablonskiMateusz

@JablonskiMateusz I've updated the PR title and description to match those guidelines. I didn't update the commits themselves since AFAIR you said you would squash them all together and I assume the final commit title/description will be taken from the PR.

Mar 01 '24 08:03 smunaut

Merged outside of github.

Thanks @JablonskiMateusz

Mar 01 '24 18:03 smunaut

compute-runtime compute-runtime copied to clipboard

Improve Linux CL/GL sharing support

compute-runtime
compute-runtime copied to clipboard