compute-runtime icon indicating copy to clipboard operation
compute-runtime copied to clipboard

CL/GL sharing support more texture formats

Open smunaut opened this issue 2 years ago • 23 comments

The support implemented in #166 has some limitations on the supported texture format. I have been discussing in that (now closed) issue the possibility to support more formats but I thought it'd be better to open a new issue rather than keep talking in a closed one.

I've so far removed the quick check that limit the "create memory" to the currently supported texture format and what I noticed is that the call then hangs if you use unsupported textures.

I dug a bit into that and noticed two things :

  • The hang I notice is probably some bug in the error path. _eglDestroyImageCommon is only called if eglExportDMABUFImageMESA failed. So it should definitely not hang and it should return the error code instead of hanging but it's also not the root of my problem, ideally that call should work.
  • eglExportDMABUFImageMESA fails because dri2_can_export_dma_buf_image returns FALSE. And that's the root of the issue.
  • And the above eventually falls to missing formats in dri2_format_table ... but I have no idea if I can just add format to that table or what it implies or whatever ...

smunaut avatar Aug 03 '23 15:08 smunaut

Ok, so digging even further :

  • The hang is a bug in Mesa (missing mutex release in an error path)
  • The limitation of image format seems to mostly stem from DRI image format and so should be dealt with in mesa first ... and then maybe some support would also needed here as a second step but mesa is definitely the first step.

smunaut avatar Aug 03 '23 19:08 smunaut

Wow. great work! Today I'll jump in and follow your path. In general (afaik) Mesa/EGL is currently in heavy development. Maybe reaching out to Mesa might be a good idea.

kallaballa avatar Aug 04 '23 04:08 kallaballa

I'll use GL_RGBA32F for starters with my code just to see if it breaks the same way. But we need a common example to have a chance. Do you have one?

kallaballa avatar Aug 04 '23 05:08 kallaballa

Proposed fix for the locking issue btw : https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24487/diffs?commit_id=fe844bae481b4f2369ab78537d2a800817973389

As for example, I use the following code. It doesn't really do anything except just trying to create the texture / cl mem object and nothing else ATM.

main.c.txt

smunaut avatar Aug 04 '23 08:08 smunaut

Thx!

kallaballa avatar Aug 04 '23 20:08 kallaballa

I think i found the problem. __DRI_IMAGE_FORMAT_NONE collides with an actual format. commenting those 2 lines fixes that: https://github.com/Mesa3D/mesa/blob/957009978ef6d7121fc0d710d03bc20097d4d46b/src/gallium/frontends/dri/dri_helpers.c#L679-L680

Now the cl_mem is succesfully mapped.

kallaballa avatar Aug 05 '23 07:08 kallaballa

Anyway that doesn't make a complete patch. I'll create an issue at mesa to maybe get an opinion from someone with the bigger picture.

kallaballa avatar Aug 05 '23 07:08 kallaballa

Not sure what you mean with __DRI_IMAGE_FORMAT_NONE colliding. I don't see any other image format with the same code.

Here's the flow I traced :

  • dri2_can_export_dma_buf_image reports the image can't be exported because it doesn't have an attribute __DRI_IMAGE_ATTRIB_FOURCC

  • That attribute comes either from image->dri_fourcc directly or from the result of dri2_get_mapping_by_format(image->dri_format)

  • The image is created by dri2_create_image_khr_texture This in turn will cann dri2_create_from_texture to initialize the image properties
    AFAICT this will not set image->dri_fourcc but it will set image->dri_format And it's set from mapping the gl texture format as such : img->dri_format = driGLFormatToImageFormat(obj->Image[face][level]->TexFormat);

  • That method driGLFormatToImageFormat uses a format_mapping table to match.

  • And that table just doesn't have any float32 types in it defined at all. And looking at the list of available __DRI_IMAGE_FORMAT_xxxx there are none that would match.

So I guess we'd need to introduce new DRI image formats as the first step.

smunaut avatar Aug 05 '23 10:08 smunaut

Apparently going through DRI images and the export dma buf api is the wrong way to do things for interop according to DRI devs I talked to ...

MesaGLInteropEGLExportObject seems to be the way to implement it AFAICT. It also seems to return a dmabuf but directly from a GL object handle without any intermediate EGLImage step.

smunaut avatar Aug 07 '23 09:08 smunaut

Hmmm. interesting! Gonna be back at it in a couple of days

kallaballa avatar Aug 07 '23 15:08 kallaballa

Implemented a quick PoC replacing eglExportDMABUFImageMESA with MesaGLInteropEGLExportObject and it's doing something :sweat_smile:

And it almost looks right : https://i.imgur.com/SmtjQm2.jpg vs https://i.imgur.com/KPFMzAs.jpg

smunaut avatar Aug 08 '23 07:08 smunaut

very excellent! sorry i can't help right now. sick in bed

kallaballa avatar Aug 08 '23 09:08 kallaballa

No worries, take care !

FWIW I pushed my current PoC https://github.com/smunaut/compute-runtime/tree/clgl

Note that it depends on some mesa patches https://gitlab.freedesktop.org/246tnt/mesa/-/commits/clgl-export

Those symbols are normally used internally. They can be accessed just using dlsym() looking for them at the global scope but not in all cases. If you have GLVND (vendor neutral dispatcher) and EGL, then that combination doesn't work. So after talking in the #dri-devel channel, the best option seemed to be to expose those function through the classic GetProcAddress mechanism and that's what those mesa patches do ...

smunaut avatar Aug 08 '23 11:08 smunaut

Found the bug above btw and it's unrelated to the change.

The extension is calling glGetTexLevelParameteriv to collect data about the texture. but that assumes that the texture is currently the one being bound in the GL state which is not necessarily the case ...

smunaut avatar Aug 08 '23 12:08 smunaut

No worries, take care !

FWIW I pushed my current PoC https://github.com/smunaut/compute-runtime/tree/clgl

That looks really promising!

Note that it depends on some mesa patches https://gitlab.freedesktop.org/246tnt/mesa/-/commits/clgl-export

Those symbols are normally used internally. They can be accessed just using dlsym() looking for them at the global scope but not in all cases. If you have GLVND (vendor neutral dispatcher) and EGL, then that combination doesn't work. So after talking in the #dri-devel channel, the best option seemed to be to expose those function through the classic GetProcAddress mechanism and that's what those mesa patches do ...

kallaballa avatar Aug 09 '23 03:08 kallaballa

Found the bug above btw and it's unrelated to the change.

The extension is calling glGetTexLevelParameteriv to collect data about the texture. but that assumes that the texture is currently the one being bound in the GL state which is not necessarily the case ...

I have trouble following that last part. Could you point out where? I'd like to fix that if you haven't already.

kallaballa avatar Aug 09 '23 03:08 kallaballa

Note that it depends on some mesa patches https://gitlab.freedesktop.org/246tnt/mesa/-/commits/clgl-export

Any idea when this is planned to be merged? I am just asking because the branch has a over 600 commits :)

kallaballa avatar Aug 09 '23 03:08 kallaballa

The issue is there : https://github.com/intel/compute-runtime/blob/master/opencl/source/sharings/gl/linux/gl_texture_linux.cpp#L57

This can't really call any GL functions because it should be independent of the current GL bind state and modifying it (like binding the texture, querying, restoring) is also very dodgy (I think GL could be doing other stuff in other threads).

The internal format is not an issue, it's returned by the interop API so we can get it from there, but I haven't found any way to get the texture size ...

I've been comparing this code with the windows side of the code, and in that one, the allocation is not grabbed the same way and doesn't seem to need the size. I've been trying to make the linux side more like the windows one but without success, and it might come from the different implementation of the underlying memory manager. I'd love an intel engineer to chime in and explain why the linux side was written like that in the first place. the 'reusesharedhandle' param is also different vs the win side and I'm not sure why either. Same with the "closeSharedHandle" which don't see to be on the win side either.

I've been looking at ROCm a bit too and in there the amd mesa driver actually returns some extra info through the gl interop call AFAICT (there is provision in that interop method to return "vendor" specific data), so we could also have the iris driver return more info from the gl side directly.

The mesa branch only has 3 commits (the 3 latest). It might appear like it has more because it's based off the 23.1.5 tag since I wanted to install system-wide locally and my system has 23.1.5 installed.

I'm not sure when this can be merged or even if this will be merged in that state, the functions might end up with other names. I just opened a draft MR asking for feedback : https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24573

smunaut avatar Aug 09 '23 07:08 smunaut

maybe @JacekDanecki could give us a little nodge in the right direction? :)

kallaballa avatar Aug 09 '23 08:08 kallaballa

Ok, I pushed a new version of my branch that doesn't have those issues anymore. I also did a lot of cleanup removing dead/buggy code that wouldn't work on linux anyway pending the reimplementation for linux ( things for GL_BUFFER sharing and synchronization primitives).

This also makes things a whole lot less dependent on EGL which is nice to make it compatible with GLX.

You need my mesa branch to run it obviously.

smunaut avatar Aug 10 '23 09:08 smunaut

You need my mesa branch to run it obviously.

It has been merged?

kallaballa avatar Sep 02 '23 11:09 kallaballa

Most of it yes, but the v2 of the interop interface hasn't been merged yet. It's not a patch I wrote, it's part of another MR that's being reviewed now, there was actually quite a bit of review/ack going on yesterday so I have good hope that this will me merged very soon.

In the mean time, you still need to use my 23.2-resolve branch ( https://gitlab.freedesktop.org/246tnt/mesa/-/tree/23.2-resolve?ref_type=heads ).

smunaut avatar Sep 02 '23 12:09 smunaut

Thx for the info and your awesome work :)

kallaballa avatar Sep 02 '23 14:09 kallaballa