compute-runtime Need cl_khr_gl_sharing

Hi

I found that this driver doesn't support cl_khr_gl_sharing. Is there any plan for this? Thanks!

May 09 '19 06:05 zhoub

At this point this extension is supported only on Windows. There are currently no plans to implement this extension on Linux.

May 09 '19 11:05 PiotrRozenfeld

Hi

This is actually as important on Linux as it is on the Windows platform. Many video applications require OpenCL and OpenGL interop, and our work is currently based on the Intel UP board. Please consider this. Thanks!

May 10 '19 02:05 zhoub

Number of platforms                               3
  Platform Name                                   Intel(R) OpenCL HD Graphics
  Platform Vendor                                 Intel(R) Corporation
  Platform Version                                OpenCL 1.2 
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_depth_images cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_image2d_from_buffer cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_intel_subgroups cl_intel_required_subgroup_size cl_intel_subgroups_short cl_khr_spir cl_intel_accelerator cl_intel_media_block_io cl_intel_driver_diagnostics cl_intel_device_side_avc_motion_estimation cl_khr_priority_hints cl_khr_throttle_hints cl_khr_create_command_queue cl_khr_fp64 cl_intel_planar_yuv cl_intel_packed_yuv cl_intel_motion_estimation cl_intel_advanced_motion_estimation cl_intel_va_api_media_sharing 
  Platform Extensions function suffix             INTEL

  Platform Name                                   Intel Gen OCL Driver
  Platform Vendor                                 Intel
  Platform Version                                OpenCL 1.2 beignet 1.3 (git-5aba95a)
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_3d_image_writes cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_spir cl_khr_icd cl_intel_accelerator cl_intel_subgroups cl_intel_subgroups_short cl_khr_gl_sharing
  Platform Extensions function suffix             Intel

  Platform Name                                   Clover
  Platform Vendor                                 Mesa
  Platform Version                                OpenCL 1.1 Mesa 18.0.5
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd
  Platform Extensions function suffix             MESA

  Platform Name                                   Intel(R) OpenCL HD Graphics
Number of devices                                 1
  Device Name                                     Intel(R) Gen9 HD Graphics NEO
  Device Vendor                                   Intel(R) Corporation
  Device Vendor ID                                0x8086
  Device Version                                  OpenCL 1.2 NEO 
  Driver Version                                  19.14.12751
  Device OpenCL C Version                         OpenCL C 1.2 
  Device Type                                     GPU
  Device Profile                                  FULL_PROFILE
  Max compute units                               18
  Max clock frequency                             750MHz
  Device Partition                                (core)
    Max number of sub-devices                     0
    Supported partition types                     None
  Max work item dimensions                        3
  Max work item sizes                             256x256x256
  Max work group size                             256
  Preferred work group size multiple              32
  Preferred / native vector sizes                 
    char                                                16 / 16      
    short                                                8 / 8       
    int                                                  4 / 4       
    long                                                 1 / 1       
    half                                                 8 / 8        (cl_khr_fp16)
    float                                                1 / 1       
    double                                               1 / 1        (cl_khr_fp64)
  Half-precision Floating-point support           (cl_khr_fp16)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Single-precision Floating-point support         (core)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  Yes
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Address bits                                    32, Little-Endian
  Global memory size                              3435970560 (3.2GiB)
  Error Correction support                        No
  Max memory allocation                           1717985280 (1.6GiB)
  Unified memory for Host and Device              Yes
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       1024 bits (128 bytes)
  Global Memory cache type                        Read/Write
  Global Memory cache size                        131072
  Global Memory cache line                        64 bytes
  Image support                                   Yes
    Max number of samplers per kernel             16
    Max size for 1D images from buffer            107374080 pixels
    Max 1D or 2D image array size                 2048 images
    Base address alignment for 2D image buffers   4 bytes
    Pitch alignment for 2D image buffers          4 bytes
    Max 2D image size                             16384x16384 pixels
    Max 3D image size                             16384x16384x2048 pixels
    Max number of read image args                 128
    Max number of write image args                128
  Local memory type                               Local
  Local memory size                               65536 (64KiB)
  Max constant buffer size                        1717985280 (1.6GiB)
  Max number of constant args                     8
  Max size of kernel argument                     1024
  Queue properties                                
    Out-of-order execution                        Yes
    Profiling                                     Yes
  Prefer user sync for interop                    Yes
  Profiling timer resolution                      52ns
  Execution capabilities                          
    Run OpenCL kernels                            Yes
    Run native kernels                            No
    SPIR versions                                 1.2 
  printf() buffer size                            4194304 (4MiB)
  Built-in kernels                                block_motion_estimate_intel;block_advanced_motion_estimate_check_intel;block_advanced_motion_estimate_bidirectional_check_intel;
  Motion Estimation accelerator version	(Intel)   2
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Device Extensions                               cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_depth_images cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_image2d_from_buffer cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_intel_subgroups cl_intel_required_subgroup_size cl_intel_subgroups_short cl_khr_spir cl_intel_accelerator cl_intel_media_block_io cl_intel_driver_diagnostics cl_intel_device_side_avc_motion_estimation cl_khr_priority_hints cl_khr_throttle_hints cl_khr_create_command_queue cl_khr_fp64 cl_intel_planar_yuv cl_intel_packed_yuv cl_intel_motion_estimation cl_intel_advanced_motion_estimation cl_intel_va_api_media_sharing 

  Platform Name                                   Intel Gen OCL Driver
Number of devices                                 1
  Device Name                                     Intel(R) HD Graphics Broxton 0
  Device Vendor                                   Intel
  Device Vendor ID                                0x8086
  Device Version                                  OpenCL 1.2 beignet 1.3 (git-5aba95a)
  Driver Version                                  1.3
  Device OpenCL C Version                         OpenCL C 1.2 beignet 1.3 (git-5aba95a)
  Device Type                                     GPU
  Device Profile                                  FULL_PROFILE
  Max compute units                               18
  Max clock frequency                             1000MHz
  Device Partition                                (core)
    Max number of sub-devices                     1
    Supported partition types                     None, None, None
  Max work item dimensions                        3
  Max work item sizes                             512x512x512
  Max work group size                             512
  Preferred work group size multiple              16
  Preferred / native vector sizes                 
    char                                                16 / 8       
    short                                                8 / 8       
    int                                                  4 / 4       
    long                                                 2 / 2       
    half                                                 0 / 8        (cl_khr_fp16)
    float                                                4 / 4       
    double                                               0 / 2        (n/a)
  Half-precision Floating-point support           (cl_khr_fp16)
    Denormals                                     No
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 No
    Round to infinity                             No
    IEEE754-2008 fused multiply-add               No
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Single-precision Floating-point support         (core)
    Denormals                                     No
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 No
    Round to infinity                             No
    IEEE754-2008 fused multiply-add               No
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Double-precision Floating-point support         (n/a)
  Address bits                                    32, Little-Endian
  Global memory size                              4102029312 (3.82GiB)
  Error Correction support                        No
  Max memory allocation                           3076521984 (2.865GiB)
  Unified memory for Host and Device              Yes
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       1024 bits (128 bytes)
  Global Memory cache type                        Read/Write
  Global Memory cache size                        8192
  Global Memory cache line                        64 bytes
  Image support                                   Yes
    Max number of samplers per kernel             16
    Max size for 1D images from buffer            65536 pixels
    Max 1D or 2D image array size                 2048 images
    Base address alignment for 2D image buffers   4096 bytes
    Pitch alignment for 2D image buffers          1 bytes
    Max 2D image size                             8192x8192 pixels
    Max 3D image size                             8192x8192x2048 pixels
    Max number of read image args                 128
    Max number of write image args                8
  Local memory type                               Local
  Local memory size                               65536 (64KiB)
  Max constant buffer size                        134217728 (128MiB)
  Max number of constant args                     8
  Max size of kernel argument                     1024
  Queue properties                                
    Out-of-order execution                        No
    Profiling                                     Yes
  Prefer user sync for interop                    Yes
  Profiling timer resolution                      80ns
  Execution capabilities                          
    Run OpenCL kernels                            Yes
    Run native kernels                            Yes
    SPIR versions                                 1.2
  printf() buffer size                            1048576 (1024KiB)
  Built-in kernels                                __cl_copy_region_align4;__cl_copy_region_align16;__cl_cpy_region_unalign_same_offset;__cl_copy_region_unalign_dst_offset;__cl_copy_region_unalign_src_offset;__cl_copy_buffer_rect;__cl_copy_image_1d_to_1d;__cl_copy_image_2d_to_2d;__cl_copy_image_3d_to_2d;__cl_copy_image_2d_to_3d;__cl_copy_image_3d_to_3d;__cl_copy_image_2d_to_buffer;__cl_copy_image_3d_to_buffer;__cl_copy_buffer_to_image_2d;__cl_copy_buffer_to_image_3d;__cl_fill_region_unalign;__cl_fill_region_align2;__cl_fill_region_align4;__cl_fill_region_align8_2;__cl_fill_region_align8_4;__cl_fill_region_align8_8;__cl_fill_region_align8_16;__cl_fill_region_align128;__cl_fill_image_1d;__cl_fill_image_1d_array;__cl_fill_image_2d;__cl_fill_image_2d_array;__cl_fill_image_3d;
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Device Extensions                               cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_3d_image_writes cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_spir cl_khr_icd cl_intel_accelerator cl_intel_subgroups cl_intel_subgroups_short cl_khr_gl_sharing cl_khr_fp16

  Platform Name                                   Clover
Number of devices                                 0

NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  Intel(R) OpenCL HD Graphics
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   Success [INTEL]
  clCreateContext(NULL, ...) [default]            Success [INTEL]
  clCreateContext(NULL, ...) [other]              Success [Intel]
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  Success (1)
    Platform Name                                 Intel(R) OpenCL HD Graphics
    Device Name                                   Intel(R) Gen9 HD Graphics NEO
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  Success (1)
    Platform Name                                 Intel(R) OpenCL HD Graphics
    Device Name                                   Intel(R) Gen9 HD Graphics NEO

ICD loader properties
  ICD loader Name                                 OpenCL ICD Loader
  ICD loader Vendor                               OCL Icd free software
  ICD loader Version                              2.2.8
  ICD loader Profile                              OpenCL 1.2
	NOTE:	your OpenCL library declares to support OpenCL 1.2,
		but it seems to support up to OpenCL 2.1 too.

The contrast is rather ironic.

The device Intel(R) Gen9 HD Graphics NEO has all the video-related extensions but no OpenGL sharing, while the other device, Intel(R) HD Graphics Broxton 0, has the graphics extensions but no VA-API sharing. That means I can use either one or the other, and it is not possible to use OpenGL to visualize results from the first. Really sad.
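
The gap between the two drivers can be checked mechanically by treating each space-separated extension string as a set of names. Below is a minimal sketch in Python. It uses shortened excerpts of the two Device Extensions lists from the clinfo output above, not the full lists, and the helper name `parse_extensions` is made up for illustration.

```python
def parse_extensions(ext_string):
    """Split a space-separated OpenCL extension string into a set of names."""
    return set(ext_string.split())


# Shortened excerpts of the "Device Extensions" lines above (not the full lists).
neo = parse_extensions(
    "cl_khr_icd cl_khr_fp16 cl_khr_fp64 cl_intel_va_api_media_sharing "
)
beignet = parse_extensions(
    "cl_khr_icd cl_khr_fp16 cl_khr_gl_sharing"
)

# Set differences show what each driver offers that the other does not.
only_neo = sorted(neo - beignet)
only_beignet = sorted(beignet - neo)

print("only in NEO:", only_neo)
print("only in Beignet:", only_beignet)
```

Comparing whole names rather than searching for a substring avoids false matches between extensions that share a prefix, such as cl_intel_subgroups and cl_intel_subgroups_short.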

May 21 '19 11:05 zhoub

I'd really like to see CL/GL context sharing as well :+1:

Jul 20 '19 11:07 juliusmh

Same here. I'm really disappointed that I can't port my application to Linux with Intel CPUs because of this. cl_khr_gl_sharing seems like such a key extension to have that I'm surprised you're not supporting it.

Jul 30 '19 10:07 lilly-lizard

I also tried the Beignet implementation. While it does support the cl_khr_gl_sharing extension, one of the functions I required, cl_mem_new_gl_buffer, was empty except for a single line of code: FATAL ("Not implemented"). Tears all round.

Jul 30 '19 10:07 lilly-lizard

Your arguments are convincing. We will fit this effort into our development schedule for the remainder of this year.

Aug 02 '19 09:08 PiotrRozenfeld

Wow, that's great to hear! Your hard work is very much appreciated. Best of luck to your team!

Aug 02 '19 13:08 lilly-lizard

Hi. Yesterday my app stopped working: it uses cl_khr_gl_sharing with the Beignet driver on Linux, and that seems to be broken since my most recent update. For my compositions it is absolutely crucial that this works, so I'm quite devastated, as it is more or less impossible to implement my system on other platforms/OSes (see: https://www.youtube.com/watch?v=2rWha1HTfFE&t=360s). I'd be extremely grateful if you implemented it, and I am absolutely willing to support funding if you let me know how!

Oct 17 '19 14:10 ormf

@PiotrRozenfeld

> We will fit this effort into our development schedule for the remainder of this year.

Is there any news on whether work has been done on this, and whether/when it will be implemented?

Dec 22 '19 19:12 ormf

This is being actively worked on. We don't have a clear trend date, but Q1 is very likely.

Dec 23 '19 07:12 AdamCetnerowski

> This is being actively worked on. We don't have a clear trend date, but Q1 is very likely.

@AdamCetnerowski: Thanks, very much appreciated!

Jan 16 '20 12:01 ormf

Has there been any progress on this issue?

Mar 17 '20 12:03 pioto1225

I would also very much like to see this feature implemented

Mar 17 '20 12:03 smistad

Some foundational work has been done to enable the extension. At this time the Q1 trend is no longer valid. We intend to provide a partial implementation in early Q2 for evaluation. We’re looking forward to your feedback when it’s available.

Mar 19 '20 12:03 AdamCetnerowski

> We’re looking forward to your feedback when it’s available.

Most definitely! Right now I'm holding back any upgrades on my system as I have to keep llvm 8.0.1 for the beignet drivers.

May 20 '20 09:05 ormf

We have made progress towards this feature, but it was slower than initially expected. We’ll update this issue when we have something that is user-testable.

Jul 10 '20 13:07 AdamCetnerowski

Hi. Any news on when this will be released? Could use it for my thesis. Otherwise I need to find a workaround ^^

Sep 13 '20 12:09 Apahdos

While we did lay some groundwork for cl_khr_gl_sharing with MESA on Linux, we also had to increase our initial effort estimates, as additional work needs to be done in MESA and Compute before we can proceed with the implementation. This does not mean we are abandoning the effort, but we can no longer accommodate it in the near future.

We'll post updates when the work resumes.

Sep 14 '20 12:09 PiotrRozenfeld

@PiotrRozenfeld thanks a lot for the info! Although this is very bad news for me, I now know that I'll either have to get beignet compiled with llvm 10, or resort to installing a dual boot until this is resolved. I sincerely hope this is not the final blow to this in compute-runtime.

Sep 14 '20 13:09 ormf

Probably not the most useful comment (I'm not a developer, just a user who likes to test new things), but I was looking at AMD's ROCm sources and found this:

https://github.com/ROCm-Developer-Tools/ROCclr/blob/master/device/rocm/mesa_glinterop.h

/* Mesa OpenGL inter-driver interoperability interface designed for but not
 * limited to OpenCL.

Of course, this is only the smallest part of the whole work.

Oct 04 '20 03:10 Randrianasulu

@PiotrRozenfeld is there any ETA for a cl_khr_gl_sharing solution? The missing extension is a major problem for us, since most of our Linux machines use Intel integrated GPUs.

Jan 19 '21 18:01 mathucub

> While we did lay some groundwork for cl_khr_gl_sharing with MESA on Linux, we also had to increase our initial effort estimates, as additional work needs to be done in MESA and Compute before we can proceed with the implementation. This does not mean we are abandoning the effort, but we can no longer accommodate it in the near future.
>
> We'll post updates when the work resumes.

Any news on that? This extension is really a must-have for many applications.

May 20 '21 06:05 chamois94

I'm also curious how this is going. We are building and deploying Intel NUC-based video acquisition/processing solutions, and the lack of this extension is the only reason we're still on the Beignet drivers.

Jun 09 '21 19:06 smlehbleh

The work on this extension has stalled due to other work that our team is facing, unfortunately.

@smlehbleh - could you share a sample workload that you are looking to enable? What is your platform of interest? You are indicating this already works on Beignet - implying no additional work in MESA is needed.

Jun 28 '21 08:06 PiotrRozenfeld

Hi Piotr, our use case is an Intel NUC-powered cinema camera/recording device. The RAW image processing (debayering, colour transformation, etc.) is performed in OpenCL kernels, and the user interface is rendered with OpenGL ES. We use the Beignet OpenCL drivers, which support the cl_khr_gl_sharing extension, to share the processed RAW image from an OpenCL buffer with OpenGL ES and draw it to the monitor screen. We are using Ubuntu 18.04 with an Intel NUC8v7PNB and (soon) a NUC11TNBv7. The cl_khr_gl_sharing implementation in Beignet has been present for quite a while (maybe a couple of years). As an experiment, we tried the NEO CL driver and replaced our cl_khr_gl_sharing GL/CL interop code with an additional manual copy from a CL buffer into a GL texture: total CPU usage appeared to rise from about 10% to 15-20%, and a noticeable frame delay was introduced. The buffer was a UHD RGBA32 image (~33 MB).
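
The roughly 33 MB figure is consistent with a 3840x2160 frame at 4 bytes per pixel. Here is a quick check in Python. It assumes that UHD means 3840x2160 and that RGBA32 means 32 bits per pixel. The 30 fps rate is hypothetical (no frame rate is stated above) and is only there to give a feel for how many bytes an extra per-frame copy would move.

```python
WIDTH, HEIGHT = 3840, 2160   # assumed UHD resolution
BYTES_PER_PIXEL = 4          # assumed: RGBA at 8 bits per channel
ASSUMED_FPS = 30             # hypothetical; the thread does not state a frame rate

# Size of one frame, and bytes copied per second if every frame is copied once.
frame_bytes = WIDTH * HEIGHT * BYTES_PER_PIXEL
copied_bytes_per_second = frame_bytes * ASSUMED_FPS

print(f"frame size: {frame_bytes} bytes ({frame_bytes / 1e6:.1f} MB)")
print(f"copied per second at {ASSUMED_FPS} fps: {copied_bytes_per_second / 1e9:.2f} GB")
```

With cl_khr_gl_sharing, the kernel writes into memory that OpenGL can draw from directly, so this per-frame copy is not needed.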

Jun 28 '21 09:06 smlehbleh

Hi @PiotrRozenfeld , I was just wondering if Beignet's existing cl_khr_gl_sharing support/implementation could be ported or copied over into this driver - or if it could generally reduce the amount of work as you mentioned?

Aug 20 '21 08:08 smlehbleh

As far as I can see in the Beignet documentation, cl_khr_gl_sharing is partially supported and allows creating memory objects from OpenGL buffers or 2D textures. If I understand your use case correctly, you're using CL/GL sharing differently: you share OpenCL buffers with OpenGL, so the memory for the buffers is allocated by OpenCL, not by OpenGL. Could you clarify?

Sep 01 '21 12:09 JacekDanecki

Hi Jacek, We are using the Beignet implementation of cl_khr_gl_sharing in the standard intended use case - we create an OpenGL texture and then create an OpenCL buffer from it using 'clCreateFromGLTexture'. I realise in the previous comment it seems like we're directly sharing an OpenCL buffer with OpenGL, but our currently active implementation is an OpenCL buffer created from an OpenGL texture (clCreateFromGLTexture). This allows us to draw the GL texture to the screen with OpenGL after an OpenCL kernel writes to the underlying buffer without any additional 'copy'.

Sep 01 '21 14:09 smlehbleh

Hi Jacek,

On Wednesday, September 1, 2021 at 05:26:00 (-0700), Jacek Danecki wrote:

> As far as I can see in the Beignet documentation, cl_khr_gl_sharing is partially supported and allows creating memory objects from OpenGL buffers or 2D textures.
>
> If I understand your use case correctly, you're using CL/GL sharing differently: you share OpenCL buffers with OpenGL, so the memory for the buffers is allocated by OpenCL, not by OpenGL. Could you clarify?

It's unclear whether you were addressing me in your mail.

The way I used it with the Beignet driver was to first allocate the buffers in OpenGL (using "gen-buffer") and then "recreate" them in OpenCL (within the OpenGL context, with the allocated OpenGL buffer mapped) by calling the OpenCL routine "create-buffer" with the USE-HOST-PTR flag.

Is that what you were asking?

Best, Orm

Sep 01 '21 14:09 ormf
