compute-runtime
Need cl_khr_gl_sharing
Hi
I found that this driver doesn't support cl_khr_gl_sharing. Is there any plan to add it? Thanks!
At this point this extension is supported only on Windows. There are currently no plans to implement this extension on Linux.
Hi
This is actually as important on Linux as on the Windows platform. Many video applications require OpenCL and OpenGL interop, and our work is now based on the Intel Up board. Please consider this. Thanks!
Number of platforms 3
Platform Name Intel(R) OpenCL HD Graphics
Platform Vendor Intel(R) Corporation
Platform Version OpenCL 1.2
Platform Profile FULL_PROFILE
Platform Extensions cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_depth_images cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_image2d_from_buffer cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_intel_subgroups cl_intel_required_subgroup_size cl_intel_subgroups_short cl_khr_spir cl_intel_accelerator cl_intel_media_block_io cl_intel_driver_diagnostics cl_intel_device_side_avc_motion_estimation cl_khr_priority_hints cl_khr_throttle_hints cl_khr_create_command_queue cl_khr_fp64 cl_intel_planar_yuv cl_intel_packed_yuv cl_intel_motion_estimation cl_intel_advanced_motion_estimation cl_intel_va_api_media_sharing
Platform Extensions function suffix INTEL
Platform Name Intel Gen OCL Driver
Platform Vendor Intel
Platform Version OpenCL 1.2 beignet 1.3 (git-5aba95a)
Platform Profile FULL_PROFILE
Platform Extensions cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_3d_image_writes cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_spir cl_khr_icd cl_intel_accelerator cl_intel_subgroups cl_intel_subgroups_short cl_khr_gl_sharing
Platform Extensions function suffix Intel
Platform Name Clover
Platform Vendor Mesa
Platform Version OpenCL 1.1 Mesa 18.0.5
Platform Profile FULL_PROFILE
Platform Extensions cl_khr_icd
Platform Extensions function suffix MESA
Platform Name Intel(R) OpenCL HD Graphics
Number of devices 1
Device Name Intel(R) Gen9 HD Graphics NEO
Device Vendor Intel(R) Corporation
Device Vendor ID 0x8086
Device Version OpenCL 1.2 NEO
Driver Version 19.14.12751
Device OpenCL C Version OpenCL C 1.2
Device Type GPU
Device Profile FULL_PROFILE
Max compute units 18
Max clock frequency 750MHz
Device Partition (core)
Max number of sub-devices 0
Supported partition types None
Max work item dimensions 3
Max work item sizes 256x256x256
Max work group size 256
Preferred work group size multiple 32
Preferred / native vector sizes
char 16 / 16
short 8 / 8
int 4 / 4
long 1 / 1
half 8 / 8 (cl_khr_fp16)
float 1 / 1
double 1 / 1 (cl_khr_fp64)
Half-precision Floating-point support (cl_khr_fp16)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Correctly-rounded divide and sqrt operations No
Single-precision Floating-point support (core)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Correctly-rounded divide and sqrt operations Yes
Double-precision Floating-point support (cl_khr_fp64)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Correctly-rounded divide and sqrt operations No
Address bits 32, Little-Endian
Global memory size 3435970560 (3.2GiB)
Error Correction support No
Max memory allocation 1717985280 (1.6GiB)
Unified memory for Host and Device Yes
Minimum alignment for any data type 128 bytes
Alignment of base address 1024 bits (128 bytes)
Global Memory cache type Read/Write
Global Memory cache size 131072
Global Memory cache line 64 bytes
Image support Yes
Max number of samplers per kernel 16
Max size for 1D images from buffer 107374080 pixels
Max 1D or 2D image array size 2048 images
Base address alignment for 2D image buffers 4 bytes
Pitch alignment for 2D image buffers 4 bytes
Max 2D image size 16384x16384 pixels
Max 3D image size 16384x16384x2048 pixels
Max number of read image args 128
Max number of write image args 128
Local memory type Local
Local memory size 65536 (64KiB)
Max constant buffer size 1717985280 (1.6GiB)
Max number of constant args 8
Max size of kernel argument 1024
Queue properties
Out-of-order execution Yes
Profiling Yes
Prefer user sync for interop Yes
Profiling timer resolution 52ns
Execution capabilities
Run OpenCL kernels Yes
Run native kernels No
SPIR versions 1.2
printf() buffer size 4194304 (4MiB)
Built-in kernels block_motion_estimate_intel;block_advanced_motion_estimate_check_intel;block_advanced_motion_estimate_bidirectional_check_intel;
Motion Estimation accelerator version (Intel) 2
Device Available Yes
Compiler Available Yes
Linker Available Yes
Device Extensions cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_depth_images cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_image2d_from_buffer cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_intel_subgroups cl_intel_required_subgroup_size cl_intel_subgroups_short cl_khr_spir cl_intel_accelerator cl_intel_media_block_io cl_intel_driver_diagnostics cl_intel_device_side_avc_motion_estimation cl_khr_priority_hints cl_khr_throttle_hints cl_khr_create_command_queue cl_khr_fp64 cl_intel_planar_yuv cl_intel_packed_yuv cl_intel_motion_estimation cl_intel_advanced_motion_estimation cl_intel_va_api_media_sharing
Platform Name Intel Gen OCL Driver
Number of devices 1
Device Name Intel(R) HD Graphics Broxton 0
Device Vendor Intel
Device Vendor ID 0x8086
Device Version OpenCL 1.2 beignet 1.3 (git-5aba95a)
Driver Version 1.3
Device OpenCL C Version OpenCL C 1.2 beignet 1.3 (git-5aba95a)
Device Type GPU
Device Profile FULL_PROFILE
Max compute units 18
Max clock frequency 1000MHz
Device Partition (core)
Max number of sub-devices 1
Supported partition types None, None, None
Max work item dimensions 3
Max work item sizes 512x512x512
Max work group size 512
Preferred work group size multiple 16
Preferred / native vector sizes
char 16 / 8
short 8 / 8
int 4 / 4
long 2 / 2
half 0 / 8 (cl_khr_fp16)
float 4 / 4
double 0 / 2 (n/a)
Half-precision Floating-point support (cl_khr_fp16)
Denormals No
Infinity and NANs Yes
Round to nearest Yes
Round to zero No
Round to infinity No
IEEE754-2008 fused multiply-add No
Support is emulated in software No
Correctly-rounded divide and sqrt operations No
Single-precision Floating-point support (core)
Denormals No
Infinity and NANs Yes
Round to nearest Yes
Round to zero No
Round to infinity No
IEEE754-2008 fused multiply-add No
Support is emulated in software No
Correctly-rounded divide and sqrt operations No
Double-precision Floating-point support (n/a)
Address bits 32, Little-Endian
Global memory size 4102029312 (3.82GiB)
Error Correction support No
Max memory allocation 3076521984 (2.865GiB)
Unified memory for Host and Device Yes
Minimum alignment for any data type 128 bytes
Alignment of base address 1024 bits (128 bytes)
Global Memory cache type Read/Write
Global Memory cache size 8192
Global Memory cache line 64 bytes
Image support Yes
Max number of samplers per kernel 16
Max size for 1D images from buffer 65536 pixels
Max 1D or 2D image array size 2048 images
Base address alignment for 2D image buffers 4096 bytes
Pitch alignment for 2D image buffers 1 bytes
Max 2D image size 8192x8192 pixels
Max 3D image size 8192x8192x2048 pixels
Max number of read image args 128
Max number of write image args 8
Local memory type Local
Local memory size 65536 (64KiB)
Max constant buffer size 134217728 (128MiB)
Max number of constant args 8
Max size of kernel argument 1024
Queue properties
Out-of-order execution No
Profiling Yes
Prefer user sync for interop Yes
Profiling timer resolution 80ns
Execution capabilities
Run OpenCL kernels Yes
Run native kernels Yes
SPIR versions 1.2
printf() buffer size 1048576 (1024KiB)
Built-in kernels __cl_copy_region_align4;__cl_copy_region_align16;__cl_cpy_region_unalign_same_offset;__cl_copy_region_unalign_dst_offset;__cl_copy_region_unalign_src_offset;__cl_copy_buffer_rect;__cl_copy_image_1d_to_1d;__cl_copy_image_2d_to_2d;__cl_copy_image_3d_to_2d;__cl_copy_image_2d_to_3d;__cl_copy_image_3d_to_3d;__cl_copy_image_2d_to_buffer;__cl_copy_image_3d_to_buffer;__cl_copy_buffer_to_image_2d;__cl_copy_buffer_to_image_3d;__cl_fill_region_unalign;__cl_fill_region_align2;__cl_fill_region_align4;__cl_fill_region_align8_2;__cl_fill_region_align8_4;__cl_fill_region_align8_8;__cl_fill_region_align8_16;__cl_fill_region_align128;__cl_fill_image_1d;__cl_fill_image_1d_array;__cl_fill_image_2d;__cl_fill_image_2d_array;__cl_fill_image_3d;
Device Available Yes
Compiler Available Yes
Linker Available Yes
Device Extensions cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_3d_image_writes cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_spir cl_khr_icd cl_intel_accelerator cl_intel_subgroups cl_intel_subgroups_short cl_khr_gl_sharing cl_khr_fp16
Platform Name Clover
Number of devices 0
NULL platform behavior
clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...) Intel(R) OpenCL HD Graphics
clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...) Success [INTEL]
clCreateContext(NULL, ...) [default] Success [INTEL]
clCreateContext(NULL, ...) [other] Success [Intel]
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU) Success (1)
Platform Name Intel(R) OpenCL HD Graphics
Device Name Intel(R) Gen9 HD Graphics NEO
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL) Success (1)
Platform Name Intel(R) OpenCL HD Graphics
Device Name Intel(R) Gen9 HD Graphics NEO
ICD loader properties
ICD loader Name OpenCL ICD Loader
ICD loader Vendor OCL Icd free software
ICD loader Version 2.2.8
ICD loader Profile OpenCL 1.2
NOTE: your OpenCL library declares to support OpenCL 1.2,
but it seems to support up to OpenCL 2.1 too.
In contrast, this is rather funny.
The device Intel(R) Gen9 HD Graphics NEO has all the video-related extensions but no OpenGL sharing, while the other device, Intel(R) HD Graphics Broxton 0, has all the graphics extensions but no VA-API sharing. So I can use either one or the other, but it is not possible to use OpenGL to visualize the output of the first. Really sad.
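To make the comparison above easy to reproduce, here is a minimal Python sketch that diffs the two "Device Extensions" strings copied from the clinfo output. The variable and function names are illustrative only and are not part of any driver API. Splitting on whitespace gives exact-name matching, which avoids false positives from substring searches.

```python
# Device extension strings copied verbatim from the clinfo output above.
NEO_DEVICE_EXTENSIONS = (
    "cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 "
    "cl_khr_depth_images cl_khr_global_int32_base_atomics "
    "cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_image2d_from_buffer "
    "cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics "
    "cl_intel_subgroups cl_intel_required_subgroup_size cl_intel_subgroups_short "
    "cl_khr_spir cl_intel_accelerator cl_intel_media_block_io "
    "cl_intel_driver_diagnostics cl_intel_device_side_avc_motion_estimation "
    "cl_khr_priority_hints cl_khr_throttle_hints cl_khr_create_command_queue "
    "cl_khr_fp64 cl_intel_planar_yuv cl_intel_packed_yuv cl_intel_motion_estimation "
    "cl_intel_advanced_motion_estimation cl_intel_va_api_media_sharing"
)
BEIGNET_DEVICE_EXTENSIONS = (
    "cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics "
    "cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics "
    "cl_khr_byte_addressable_store cl_khr_3d_image_writes "
    "cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_spir cl_khr_icd "
    "cl_intel_accelerator cl_intel_subgroups cl_intel_subgroups_short "
    "cl_khr_gl_sharing cl_khr_fp16"
)


def parse_extensions(extension_string):
    """Split a space-separated extension string into a set of exact names."""
    return set(extension_string.split())


devices = {
    "Intel(R) Gen9 HD Graphics NEO": parse_extensions(NEO_DEVICE_EXTENSIONS),
    "Intel(R) HD Graphics Broxton 0": parse_extensions(BEIGNET_DEVICE_EXTENSIONS),
}
neo = devices["Intel(R) Gen9 HD Graphics NEO"]
beignet = devices["Intel(R) HD Graphics Broxton 0"]

# Extensions advertised by one driver but not by the other.
only_neo = sorted(neo - beignet)
only_beignet = sorted(beignet - neo)

# Devices that advertise both sharing extensions at once.
wanted = {"cl_khr_gl_sharing", "cl_intel_va_api_media_sharing"}
has_both = [name for name, exts in devices.items() if wanted <= exts]

print("Only on NEO:", only_neo)
print("Only on Beignet:", only_beignet)
print("Devices with both sharing extensions:", has_both)
```

With the strings from this report, no device ends up in `has_both`, which is the either/or situation described above.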
I'd really like to see CL/GL context sharing as well :+1:
Same here. I'm really disappointed that I can't port my application to Linux on Intel CPUs because of this. cl_khr_gl_sharing seems like such a key extension that I'm surprised you're not supporting it.
I also tried using the beignet implementation, and while it does support the cl_khr_gl_sharing extension, one of the functions I required, cl_mem_new_gl_buffer, was empty except for one line of code:
FATAL ("Not implemented")
Tears all round.
Your arguments are convincing. We will fit this effort into our development schedule for the remainder of this year.
Wow, that's great to hear! Your hard work is very much appreciated. Best of luck to your team!
Hi. Yesterday my app ceased working: it uses cl_khr_gl_sharing with the beignet driver on Linux, and that seems to be dysfunctional since my most recent update. For my compositions it is absolutely crucial that this works, so I'm quite devastated, as it is more or less impossible to implement my system on other platforms/OSes (see: https://www.youtube.com/watch?v=2rWha1HTfFE&t=360s). I'd be extremely grateful if you implemented it, and I am absolutely willing to support funding if you let me know how!
@PiotrRozenfeld
We will fit this effort into our development schedule for the remainder of this year.
Is there any news on whether work has been done on this and whether/when it will be implemented?
This is being actively worked on. We don't have a clear trend date, but Q1 is very likely.
This is being actively worked on. We don't have a clear trend date, but Q1 is very likely.
@AdamCetnerowski: Thanks, very much appreciated!
Has there been any progress on this issue?
I would also very much like to see this feature implemented
Some foundational work has been done to enable the extension. At this time the Q1 trend is no longer valid. We intend to provide a partial implementation in early Q2 for evaluation. We’re looking forward to your feedback when it’s available.
We’re looking forward to your feedback when it’s available.
Most definitely! Right now I'm holding back any upgrades on my system as I have to keep llvm 8.0.1 for the beignet drivers.
We have made progress towards this feature, but it was slower than initially expected. We’ll update this issue when we have something that is user-testable.
Hi. Any news on when this will be released? Could use it for my thesis. Otherwise I need to find a workaround ^^
While we did lay some groundwork for cl_khr_gl_sharing with Mesa on Linux, we also had to increase our initial effort estimates, as additional work needs to be done in Mesa and Compute before we can proceed with the implementation. This does not mean we are abandoning the effort, but we can no longer accommodate it in the near future.
We'll post updates when the work resumes.
@PiotrRozenfeld thanks a lot for the info! Although this is very bad news for me, I now know that I'll either have to get beignet compiled with llvm 10, or resort to installing a dual boot until this is resolved. I sincerely hope this is not the final blow to this in compute-runtime.
Probably not the most useful comment (I'm not a developer, just a user who likes to test new things), but I was looking at AMD's ROCm sources and found this there:
https://github.com/ROCm-Developer-Tools/ROCclr/blob/master/device/rocm/mesa_glinterop.h
/* Mesa OpenGL inter-driver interoperability interface designed for but not
* limited to OpenCL.
Of course, this is only the smallest part of the whole work.
@PiotrRozenfeld any ETA for a cl_khr_gl_sharing solution? The missing extension is a major problem for us, since most of our Linux machines use Intel integrated GPUs.
While we did lay some groundwork for cl_khr_gl_sharing with Mesa on Linux, we also had to increase our initial effort estimates, as additional work needs to be done in Mesa and Compute before we can proceed with the implementation. This does not mean we are abandoning the effort, but we can no longer accommodate it in the near future.
We'll post updates when the work resumes.
Any news on that? This extension is really a must-have for many applications.
Also curious how this is going. We are building and deploying Intel NUC based video acquisition/processing solutions, and the lack of this extension is the only reason we're still on the Beignet drivers.
The work on this extension has stalled due to other work that our team is facing, unfortunately.
@smlehbleh - could you share a sample workload that you are looking to enable? What is your platform of interest? You are indicating this already works on Beignet - implying no additional work in MESA is needed.
Hi Piotr, our use case is an Intel NUC powered cinema camera/recording device. The RAW image processing (debayering, colour transformation, etc.) is performed in OpenCL kernels and the user interface is rendered with OpenGL ES. We use the Beignet OpenCL drivers, which support the cl_khr_gl_sharing extension, to share the processed RAW image from an OpenCL buffer with OpenGL ES and draw it to the monitor screen. We are using Ubuntu 18.04 with an Intel NUC8v7PNB and (soon) an NUC11TNBv7. The cl_khr_gl_sharing implementation in Beignet has been present for quite a while (maybe a couple of years). As an experiment, we tried the NEO CL driver and replaced our cl_khr_gl_sharing GL/CL interop code with an additional manual copy from a CL buffer into a GL texture: the total CPU usage appeared to go from about 10% to 15-20%, and a noticeable frame delay was introduced. The buffer was a UHD RGBA32 image (~33 MB).
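As a rough check of the ~33 MB figure, the arithmetic can be sketched as follows. This assumes "UHD" means 3840x2160 and "RGBA32" means 32 bits per pixel (8 bits per channel); neither is spelled out in the comment. The frame rate passed to the helper is purely hypothetical, since none was given.

```python
# Assumptions (not stated explicitly in the comment above):
WIDTH, HEIGHT = 3840, 2160   # UHD taken to mean 3840x2160
BYTES_PER_PIXEL = 4          # RGBA32 taken to mean 4 channels x 8 bits

# Size of one frame in bytes.
frame_bytes = WIDTH * HEIGHT * BYTES_PER_PIXEL


def copy_traffic_bytes_per_second(fps, copies_per_frame=1):
    """Extra memory traffic caused by copying every frame `copies_per_frame` times."""
    return frame_bytes * copies_per_frame * fps


print(f"One frame: {frame_bytes} bytes ({frame_bytes / 1e6:.1f} MB)")

# Hypothetical frame rate, for illustration only.
example_fps = 30
print(f"One extra copy per frame at {example_fps} fps: "
      f"{copy_traffic_bytes_per_second(example_fps) / 1e6:.0f} MB/s")
```

Under these assumptions one frame is 33,177,600 bytes, which matches the figure quoted above. The point of the zero-copy path offered by cl_khr_gl_sharing is that this per-frame copy disappears.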
Hi @PiotrRozenfeld , I was just wondering if Beignet's existing cl_khr_gl_sharing support/implementation could be ported or copied over into this driver - or if it could generally reduce the amount of work as you mentioned?
As far as I can see in the Beignet documentation, cl_khr_gl_sharing is partially supported and allows creating memory objects from OpenGL buffers or 2D textures. If I understand your use case correctly, you're using CL/GL sharing differently: you share OpenCL buffers with OpenGL, so memory for the buffers is allocated by OpenCL, not by OpenGL. Could you clarify?
Hi Jacek, We are using the Beignet implementation of cl_khr_gl_sharing in the standard intended use case - we create an OpenGL texture and then create an OpenCL buffer from it using 'clCreateFromGLTexture'. I realise in the previous comment it seems like we're directly sharing an OpenCL buffer with OpenGL, but our currently active implementation is an OpenCL buffer created from an OpenGL texture (clCreateFromGLTexture). This allows us to draw the GL texture to the screen with OpenGL after an OpenCL kernel writes to the underlying buffer without any additional 'copy'.
Hi Jacek,
On Wednesday, 1 September 2021 at 05:26:00 (-0700), Jacek Danecki wrote:
As far as I can see in the Beignet documentation, cl_khr_gl_sharing is partially supported and allows creating memory objects from OpenGL buffers or 2D textures.
If I understand your use case correctly, you're using CL/GL sharing differently: you share OpenCL buffers with OpenGL, so memory for the buffers is allocated by OpenCL, not by OpenGL. Could you clarify?
It's unclear whether you were addressing me in your mail.
The way I used it with the Beignet driver was to first allocate the buffers in OpenGL (using "gen-buffer") and then "recreate" them in OpenCL (within the OpenGL context, with the allocated OpenGL buffer mapped) by calling the OpenCL routine "create-buffer" with the USE-HOST-PTR flag.
Is that what you were asking?
Best, Orm