OpenCL-Docs
WIP cl_ext_yuv_images
Change-Id: I079f7610926b06a7516652c02b264ecb0aa090b3
Outstanding issues:
- [ ] Order of components returned by `read_image`
- [ ] Border color
- [ ] Support for creating images for formats where device access is not allowed
- [x] Add per-format queries for max supported image dimensions (https://github.com/KhronosGroup/OpenCL-Docs/pull/750)
- [ ] Add per-format queries for image size multiple constraints (e.g. height must be a multiple of X)
- [ ] Capability for host access? Clarify that `CL_MEM_HOST_NO_ACCESS` is meaningful to pass to `clGetSupportedImageFormats`.
- [ ] For planar formats, specify that image sizes relate to the Y plane and make it possible to express size constraints
- [ ] Constraints on origin/region for host access
- [ ] Do we want to support CL_MEM_COPY_HOST_PTR and/or CL_MEM_USE_HOST_PTR?
- [ ] Ability to report writes as supported only for separate planes (Y+UV instead of YUV)
I did an initial review of this extension, mostly comparing against cl_intel_packed_yuv and cl_intel_planar_yuv. In short, I don't think we'd be able to support this extension as-written. Here are a few items I found:
- Our sampling hardware returns data from YUV images in the order (V, Y, U, 1.0), whereas this extension specifies the channel mapping (y, u, v, 1.0). (See the swizzle sketch after this list.)
- We do not support a border color for YUV images (border color is undefined), whereas this extension requires the border color (0.0f, 0.0f, 0.0f, 1.0f) for YUV images without alpha, or (0.0f, 0.0f, 0.0f, 0.0f) for YUV images with alpha.
- Some of our devices can create or import planar YUV images but cannot directly access them in a kernel. On these devices, an application must create sub-images representing individual planes of these YUV images to operate on them. See `CL_MEM_NO_ACCESS_INTEL`.
- Some of our devices have different (smaller) maximum image dimensions for planar YUV images. See `CL_DEVICE_PLANAR_YUV_MAX_WIDTH_INTEL` and `CL_DEVICE_PLANAR_YUV_MAX_HEIGHT_INTEL`.
- Some of our devices require that the image width be a multiple of four for a planar YUV image.
- We do not currently support host access to planar YUV images, although we do support host access to sub-images representing individual planes of the planar YUV image.
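For illustration only, here is a minimal OpenCL C sketch of the kind of swizzle being discussed for the first point: it assumes the sampler returns components in the (V, Y, U, 1.0) order described above and reorders them into the (y, u, v, 1.0) mapping the draft specifies. The helper name is invented for this example and is not part of either extension.

```c
// Hypothetical helper, not part of either extension: remap a sample that
// the hardware returns in (V, Y, U, 1.0) order into the (y, u, v, 1.0)
// order the draft extension specifies for read_imagef.
float4 read_yuv_as_yuv1(read_only image2d_t img, sampler_t smp, int2 coord)
{
    float4 vyu1 = read_imagef(img, smp, coord); // assumed hardware order: (V, Y, U, 1.0)
    return vyu1.yzxw;                           // reordered to (Y, U, V, 1.0)
}
```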
A few other questions and comments:
- I'm unfamiliar with the `CL_YUV_420_EXT` packed image format. How exactly is this format arranged in memory? Are there some common FOURCCs that use this format?
- How does `write_imagef` work on the YUV image formats when there is sub-sampled data?
- Is there some reason why the sampler passed to `read_imagef` must have `CL_ADDRESS_CLAMP`? It should be trivial to support at least `CL_ADDRESS_NONE` as well.
- It's not clear to me what `CL_IMAGE_ELEMENT_SIZE_IN_BITS_EXT` returns and how this is different than the usual `CL_IMAGE_ELEMENT_SIZE`.
- We found that some parts of the core spec compute values from the number of channels in an image format ("num_channels"), and that it was easiest to describe the number of channels in a packed YUV 420 image as 2 vs. 4 or some other value. Regardless of what gets chosen for this extension, it's worth doing a careful check to make sure all of the right parts of the spec are updated.
- It would be good to explicitly state whether the image width and height are in units of the Y data or in the possibly sub-sampled UV data. If it's the Y data, the width or height may need to be a multiple of the sub-sampling factor.
- Likewise, for host access, the origin or region width may need to be a multiple of the sub-sampling factor.
Thanks for the initial review, that's very useful feedback! I don't think there are any irreconcilable differences so far.
- I did notice this in the Intel extensions. I'm not sure how big an issue that is in practice though. Presumably, it'd be easy to insert swizzles in the compiler and I would expect them to turn into no-ops most of the time. Do you think it'd be a practical/performance problem for you?
- I think we'd be fine with undefined or implementation-defined. We could also make it configurable if there's appetite for this.
- We could maybe standardise `CL_MEM_NO_ACCESS`. You'd only report those formats when `CL_MEM_NO_ACCESS` is passed to `clGetSupportedImageFormats`.
- I did see this. I was thinking of handling this with the upcoming KHR extension that enables queries on image formats. This could enable implementations to return per-format limits.
- Same as 4. I think this could be reported using that mechanism and we could specify constraints here. Implementations that don't have that constraint could just report 1.
- I'm fine to make this a capability if that's what people want. Arguably, implementations can already do this with the extension as is by only reporting YUV formats when `CL_MEM_HOST_NO_ACCESS` is passed to `clGetSupportedImageFormats`. Maybe this could be a separate core spec clarification. (A host-side query sketch follows this list.)
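To make items 3 and 6 more concrete, here is a hedged host-side sketch, in plain C against the existing OpenCL API, of the query pattern being suggested: an implementation would only report YUV formats when the application passes `CL_MEM_HOST_NO_ACCESS` (or a standardised `CL_MEM_NO_ACCESS`, if adopted) to `clGetSupportedImageFormats`. Nothing here is normative extension text.

```c
#include <stdlib.h>
#include <CL/cl.h>

// Sketch only: enumerate the 2D image formats an implementation reports
// when the application promises no host access. Under the scheme discussed
// in items 3 and 6, YUV formats might only appear in this list.
static void list_no_host_access_formats(cl_context context)
{
    cl_uint count = 0;
    clGetSupportedImageFormats(context,
                               CL_MEM_READ_ONLY | CL_MEM_HOST_NO_ACCESS,
                               CL_MEM_OBJECT_IMAGE2D,
                               0, NULL, &count);

    cl_image_format *formats = malloc(count * sizeof(*formats));
    clGetSupportedImageFormats(context,
                               CL_MEM_READ_ONLY | CL_MEM_HOST_NO_ACCESS,
                               CL_MEM_OBJECT_IMAGE2D,
                               count, formats, &count);

    // ... inspect formats[i].image_channel_order for YUV orders here ...
    free(formats);
}
```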
Regarding your other comments/questions:
I'm unfamiliar with the CL_YUV_420_EXT packed image format. How exactly is this format arranged in memory? Are there some common FOURCCs that use this format?
That one is rather obscure. I believe it's YYYYUVYYYYUV. I couldn't find a FOURCC for it. I'm not sure we'd support it in the end. I've tried to describe most formats we could potentially support to test the expressiveness of the approach I've taken but there are formats we're not particularly interested in at this point. This is one of them.
How does write_imagef work on the YUV image formats when there is sub-sampled data?
Depending on HW support and formats, implementations may have to handle separate scatter stores or RMW to write individual pixels.
Is there some reason why the sampler passed to read_imagef must have CL_ADDRESS_CLAMP? It should be trivial to support at least CL_ADDRESS_NONE as well.
Because that's what we support :). Agree that `CL_ADDRESS_NONE` is a no-brainer, I'll add that. If others want more options, I'm fine with making this configurable.
It's not clear to me what CL_IMAGE_ELEMENT_SIZE_IN_BITS_EXT returns and how this is different than the usual CL_IMAGE_ELEMENT_SIZE.
Some formats have an element size that is not a round number of bytes. That's why I introduced this new query. I did this fairly mechanically so it may be that there is something more useful to expose instead.
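As a hedged illustration of how I expect the query to be used, assuming `CL_IMAGE_ELEMENT_SIZE_IN_BITS_EXT` is a `cl_image_info` token returning a `size_t` by analogy with `CL_IMAGE_ELEMENT_SIZE` (the token comes from this draft and would be defined by the extension header, not core OpenCL):

```c
#include <CL/cl.h>

// Sketch only: query an image's element size in bits. The token is from
// this draft extension; its exact type and semantics are still open.
static size_t query_element_size_in_bits(cl_mem image)
{
    size_t element_bits = 0;
    clGetImageInfo(image, CL_IMAGE_ELEMENT_SIZE_IN_BITS_EXT,
                   sizeof(element_bits), &element_bits, NULL);
    // For formats whose elements are not a whole number of bytes (e.g.
    // 10-bit components), this cannot be expressed by CL_IMAGE_ELEMENT_SIZE.
    return element_bits;
}
```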
We found that some parts of the core spec compute values from the number of channels in an image format ("num_channels"), and that it was easiest to describe the number of channels in a packed YUV 420 image as 2 vs. 4 or some other value. Regardless of what gets chosen for this extension, it's worth doing a careful check to make sure all of the right parts of the spec are updated.
That's useful input, thanks! Do you have an example?
It would be good to explicitly state whether the image width and height are in units of the Y data or in the possibly sub-sampled UV data. If it's the Y data, the width or height may need to be a multiple of the sub-sampling factor.
Yes, this should be specified, probably in terms of Y data with constraints.
Likewise, for host access, the origin or region width may need to be a multiple of the sub-sampling factor.
Yes, we'll need to look into this. We could add format queries for this if we can't agree on a single value for everyone.
We can create separate issues for parts of this discussion if it would be easier to track them that way.
Regarding swizzles and mapping to vector components:
- Presumably, it'd be easy to insert swizzles in the compiler and I would expect them to turn into no-ops most of the time. Do you think it'd be a practical/performance problem for you?
If we knew when to insert the swizzles then it wouldn't be a problem, but I think we'd need to insert conditional code to swizzle or not depending on the image format, and this would cause overhead even for kernels that do not use YUV images.
There are additions we could make to OpenCL C and SPIR-V that wouldn't require this overhead, like new built-in functions to read YUV images or a new YUV image type, but those are bigger changes.
Regarding host access to planar YUV images:
6. I'm fine to make this a capability if that's what people want. Arguably, implementations can already do this with the extension as is by only reporting YUV formats when `CL_MEM_HOST_NO_ACCESS` is passed to `clGetSupportedImageFormats`. Maybe this could be a separate core spec clarification.
It would be interesting to see some examples how host access to planar YUV images could work.
IIRC we disallowed host access in our extension partially to reduce scope, but also because it wasn't immediately obvious how the host access would work. As a thought experiment, how would `clEnqueueReadImage` read from the different planes? Could `clEnqueueReadImage` read multiple planes in one call? This extension restricts `clEnqueueMapImage` to only work on a single plane, where the plane is specified via `cl_map_flags`, but we can't use a similar mechanism to restrict `clEnqueueReadImage` or `clEnqueueWriteImage`.
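For reference, here is a hedged sketch of how the per-plane mapping described above might look from the host side. `CL_MAP_READ_PLANE_Y_EXT` is a hypothetical flag name invented for this example (the draft does not name the plane-selection flags); everything else is the core `clEnqueueMapImage` call.

```c
#include <CL/cl.h>

// Hypothetical placeholder: the draft selects the plane via cl_map_flags
// but does not name the flags, so this value is made up for illustration.
#define CL_MAP_READ_PLANE_Y_EXT (1 << 30)

// Sketch only: map just the Y plane of a planar YUV image for reading.
static void *map_y_plane(cl_command_queue queue, cl_mem yuv_image,
                         size_t width, size_t height, size_t *row_pitch)
{
    const size_t origin[3] = {0, 0, 0};
    const size_t region[3] = {width, height, 1}; // extent of the Y plane
    cl_int err = CL_SUCCESS;

    void *y_plane = clEnqueueMapImage(queue, yuv_image, CL_TRUE,
                                      CL_MAP_READ | CL_MAP_READ_PLANE_Y_EXT,
                                      origin, region, row_pitch,
                                      NULL /* slice pitch */,
                                      0, NULL, NULL, &err);
    // Unmap as usual when done:
    //   clEnqueueUnmapMemObject(queue, yuv_image, y_plane, 0, NULL, NULL);
    return (err == CL_SUCCESS) ? y_plane : NULL;
}
```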
We found that some parts of the core spec compute values from the number of channels in an image format ("num_channels"), and that it was easiest to describe the number of channels in a packed YUV 420 image as 2 vs. 4 or some other value. Regardless of what gets chosen for this extension, it's worth doing a careful check to make sure all of the right parts of the spec are updated.
That's useful input, thanks! Do you have an example?
It's been a while and I think we initially wrote the packed YUV extension against the OpenCL 1.2 spec, but I definitely remember debating whether a packed YUV 420 image was a 2-channel image or a 4-channel image and what we should return for `CL_IMAGE_ELEMENT_SIZE`, and we ultimately settled on "2 channels" and "2 bytes per element". Neither option seemed like a perfect fit, but I do recall this being the better of the two.
I've added a list of outstanding issues to the PR description. We can link separate issues if some points require more discussion than reasonable to have here.
I've also allowed `CLK_ADDRESS_NONE`.
If we knew when to insert the swizzles then it wouldn't be a problem, but I think we'd need to insert conditional code to swizzle or not depending on the image format, and this would cause overhead even for kernels that do not use YUV images.
Right, of course. Maybe you could always swizzle to match the YUV order if you have flexibility on the RGBA side of things. If you can't then we need to look into changing the order (we have some flexibility here on our side). Hopefully there's a single order that could work for all interested implementers. If not we might have to make bigger changes as you suggested (as unattractive as available options seem to be).
It would be interesting to see some examples how host access to planar YUV images could work.
The spec currently states (straw-man) that planes are packed one after the other, aligned to the slice pitch. This definitely needs more thinking, as I'm not sure that's the most useful behaviour. To access individual planes, an application would have to create an image for the specific plane from the planar image. There's always that option but it might be cumbersome in practice. Agree that it seems difficult to cleanly integrate this into `clEnqueueReadImage`/`clEnqueueWriteImage`.
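To illustrate one possible reading of that straw-man packing, here is a hedged sketch of how a host mapping of the whole image could be interpreted, assuming an NV12-style two-plane (Y plus interleaved UV) format and that each plane starts on a slice-pitch boundary; none of this is settled behaviour.

```c
#include <stddef.h>

// Sketch only: locate the planes inside a whole-image host mapping under
// the straw-man layout (planes stored back-to-back, each plane starting on
// a slice-pitch boundary). Assumes a two-plane Y + interleaved-UV format.
typedef struct {
    const unsigned char *y;   // width x height Y samples
    const unsigned char *uv;  // (width/2) x (height/2) interleaved UV pairs
} yuv_planes;

static yuv_planes locate_planes(const void *mapped, size_t slice_pitch)
{
    const unsigned char *base = mapped;
    yuv_planes planes;
    planes.y  = base;               // plane 0 at offset 0
    planes.uv = base + slice_pitch; // plane 1 at the next slice-pitch boundary
    return planes;
}
```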
Thanks for making the task list. I think that will be good enough for now.
I think I have a solution to the YUV order from the read_image built-ins. I need to test it to confirm but it should work in theory.
Assuming it works it does mean that we cannot reuse or alias any of the enums from the Intel extensions, since images with the Intel formats will return one order and images with these formats will return a different order, but I don't think we were planning to do that anyways.
I've just rebased the specification, added a dependency on https://github.com/KhronosGroup/OpenCL-Docs/pull/750 for per-format queries for max supported image dimensions and ticked the item in the list of outstanding issues.
I think I have a solution to the YUV order from the read_image built-ins. I need to test it to confirm but it should work in theory.
@bashbaug Have you been able to confirm this?
We have concerns about the channel mappings for image reads: this does not align with what we currently have. We also have concerns about writing to YUV images, since we only support writes to single planes. Also, is the interaction with clGetSupportedImageFormats, creating an image from a buffer, creating an image with a host pointer, etc., defined?