WIP cl_ext_yuv_images

Open kpet opened this issue 3 years ago • 7 comments

Change-Id: I079f7610926b06a7516652c02b264ecb0aa090b3

Outstanding issues:

  • [ ] Order of components returned by read_image
  • [ ] Border color
  • [ ] Support for creating images for formats where device access is not allowed
  • [x] Add per-format queries for max supported image dimensions (https://github.com/KhronosGroup/OpenCL-Docs/pull/750)
  • [ ] Add per-format queries for image size multiple constraints (e.g. height must be a multiple of X)
  • [ ] Capability for host access? Clarify that CL_MEM_HOST_NO_ACCESS is meaningful to pass to clGetSupportedImageFormats?
  • [ ] For planar formats, specify that image sizes relate to Y plane and make it possible to express size constraints
  • [ ] Constraints on origin/region for host access
  • [ ] Do we want to support CL_MEM_COPY_HOST_PTR and/or CL_MEM_USE_HOST_PTR?
  • [ ] Ability to report writes as supported only for separate planes (Y+UV instead of YUV)

kpet avatar Nov 17 '21 10:11 kpet

I did an initial review of this extension, mostly comparing against cl_intel_packed_yuv and cl_intel_planar_yuv. In short, I don't think we'd be able to support this extension as-written. Here are a few items I found:

  1. Our sampling hardware returns data from YUV images in the order (V, Y, U, 1.0), whereas this extension specifies the channel mapping (y, u, v, 1.0); see the sketch after this list.
  2. We do not support a border color for YUV images (border color is undefined), whereas this extension requires the border color (0.0f, 0.0f, 0.0f, 1.0f) for YUV images without alpha, or (0.0f, 0.0f, 0.0f, 0.0f) for YUV images with alpha.
  3. Some of our devices can create or import planar YUV images but cannot directly access them in a kernel. On these devices, an application must create sub-images representing individual planes of these YUV images to operate on them. See CL_MEM_NO_ACCESS_INTEL.
  4. Some of our devices have different (smaller) maximum image dimensions for planar YUV images. See CL_DEVICE_PLANAR_YUV_MAX_WIDTH_INTEL and CL_DEVICE_PLANAR_YUV_MAX_HEIGHT_INTEL.
  5. Some of our devices have a requirement that the image width must be a multiple of four for a planar YUV image.
  6. We do not currently support host access to planar YUV images, although we do support host access to sub-images representing individual planes of the planar YUV image.
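A minimal OpenCL C sketch of item 1 (illustrative only; the kernel and the swizzle shown are not part of either extension) showing the (y, u, v, 1.0) mapping the extension specifies and, in comments, what an implementation whose sampler hardware returns (V, Y, U, 1.0) would have to insert:

```c
// Illustrative sketch only: nothing here beyond the extension's promise
// that read_imagef returns (y, u, v, 1.0) is normative.
__constant sampler_t yuv_sampler =
    CLK_NORMALIZED_COORDS_FALSE | CLK_ADDRESS_CLAMP | CLK_FILTER_NEAREST;

__kernel void copy_yuv(read_only image2d_t yuv_img, __global float4 *out)
{
    int2 coord = (int2)(get_global_id(0), get_global_id(1));

    // What the application is promised by this extension:
    float4 yuv = read_imagef(yuv_img, yuv_sampler, coord);  // (y, u, v, 1.0)

    // On hardware whose sampler returns (V, Y, U, 1.0), the implementation
    // would effectively have to lower the read above to something like:
    //   float4 hw  = read_imagef(yuv_img, yuv_sampler, coord); // (V, Y, U, 1.0)
    //   float4 yuv = (float4)(hw.y, hw.z, hw.x, 1.0f);
    // which is the swizzle/conditional-code concern discussed later in this thread.

    out[coord.y * get_image_width(yuv_img) + coord.x] = yuv;
}
```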

A few other questions and comments:

  • I'm unfamiliar with the CL_YUV_420_EXT packed image format. How exactly is this format arranged in memory? Are there some common FOURCCs that use this format?
  • How does write_imagef work on the YUV image formats when there is sub-sampled data?
  • Is there some reason why the sampler passed to read_imagef must have CL_ADDRESS_CLAMP? It should be trivial to support at least CL_ADDRESS_NONE as well.
  • It's not clear to me what CL_IMAGE_ELEMENT_SIZE_IN_BITS_EXT returns and how this is different from the usual CL_IMAGE_ELEMENT_SIZE.
  • We found that some parts of the core spec compute values from the number of channels in an image format ("num_channels"), and that it was easiest to describe the number of channels in a packed YUV 420 image as 2 vs. 4 or some other value. Regardless of what gets chosen for this extension, it's worth doing a careful check to make sure all of the right parts of the spec are updated.
  • It would be good to explicitly state whether the image width and height are in units of the Y data or in the possibly sub-sampled UV data. If it's the Y data, the width or height may need to be a multiple of the sub-sampling factor (see the sketch after this list).
  • Likewise, for host access, the origin or region width may need to be a multiple of the sub-sampling factor.
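To make the last two points concrete, here is a small host-side sketch; the 4:2:0 sub-sampling factors and the "sizes are in units of the Y data" convention are assumptions the spec would still need to pin down:

```c
#include <stdbool.h>
#include <stddef.h>

/* Sketch: derive 4:2:0 chroma plane dimensions from Y-plane dimensions,
 * assuming image sizes are expressed in units of the Y data. */
typedef struct {
    size_t y_width, y_height;
    size_t uv_width, uv_height;
} yuv420_dims;

static bool compute_yuv420_dims(size_t width, size_t height, yuv420_dims *out)
{
    const size_t sub_x = 2, sub_y = 2;   /* 4:2:0 sub-sampling factors */

    /* If sizes are in Y units they must be multiples of the sub-sampling
     * factors, otherwise the chroma plane size is not well defined; the
     * same reasoning applies to origin/region for host access. */
    if ((width % sub_x) != 0 || (height % sub_y) != 0)
        return false;

    out->y_width   = width;
    out->y_height  = height;
    out->uv_width  = width  / sub_x;
    out->uv_height = height / sub_y;
    return true;
}
```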

bashbaug avatar Nov 30 '21 06:11 bashbaug

Thanks for the initial review, that's very useful feedback! I don't think there are any irreconcilable differences so far.

  1. I did notice this in the Intel extensions. I'm not sure how big an issue that is in practice though. Presumably, it'd be easy to insert swizzles in the compiler and I would expect them to turn into no-ops most of the time. Do you think it'd be a practical/performance problem for you?
  2. I think we'd be fine with undefined or implementation-defined. We could also make it configurable if there's appetite for this.
  3. Maybe we could standardise CL_MEM_NO_ACCESS. You'd only report those formats when CL_MEM_NO_ACCESS is passed to clGetSupportedImageFormats (see the sketch after this list).
  4. I did see this. I was thinking of handling this with the upcoming KHR extension that enables queries on image formats. This could enable implementations to return per-format limits.
  5. Same as 4. I think this could be reported using that mechanism and we could specify constraints here. Implementations that don't have that constraint could just report 1.
  6. I'm fine to make this a capability if that's what people want. Arguably implementations can already do this with the extension as is by only reporting YUV formats when CL_MEM_HOST_NO_ACCESS is passed to clGetSupportedImageFormats. Maybe this could be a separate core spec clarification.
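To make points 3 and 6 concrete, here is a hedged host-side sketch of the clGetSupportedImageFormats-based approach. It only uses the existing CL_MEM_HOST_NO_ACCESS flag; a standardised CL_MEM_NO_ACCESS is mentioned in a comment as a hypothetical addition:

```c
#include <stdio.h>
#include <CL/cl.h>

/* Sketch: enumerate image formats reported when the application promises
 * not to access the image from the host. An implementation could report
 * YUV formats only for flag combinations it can actually support (and,
 * hypothetically, only when a standardised CL_MEM_NO_ACCESS is passed if
 * the device cannot access the format from kernels either). */
static void probe_yuv_formats(cl_context context)
{
    const cl_mem_flags flags = CL_MEM_READ_ONLY | CL_MEM_HOST_NO_ACCESS;

    cl_uint num_formats = 0;
    clGetSupportedImageFormats(context, flags, CL_MEM_OBJECT_IMAGE2D,
                               0, NULL, &num_formats);

    cl_image_format formats[256];
    if (num_formats > 256)
        num_formats = 256;
    clGetSupportedImageFormats(context, flags, CL_MEM_OBJECT_IMAGE2D,
                               num_formats, formats, NULL);

    for (cl_uint i = 0; i < num_formats; i++) {
        printf("channel order 0x%x, data type 0x%x\n",
               formats[i].image_channel_order,
               formats[i].image_channel_data_type);
    }
}
```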

Regarding your other comments/questions:

I'm unfamiliar with the CL_YUV_420_EXT packed image format. How exactly is this format arranged in memory? Are there some common FOURCCs that use this format?

That one is rather obscure. I believe it's YYYYUVYYYYUV. I couldn't find a FOURCC for it. I'm not sure we'd support it in the end. I've tried to describe most formats we could potentially support to test the expressiveness of the approach I've taken but there are formats we're not particularly interested in at this point. This is one of them.

How does write_imagef work on the YUV image formats when there is sub-sampled data?

Depending on HW support and formats, implementations may have to handle separate scatter stores or read-modify-write (RMW) sequences to write individual pixels.
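To spell out why (a sketch only, assuming an NV12-style 4:2:0 layout purely for illustration, not the extension's defined lowering): writing the single pixel (x, y) touches one Y byte plus a chroma pair shared with up to three neighbouring pixels, hence the scatter stores or RMW.

```c
#include <stddef.h>

/* Sketch: byte offsets touched when writing pixel (x, y) of a 4:2:0 planar
 * image with an NV12-style layout (assumed here purely for illustration). */
typedef struct { size_t y_byte, u_byte, v_byte; } nv12_touch;

static nv12_touch nv12_bytes_for_pixel(size_t x, size_t y,
                                       size_t y_row_pitch,
                                       size_t uv_row_pitch,
                                       size_t uv_plane_offset)
{
    nv12_touch t;
    t.y_byte = y * y_row_pitch + x;                   /* one Y sample */

    /* The chroma pair at (x/2, y/2) is shared with up to three neighbouring
     * pixels, so a per-pixel write needs either a separate chroma pass or a
     * read-modify-write of these two bytes. */
    t.u_byte = uv_plane_offset + (y / 2) * uv_row_pitch + (x / 2) * 2;
    t.v_byte = t.u_byte + 1;
    return t;
}
```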

Is there some reason why the sampler passed to read_imagef must have CL_ADDRESS_CLAMP? It should be trivial to support at least CL_ADDRESS_NONE as well.

Because that's what we support :). Agree that CL_ADDRESS_NONE is a no-brainer, I'll add that. If others want more options, I'm fine with making this configurable.

It's not clear to me what CL_IMAGE_ELEMENT_SIZE_IN_BITS_EXT returns and how this is different from the usual CL_IMAGE_ELEMENT_SIZE.

Some formats have an element size that is not a round number of bytes. That's why I introduced this new query. I did this fairly mechanically so it may be that there is something more useful to expose instead.
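If it stays, usage would presumably just be an extra clGetImageInfo query alongside the existing one. A sketch, assuming the token comes from the draft extension header and (an assumption) that it returns a cl_uint:

```c
#include <CL/cl.h>

/* Sketch: query both element-size values for an image. The
 * CL_IMAGE_ELEMENT_SIZE_IN_BITS_EXT token comes from the draft extension
 * header, and its cl_uint return type is an assumption. */
static void query_element_sizes(cl_mem image)
{
    size_t element_size_bytes = 0;   /* size of each element in bytes (core query) */
    clGetImageInfo(image, CL_IMAGE_ELEMENT_SIZE,
                   sizeof(element_size_bytes), &element_size_bytes, NULL);

    cl_uint element_size_bits = 0;   /* exact size for sub-byte elements (EXT query) */
    clGetImageInfo(image, CL_IMAGE_ELEMENT_SIZE_IN_BITS_EXT,
                   sizeof(element_size_bits), &element_size_bits, NULL);
}
```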

We found that some parts of the core spec compute values from the number of channels in an image format ("num_channels"), and that it was easiest to describe the number of channels in a packed YUV 420 image as 2 vs. 4 or some other value. Regardless of what gets chosen for this extension, it's worth doing a careful check to make sure all of the right parts of the spec are updated.

That's useful input, thanks! Do you have an example?

It would be good to explicitly state whether the image width and height are in units of the Y data or in the possibly sub-sampled UV data. If it's the Y data, the width or height may need to be a multiple of the sub-sampling factor.

Yes, this should be specified, probably in terms of Y data with constraints.

Likewise, for host access, the origin or region width may need to be a multiple of the sub-sampling factor.

Yes, we'll need to look into this. We could add format queries for this if we can't agree on a single value for everyone.

kpet avatar Nov 30 '21 15:11 kpet

We can create separate issues for parts of this discussion if it would be easier to track them that way.

Regarding swizzles and mapping to vector components:

  1. Presumably, it'd be easy to insert swizzles in the compiler and I would expect them to turn into no-ops most of the time. Do you think it'd be a practical/performance problem for you?

If we knew when to insert the swizzles then it wouldn't be a problem, but I think we'd need to insert conditional code to swizzle or not depending on the image format, and this would cause overhead even for kernels that do not use YUV images.

There are additions we could make to OpenCL C and SPIR-V that wouldn't require this overhead, like new built-in functions to read YUV images or a new YUV image type, but those are bigger changes.

Regarding host access to planar YUV images:

6. I'm fine to make this a capability if that's what people want. Arguably implementations can already do this with the extension as is by only reporting YUV formats when CL_MEM_HOST_NO_ACCESS is passed to clGetSupportedImageFormats. Maybe this could be a separate core spec clarification.

It would be interesting to see some examples how host access to planar YUV images could work.

IIRC we disallowed host access in our extension partially to reduce scope, but also because it wasn't immediately obvious how the host access would work. As a thought experiment, how would clEnqueueReadImage read from the different planes? Could clEnqueueReadImage read multiple planes in one call? This extension restricts clEnqueueMapImage to only work on a single plane, where the plane is specified via cl_map_flags, but we can't use a similar mechanism to restrict clEnqueueReadImage or clEnqueueWriteImage.
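For reference, the draft's single-plane mapping would look roughly like this from the host side (a sketch; CL_MAP_PLANE_Y_EXT is a hypothetical name for the plane-selection cl_map_flags bit, and the units of origin/region are one of the open questions):

```c
#include <CL/cl.h>

/* Sketch: map only the Y plane of a planar YUV image for host reading.
 * CL_MAP_PLANE_Y_EXT is a hypothetical placeholder for the plane-selection
 * cl_map_flags bit described in the draft. */
static void *map_y_plane(cl_command_queue queue, cl_mem yuv_image,
                         size_t width, size_t height, size_t *row_pitch,
                         cl_int *err)
{
    const size_t origin[3] = {0, 0, 0};
    const size_t region[3] = {width, height, 1};  /* units: open question */

    return clEnqueueMapImage(queue, yuv_image, CL_TRUE,
                             CL_MAP_READ | CL_MAP_PLANE_Y_EXT,
                             origin, region, row_pitch, NULL,
                             0, NULL, NULL, err);
}
```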

We found that some parts of the core spec compute values from the number of channels in an image format ("num_channels"), and that it was easiest to describe the number of channels in a packed YUV 420 image as 2 vs. 4 or some other value. Regardless of what gets chosen for this extension, it's worth doing a careful check to make sure all of the right parts of the spec are updated.

That's useful input, thanks! Do you have an example?

It's been a while, and I think we initially wrote the packed YUV extension against the OpenCL 1.2 spec, but I definitely remember debating whether a packed YUV 420 image was a 2-channel image or a 4-channel image and what we should return for CL_IMAGE_ELEMENT_SIZE. We ultimately settled on "2 channels" and "2 bytes per element". Neither option seemed like a perfect fit, but I do recall this being the better of the two.

bashbaug avatar Nov 30 '21 21:11 bashbaug

I've added a list of outstanding issues to the PR description. We can link separate issues if some points require more discussion than reasonable to have here.

I've also allowed CLK_ADDRESS_NONE.

If we knew when to insert the swizzles then it wouldn't be a problem, but I think we'd need to insert conditional code to swizzle or not depending on the image format, and this would cause overhead even for kernels that do not use YUV images.

Right, of course. Maybe you could always swizzle to match the YUV order if you have flexibility on the RGBA side of things. If you can't, then we need to look into changing the order (we have some flexibility here on our side). Hopefully there's a single order that could work for all interested implementers. If not, we might have to make bigger changes as you suggested (as unattractive as the available options seem to be).

It would be interesting to see some examples how host access to planar YUV images could work.

The spec currently states (straw-man) that planes are packed one after the other, aligned to the slice pitch. This definitely needs more thinking, as I'm not sure that's the most useful behaviour. To access individual planes, an application would have to create an image for the specific plane from the planar image. There's always that option, but it might be cumbersome in practice. I agree that it seems difficult to cleanly integrate this into clEnqueueReadImage/clEnqueueWriteImage.
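To make the straw-man concrete, a sketch of how a host reader might locate the planes of an NV12-style image under "planes packed one after the other, aligned to the slice pitch"; the reading that the second plane simply starts at the slice-pitch boundary is an assumption:

```c
#include <stddef.h>
#include <stdint.h>

/* Sketch of the straw-man host layout for an NV12-style 4:2:0 image:
 * Y plane first, UV plane starting at the slice-pitch boundary. Whether
 * that is the right reading of "aligned to the slice pitch" is exactly
 * what needs more thought. */
static void walk_planes(const uint8_t *base, size_t width, size_t height,
                        size_t row_pitch, size_t slice_pitch)
{
    const uint8_t *y_plane  = base;                /* height rows of luma */
    const uint8_t *uv_plane = base + slice_pitch;  /* next plane boundary */

    for (size_t row = 0; row < height; row++) {
        const uint8_t *y_row = y_plane + row * row_pitch;
        (void)y_row;   /* width bytes of Y */
    }
    for (size_t row = 0; row < height / 2; row++) {
        const uint8_t *uv_row = uv_plane + row * row_pitch;
        (void)uv_row;  /* width bytes of interleaved U/V (4:2:0) */
    }
    (void)width;
}
```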

kpet avatar Dec 03 '21 13:12 kpet

Thanks for making the task list. I think that will be good enough for now.

I think I have a solution to the YUV order from the read_image built-ins. I need to test it to confirm but it should work in theory.

Assuming it works, it does mean that we cannot reuse or alias any of the enums from the Intel extensions, since images with the Intel formats will return one order and images with these formats will return a different order, but I don't think we were planning to do that anyway.

bashbaug avatar Dec 03 '21 23:12 bashbaug

I've just rebased the specification, added a dependency on https://github.com/KhronosGroup/OpenCL-Docs/pull/750 for per-format queries for max supported image dimensions and ticked the item in the list of outstanding issues.

I think I have a solution to the YUV order from the read_image built-ins. I need to test it to confirm but it should work in theory.

@bashbaug Have you been able to confirm this?

kpet avatar Jan 25 '22 15:01 kpet

We have concerns about the channel mappings for image reads; they do not align with what we currently have. We also have concerns about writing to YUV images: we only support writes to single planes. Also, is the interaction with clGetSupportedImageFormats, creating an image from a buffer, creating an image with a host pointer, etc., defined?

bcalidas avatar Dec 11 '23 23:12 bcalidas