GPU Texture caching and duplication detection
In this issue, I want to discuss the OCIO 2 GPU path and the potential for further optimisations. There are two topics that were discussed during the last TSC meeting:
- Implementing texture caching across multiple Processors in the client app,
- Handling duplication of textures (LUTs) within a single Processor.
Implementing texture caching across multiple Processors in the client app
In the context of playback software development, for example using the LegacyViewingPipeline from the apphelpers, you can bundle the whole display rendering pipeline into a single GPUProcessor and extract a ShaderDesc from it. Assuming you are loading a playlist with a number of shots where the only difference in processing is a grading operation (like a CDL), you will potentially end up with duplicated textures among the different ShaderDesc instances (if your display pipeline uses LUTs).
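To make the setup concrete, here is a minimal sketch of that flow. The display/view names, CDL values and the helper `ExtractShotShaderDesc` are hypothetical, and the apphelpers header name and exact LegacyViewingPipeline setters may differ slightly across OCIO 2.x releases:

```cpp
#include <OpenColorIO/OpenColorIO.h>
#include <OpenColorIO/OpenColorAppHelpers.h> // LegacyViewingPipeline lives with the apphelpers

namespace OCIO = OCIO_NAMESPACE;

// Hypothetical helper: build one GpuShaderDesc per shot, where only the
// per-shot grade (here a CDL slope) differs between shots.
OCIO::GpuShaderDescRcPtr ExtractShotShaderDesc(const OCIO::ConstConfigRcPtr & config,
                                               const double * shotSlope)
{
    OCIO::DisplayViewTransformRcPtr dvt = OCIO::DisplayViewTransform::Create();
    dvt->setSrc("scene_linear");
    dvt->setDisplay("sRGB");
    dvt->setView("Film");

    OCIO::CDLTransformRcPtr grade = OCIO::CDLTransform::Create();
    grade->setSlope(shotSlope);

    OCIO::LegacyViewingPipelineRcPtr pipeline = OCIO::LegacyViewingPipeline::Create();
    pipeline->setDisplayViewTransform(dvt);
    pipeline->setLinearCC(grade);

    // The whole display rendering chain is bundled into a single GPUProcessor.
    OCIO::ConstProcessorRcPtr proc =
        pipeline->getProcessor(config, config->getCurrentContext());
    OCIO::ConstGPUProcessorRcPtr gpuProc = proc->getDefaultGPUProcessor();

    OCIO::GpuShaderDescRcPtr shaderDesc = OCIO::GpuShaderDesc::CreateShaderDesc();
    shaderDesc->setLanguage(OCIO::GPU_LANGUAGE_GLSL_4_0);
    gpuProc->extractGpuShaderInfo(shaderDesc);

    // shaderDesc now carries the shader text plus the 1D/3D LUT textures to upload;
    // across shots, most of these textures are byte-for-byte identical.
    return shaderDesc;
}
```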
A reasonable approach would then be to implement caching on top of OCIO, which requires a way to generate unique IDs describing the textures used in the shader. This will at least require hashing the texture format description as well as the content, the latter being expensive to compute. OCIO already generates such a hash based on the LUT entries, and the idea would be to somehow query that information and avoid the unnecessary cost of recomputing it in the client app.
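For illustration, a client-side cache key could be built today roughly as follows. `HashBytes` and `TextureCacheKey` are made-up helpers, the content hash over all LUT entries is exactly the expensive step this issue proposes OCIO could hand back instead, and the exact getTexture() signature varies slightly across OCIO 2.x releases:

```cpp
#include <cstdint>
#include <cstddef>

#include <OpenColorIO/OpenColorIO.h>
namespace OCIO = OCIO_NAMESPACE;

// FNV-1a over raw bytes: a simple stand-in for whatever hash the client app uses.
inline uint64_t HashBytes(const void * data, size_t size,
                          uint64_t h = 14695981039346656037ULL)
{
    const unsigned char * p = static_cast<const unsigned char *>(data);
    for (size_t i = 0; i < size; ++i) { h ^= p[i]; h *= 1099511628211ULL; }
    return h;
}

// Compute a cache key for one 1D/2D texture of a GpuShaderDesc: the format
// description plus the full content. Hashing the content is the expensive part
// that OCIO already does internally (per-LUT identifier) but does not expose today.
uint64_t TextureCacheKey(const OCIO::ConstGpuShaderDescRcPtr & shaderDesc, unsigned idx)
{
    const char * textureName = nullptr;
    const char * samplerName = nullptr;
    unsigned width = 0, height = 0;
    OCIO::GpuShaderDesc::TextureType channel = OCIO::GpuShaderDesc::TEXTURE_RGB_CHANNEL;
    OCIO::Interpolation interp = OCIO::INTERP_LINEAR;
    shaderDesc->getTexture(idx, textureName, samplerName, width, height, channel, interp);

    const unsigned numChannels =
        (channel == OCIO::GpuShaderDesc::TEXTURE_RED_CHANNEL) ? 1 : 3;
    const float * values = nullptr;
    shaderDesc->getTextureValues(idx, values);

    // Cheap part: the format description.
    uint64_t key = HashBytes(&width, sizeof(width));
    key = HashBytes(&height, sizeof(height), key);
    key = HashBytes(&channel, sizeof(channel), key);
    key = HashBytes(&interp, sizeof(interp), key);
    // Expensive part: width * height * numChannels floats of LUT content.
    key = HashBytes(values, size_t(width) * height * numChannels * sizeof(float), key);
    return key;
}
```

A GPU texture cache keyed on such a value can then be shared across all shots in the playlist (3D LUTs would be handled the same way via get3DTexture()/get3DTextureValues()); exposing OCIO's already-computed per-LUT hash would make the content-hashing step above unnecessary.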
Handling duplication of textures (LUTs) within a single Processor
Still in the same context, a simple example of a workflow would be:
Scene-Linear -> Lin to Log -> CDL -> Log to Lin -> LinearCC -> Lin to Log -> 3DLUT
Assuming you are using LUTs to implement the Log space, there is currently duplication of textures within a single ShaderDesc instance (here, for the Lin to Log direction), even after the Processor has been optimised, because these operations cannot be concatenated or discarded due to the ordering. This is only an example, and one could argue that OCIO 2's new transforms reduce the need for LUTs in the first place, but this might still be something we can improve.
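To make the duplication concrete, here is a rough sketch of that chain built as a GroupTransform, with hypothetical LUT file names, CDL values and matrix grade; per the behaviour described above, the two forward Lin-to-Log LUTs end up as separate textures in the extracted shader:

```cpp
#include <OpenColorIO/OpenColorIO.h>
namespace OCIO = OCIO_NAMESPACE;

OCIO::GpuShaderDescRcPtr BuildExampleShader(const OCIO::ConstConfigRcPtr & config)
{
    // Scene-Linear -> Lin to Log -> CDL -> Log to Lin -> LinearCC -> Lin to Log -> 3DLUT
    OCIO::GroupTransformRcPtr group = OCIO::GroupTransform::Create();

    OCIO::FileTransformRcPtr linToLog = OCIO::FileTransform::Create();
    linToLog->setSrc("lin_to_log.cube");          // hypothetical 1D LUT file

    OCIO::CDLTransformRcPtr cdl = OCIO::CDLTransform::Create();
    const double slope[3] = { 1.1, 1.0, 0.95 };   // hypothetical per-shot grade
    cdl->setSlope(slope);

    OCIO::FileTransformRcPtr logToLin = OCIO::FileTransform::Create();
    logToLin->setSrc("lin_to_log.cube");
    logToLin->setDirection(OCIO::TRANSFORM_DIR_INVERSE);

    OCIO::MatrixTransformRcPtr linearCC = OCIO::MatrixTransform::Create();
    const double gain[16] = { 1.2, 0, 0, 0,   0, 1.2, 0, 0,
                              0, 0, 1.2, 0,   0, 0, 0, 1.0 }; // hypothetical linear CC
    linearCC->setMatrix(gain);

    OCIO::FileTransformRcPtr linToLog2 = OCIO::FileTransform::Create();
    linToLog2->setSrc("lin_to_log.cube");         // the same LUT, used a second time

    OCIO::FileTransformRcPtr display3d = OCIO::FileTransform::Create();
    display3d->setSrc("display.cube");            // hypothetical 3D LUT

    group->appendTransform(linToLog);
    group->appendTransform(cdl);
    group->appendTransform(logToLin);
    group->appendTransform(linearCC);
    group->appendTransform(linToLog2);
    group->appendTransform(display3d);

    OCIO::ConstProcessorRcPtr proc = config->getProcessor(group);
    OCIO::ConstGPUProcessorRcPtr gpuProc =
        proc->getOptimizedGPUProcessor(OCIO::OPTIMIZATION_DEFAULT);

    OCIO::GpuShaderDescRcPtr shaderDesc = OCIO::GpuShaderDesc::CreateShaderDesc();
    shaderDesc->setLanguage(OCIO::GPU_LANGUAGE_GLSL_4_0);
    gpuProc->extractGpuShaderInfo(shaderDesc);

    // The two forward Lin-to-Log ops are separated by ops they cannot be merged
    // with, so the same 1D LUT data is declared (and uploaded) twice.
    return shaderDesc;
}
```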
Here is a quote from @hodoulp, extracted from a recent Slack thread:
There is no way from the current public API to access the LUT unique identifier, which is only used internally for building the processor unique identifier. To share the same GPU texture, there are two areas to think about:
- The data itself, i.e. GpuShaderCreator::addTexture() & GpuShaderDesc::getTextureValues()
- The handler name, i.e. GpuShaderCreator::addTexture() & GpuShaderDesc::getTexture()
The public API will have to be improved to provide access, and/or it could be a mechanism that is automatically enabled. The latter is much more promising as it will not need any API change.
In the second case (i.e. handling duplicated LUTs in a single processor), before looking for a valid texture name, the first step would be to have a global ‘state’ of the current processor conversion (i.e. of the in-memory conversion to GPU code). That is currently missing, so the code cannot detect duplicated LUTs. That state instance would be in charge of detecting duplicated LUTs, keeping only one copy of the LUT data, and finally returning the same handler name so that the ops are not aware of all this work. As soon as there is a kind of global picture of the color transformation, it would also enable (with much more work) concatenation of LUTs, for example combining two ‘small’ LUTs into a single GPU texture.
It seems that a solution based in part on the handler name could be a promising way to solve both issues at once.
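As a purely hypothetical illustration of that idea (nothing below exists in OCIO today), a per-shader registry keyed on the internal LUT identifier could hand back the handler name of the first occurrence, so later duplicates silently reuse the same texture:

```cpp
#include <string>
#include <unordered_map>

// Hypothetical per-shader "state" that ops would consult when registering a LUT.
// The first op to register a given LUT identifier wins; later ops with the same
// identifier receive the handler name already issued, so only one texture is
// declared and uploaded.
class TextureRegistry
{
public:
    // 'lutCacheID' would be the LUT unique identifier OCIO already computes
    // internally; 'requestedName' is the name the op would have used on its own.
    // Returns the handler name the op must use in the generated shader code.
    const std::string & registerLut(const std::string & lutCacheID,
                                    const std::string & requestedName)
    {
        auto it = m_handlers.find(lutCacheID);
        if (it == m_handlers.end())
        {
            // First time this LUT is seen: keep the op's own name (and, in a real
            // implementation, keep the LUT data and call addTexture() once).
            it = m_handlers.emplace(lutCacheID, requestedName).first;
        }
        return it->second; // Duplicates reuse the existing texture name.
    }

private:
    std::unordered_map<std::string, std::string> m_handlers;
};
```

Keyed on the same LUT identifier, such a mechanism could also feed the cross-Processor caching case, whether the identifier is exposed through the public API or handled automatically.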