OpenColorIO
OpenColorIO copied to clipboard
[Draft] ACES2 Output Transforms
Opening this PR early even though it's very much a work in progress / prototype so far, the code is more or less imported directly from the CTL.
Next steps includes:
- Adding all the official output transforms from the ampas repository
- Adding support for creative white point and surround gamma factor
- Review lookup table constexpr initialization (assuming OCIO 2.4 moves to C++14, C++14 has limitations in constexpr support making it less clean, increase library size (probably), custom compiler flags needed to lift constexpr instruction count limit), implement caching for custom output transform's tables
- Review current CPU code duplication (reimplement NPM derivation, matrix inversion, etc) that are available in other place in OCIO, though typically using the Ops abstraction, not native code / types
- Optimize CPU path (currently 8x slower than ACES 1)
- Implement GPU path
- Cleanup the FixedFunctions used for testing
- Add tonescale (Daniele's curve) only FixedFunction
The committers listed above are authorized under a signed CLA.
- :white_check_mark: login: remia / name: Rémi Achard (bcd738233ee5223e1a1e67fd223d01e787627326, 7d32d888f9a10d3706e56982bf8bf89409940447, 131cebd1ca3f26bd7b71208a8d10b4df55536641, 31b64119431fc97b0a224ff894fcc379e7c1c997, 28ada044a36e7c2fb3273bafea84b7506f94c171, f673a3fba128f9baed1683e6bb086fe2e6298cfe)
Thank you so much for doing this Remi! Everything aside from ACES2CPUHelpers.h looks perfect. And please assume that OCIO will move to C++14 minimum if that allows a cleaner implementation.
I understand, as you wrote above, that the ACES2CPUHelpers.h module is a direct port of the CTL. But wow, this still really feels like research code. I'm surprised it's only 8x slower than ACES 1, there are so many unnecessary conversions being done. I know you wrote above that optimization is still a TODO, but it really seems like that is going to be a huge task in order to get something reasonably performant. I don't think it's simply a matter of pre-multiplying a few matrices and swapping in the fast power function. For example, I see things like:
const float linear = Hellwig_J_to_Y(inputJMh[0], inputJMhParams) / reference_luminance;
const float luminanceTS = tonescale_fwd(linear, params);
const float tonemappedJ = Y_to_Hellwig_J(luminanceTS, inputJMhParams);
And that seems like it should be replaced by a spline or something. But building these approximations would take time and may not be exact. I'm very worried about the amount of work here, given that OCIO 2.4 is supposed to release in only a few months. It seems like we need more people working on this.
I feel like we should ask the ACES WG to provide CTL that is closer to being ready to implement?
Pushed an update with my current work in progress, this mainly split the new FixedFunctionOps into 3 for JMh conversion, Tonescale / ChromaCompress and GamutCompress. The code still follows the CTL from aces-dev and has not been optimized further yet (as discussed).
One point that will probably be interesting to agree on is the FixedFunctionOps split strategy, as this will be hard to change later and might impact the optimization. There is still some duplication of work here and there (like Matrix that could be merged) and the GPU path also has texture duplication where ChromaCompression and GamutCompression uses the same tables.
Awesome @remia! I built some test wheels on test.pypi.org: https://test.pypi.org/project/opencolorio/
The Windows tests were breaking so I bypassed them: https://github.com/AcademySoftwareFoundation/OpenColorIO/commit/ea1229e9142d926a62b7b4f08d5a215738f789a5
Looking at the builtins, we have those currently:
ACES-2-OUTPUT - ACES2065-1_to_CIE-XYZ-D65 - SDR-VIDEO-REC709lim
ACES-2-OUTPUT - ACES2065-1_to_CIE-XYZ-D65 - SDR-VIDEO-REC709-D60lim
ACES-2-OUTPUT - ACES2065-1_to_CIE-XYZ-D65 - HDR-VIDEO-1000nit-P3D65lim
It would be useful for me to have all of them, even if they are null ops so that we can move on the config side. I'm becoming mildly concerned about the 15 September deadline considering the quantity of work left.
@KelSolaar I added more (if not all) transforms in my latest commit and started a spreadsheet to list everything I have so far. I'm not opposed to changing the list or names if any feedback comes. Note that I dropped the distinction between VIDEO and CINEMA as there is not really any difference I think so far (eg. no surround compensation in the CTL).
Thanks for the updates Remi! This looks really good. I have a few comments:
- It might be useful to reorder the D60 tab in the spreadsheet to match the D65 one so that it's easier to compare one to the other, but they do seem to match up (each D65 transform has a matching D60 version).
- The Built-in transform names are all fine with me except for the use of "XYZ-E" in some of the D60 ones. I understand that on the aces-dev side this means that XYZ is normalized to illuminant E. However, we've been using a different convention in OCIO. In OCIO, XYZ is ALWAYS normalized to illuminant E, and so the dash is used to indicate the adaptive white point. So I would propose changing those to "XYZ-D60", or if you don't like that, then just "XYZ".
- You wrote that you added "more (if not all)" the transforms. Comparing to Scott's list, it seems like you have all of them (25 each for D65 and D60). Are there others you are thinking of adding? Thomas seems to have 50 output transforms in his spreadsheet too.
- @KelSolaar do you have what you need to proceed with the config generator?
Once fixed, I'm also getting these additional warnings + error in HLSL:
Shader@0x000001AD29D520E0(266,23-61): warning X3571: pow(f, e) will not work for negative f, use abs(f) or conditionally handle negative values if you expect them
Shader@0x000001AD29D520E0(281,47-90): warning X3571: pow(f, e) will not work for negative f, use abs(f) or conditionally handle negative values if you expect them
Shader@0x000001AD29D520E0(282,28-75): warning X3571: pow(f, e) will not work for negative f, use abs(f) or conditionally handle negative values if you expect them
Shader@0x000001AD29D520E0(302,19-38): warning X3571: pow(f, e) will not work for negative f, use abs(f) or conditionally handle negative values if you expect them
Shader@0x000001AD29D520E0(303,22-47): warning X3571: pow(f, e) will not work for negative f, use abs(f) or conditionally handle negative values if you expect them
Shader@0x000001AD29D520E0(65,18-109): warning X3570: gradient instruction used in a loop with varying iteration, attempting to unroll the loop
Shader@0x000001AD29D520E0(63,3-25): error X3511: unable to unroll loop, loop does not appear to terminate in a timely manner (1024 iterations)
The error stems from while (i_lo + 1 < i_hi) in ocio_gamut_cusp_table1_sample. This could potentially be fixed by inserting a [loop] instruction, limiting support to SM4 and above I believe. Related note being that OCIO's HLSL enum shouldn't use _DX11, but rather a shader model version: separate CL though.
EDIT: Actually, [loop] would not work here since the loop contains texture fetches, see: warning X3570: gradient instruction used in a loop with varying iteration, attempting to unroll the loop. It seems like the function may need to be rewritten in a different way here?
Just built new wheels, will check the builtins soon: https://test.pypi.org/project/opencolorio/2.4.0.dev2/
Can see a bunch of stuff in the builtins :)
'ACES-OUTPUT - ACES2065-1_to_CIE-XYZ-D65 - SDR-100nit-REC709_2.0',
'ACES-OUTPUT - ACES2065-1_to_CIE-XYZ-D65 - SDR-100nit-P3-D65_2.0',
'ACES-OUTPUT - ACES2065-1_to_CIE-XYZ-D65 - HDR-108nit-P3-D65_2.0',
'ACES-OUTPUT - ACES2065-1_to_CIE-XYZ-D65 - HDR-300nit-P3-D65_2.0',
'ACES-OUTPUT - ACES2065-1_to_CIE-XYZ-D65 - HDR-500nit-P3-D65_2.0',
'ACES-OUTPUT - ACES2065-1_to_CIE-XYZ-D65 - HDR-1000nit-P3-D65_2.0',
'ACES-OUTPUT - ACES2065-1_to_CIE-XYZ-D65 - HDR-2000nit-P3-D65_2.0',
'ACES-OUTPUT - ACES2065-1_to_CIE-XYZ-D65 - HDR-4000nit-P3-D65_2.0',
'ACES-OUTPUT - ACES2065-1_to_CIE-XYZ-D65 - HDR-500nit-REC2020_2.0',
'ACES-OUTPUT - ACES2065-1_to_CIE-XYZ-D65 - HDR-1000nit-REC2020_2.0',
'ACES-OUTPUT - ACES2065-1_to_CIE-XYZ-D65 - HDR-2000nit-REC2020_2.0',
'ACES-OUTPUT - ACES2065-1_to_CIE-XYZ-D65 - HDR-4000nit-REC2020_2.0',
'ACES-OUTPUT - ACES2065-1_to_CIE-XYZ-D65 - SDR-100nit-REC709-D60-in-REC709-D65_2.0',
'ACES-OUTPUT - ACES2065-1_to_CIE-XYZ-D65 - SDR-100nit-REC709-D60-in-P3-D65_2.0',
'ACES-OUTPUT - ACES2065-1_to_CIE-XYZ-D65 - SDR-100nit-REC709-D60-in-REC2020-D65_2.0',
'ACES-OUTPUT - ACES2065-1_to_CIE-XYZ-D65 - SDR-100nit-P3-D60-in-P3-D65_2.0',
'ACES-OUTPUT - ACES2065-1_to_CIE-XYZ-D65 - SDR-100nit-P3-D60-in-XYZ-E_2.0',
'ACES-OUTPUT - ACES2065-1_to_CIE-XYZ-D65 - HDR-108nit-P3-D60-in-P3-D65_2.0',
'ACES-OUTPUT - ACES2065-1_to_CIE-XYZ-D65 - HDR-300nit-P3-D60-in-XYZ-E_2.0',
'ACES-OUTPUT - ACES2065-1_to_CIE-XYZ-D65 - HDR-500nit-P3-D60-in-P3-D65_2.0',
'ACES-OUTPUT - ACES2065-1_to_CIE-XYZ-D65 - HDR-1000nit-P3-D60-in-P3-D65_2.0',
'ACES-OUTPUT - ACES2065-1_to_CIE-XYZ-D65 - HDR-2000nit-P3-D60-in-P3-D65_2.0',
'ACES-OUTPUT - ACES2065-1_to_CIE-XYZ-D65 - HDR-4000nit-P3-D60-in-P3-D65_2.0',
'ACES-OUTPUT - ACES2065-1_to_CIE-XYZ-D65 - HDR-500nit-P3-D60-in-REC2020-D65_2.0',
'ACES-OUTPUT - ACES2065-1_to_CIE-XYZ-D65 - HDR-1000nit-P3-D60-in-REC2020-D65_2.0',
'ACES-OUTPUT - ACES2065-1_to_CIE-XYZ-D65 - HDR-2000nit-P3-D60-in-REC2020-D65_2.0',
'ACES-OUTPUT - ACES2065-1_to_CIE-XYZ-D65 - HDR-4000nit-P3-D60-in-REC2020-D65_2.0',
'ACES-OUTPUT - ACES2065-1_to_CIE-XYZ-D65 - HDR-500nit-REC2020-D60-in-REC2020-D65_2.0',
'ACES-OUTPUT - ACES2065-1_to_CIE-XYZ-D65 - HDR-1000nit-REC2020-D60-in-REC2020-D65_2.0',
'ACES-OUTPUT - ACES2065-1_to_CIE-XYZ-D65 - HDR-2000nit-REC2020-D60-in-REC2020-D65_2.0',
'ACES-OUTPUT - ACES2065-1_to_CIE-XYZ-D65 - HDR-4000nit-REC2020-D60-in-REC2020-D65_2.0',
Thanks for the feedback!
It might be useful to reorder the D60 tab in the spreadsheet to match the D65 one so that it's easier to compare one to the other, but they do seem to match up (each D65 transform has a matching D60 version).
I just did a re-order pass, now both tabs should be aligned and sorted by CTL names.
The Built-in transform names are all fine with me except for the use of "XYZ-E" in some of the D60 ones. I understand that on the aces-dev side this means that XYZ is normalized to illuminant E. However, we've been using a different convention in OCIO. In OCIO, XYZ is ALWAYS normalized to illuminant E, and so the dash is used to indicate the adaptive white point. So I would propose changing those to "XYZ-D60", or if you don't like that, then just "XYZ".
Based on the other names, especially the D60 ones, the part after the "in" is meant to be the encoding primaries (of the container space), which are needed for the white scaling step. For example 'ACES-OUTPUT - ACES2065-1_to_CIE-XYZ-D65 - SDR-100nit-P3-D60-in-P3-D65_2.0', so I think it best we use just XYZ rather than XYZ-D60 in this case. What do you think?
You wrote that you added "more (if not all)" the transforms. Comparing to Scott's list, it seems like you have all of them (25 each for D65 and D60). Are there others you are thinking of adding? Thomas seems to have 50 output transforms in his spreadsheet too.
I think everything is here, sorry for the misleading wording!
The error stems from while (i_lo + 1 < i_hi) in ocio_gamut_cusp_table1_sample. This could potentially be fixed by inserting a [loop] instruction, limiting support to SM4 and above I believe. Related note being that OCIO's HLSL enum shouldn't use _DX11, but rather a shader model version: separate CL though.
EDIT: Actually, [loop] would not work here since the loop contains texture fetches, see: warning X3570: gradient instruction used in a loop with varying iteration, attempting to unroll the loop. It seems like the function may need to be rewritten in a different way here?
Thanks for sharing those warnings / errors, that's really helpful as I don't have an easy way to check HLSL.
I agree the texture sampling in the while loop is currently far from ideal (probably even from a performance only standpoint). In theory I think there would not be a need to compute derivative here as we are really doing NEAREST sampling on the texture, not needing to pick a minification or magnification filter (but there might very well be other reasons I'm unaware of), in my GLSL tests I used texelFetch() for example, but to maintain compatibility with older implementations we can't rely on that, I assume HLSL may have similar features.
I'm still working on more comprehensive comparison against CTLs on actual images, seeing some GPU texture sampling issues as well on some transforms that might be related.
I think it best we use just XYZ rather than XYZ-D60 in this case.
I also think it would be preferable not to include "D60" in the name as that could be confusing. In the ACES repo, the white point used in the name is the parameter value for encoding white point, so is the value that needs to be passed to the display_encoding() function. For DCDM, Illuminant E needs to be passed. If you use "D60" in the name, somebody could assume that is the value which should be passed to match your result.
Just saying "XYZ", without a specific white is, as you say, @doug-walker, implicitly normalised to Illuminant E.
I started to fill-up the spreadsheet: https://docs.google.com/spreadsheets/d/1z3xsy3sF0I-8AN_tkMOEjHlAs13ba7VAVhrE8v4WIyo/edit?gid=1761235937#gid=1761235937
This will require cross-checking but I think that we are missing quite a few builtins and I could not assign some of them:
ACES-OUTPUT - ACES2065-1_to_CIE-XYZ-D65 - SDR-100nit-REC709-D60-in-P3-D65_2.0ACES-OUTPUT - ACES2065-1_to_CIE-XYZ-D65 - SDR-100nit-P3-D60-in-XYZ-E_2.0
I haven't had a chance to check the code yet to find out what they are. That being said, the assignment/filling is not exactly trivial as the name are not always close.
Hi @KelSolaar,
The assignment of ViewTransform to CTLs should be available in the spreadsheet shared earlier. I agree the names are not directly matching in some cases, for example all the 48 nits use the same parameters as the 100 nits, an alternative would be to duplicate the ViewTransform in OCIO to have them under different names but not sure I'd like that.
Hi @KelSolaar,
The assignment of ViewTransform to CTLs should be available in the spreadsheet shared earlier. I agree the names are not directly matching in some cases, for example all the 48 nits use the same parameters as the 100 nits, an alternative would be to duplicate the ViewTransform in OCIO to have them under different names but not sure I'd like that.
Ah thanks! I actually thought about the spreadsheet after posting, it was late and I was intending to check today.
Thanks for the 48 nits explanation, that makes sense to me, I will adjust our other spreadsheets accordingly and see how far I was :D
I pushed an update that should fix the HLSL shader compilation issue by extracting the hue component from the texture into a constant array, as suggested by @KevinJW. Also included some minor updates to keep up with the CTL code (though this will have to be more thoroughly reviewed).
Also sharing a script and OCIO config I have been using to compare CTL and OCIO results on images. This is a very simple script so far and is mostly valuable for the CTL to OCIO display / view mapping with the accompanying OCIO test config. I updated the reference CTL to use the parametric chroma norm.
From what I can tell, here are a list of remaining TODOs before merging in the hope to share the workload:
- Agree on the public APIs changes (set of new exposed Builtins and FixedFunction). As discussed yesterday, some choices made for the FixedFunction splitting might impact later optimisation work so it could be useful to have a way to hide the new FixedFunction for the time being
- Agree on the use of ongoing simplification work not yet present in the CTL reference (chromaCompress parametric norm is currently in OCIO but not in the official CTL, Kevin's implementation potential improvements)
- Review of the current CPU and GPU code and potential cleanup of remaining redundant computation
- Further validation against the CTL reference implementation on patterns and test images with an appropriate error metric, I got reasonable match using a simple image like Kodak LAD but expect edge cases to show up in an image like ACES syntheticChart
- Validation on HLSL
@remia: As of 6585f18c889d68f236548ae536811a0b27571458, I can only see the 31 builtins from the comment above. Is this expected or should I have the 54 from the spreadsheet? I haven't had time to check the code to see what is going on.
@remia: As of https://github.com/AcademySoftwareFoundation/OpenColorIO/commit/6585f18c889d68f236548ae536811a0b27571458, I can only see the 31 builtins https://github.com/AcademySoftwareFoundation/OpenColorIO/pull/1983#issuecomment-2295180317. Is this expected or should I have the 54 from the spreadsheet? I haven't had time to check the code to see what is going on.
Yes this is correct, multiple CTLs can share the same Builtin transform if only the display encoding changes. I added a count of unique values below the tables in the spreadsheet, it indeed sum to 31.
Thanks, I just checked too, I have an issue in the generator, I need to look at what is going on there instead.
Ok, I found out, the BuiltinTransform names are incorrect in the spreadsheet:
Casing issue in red.
Good catch, I fixed that now, hopefully should work!
Yes it works, All the builtins were found! I need to add the displays on the config spreadsheet and I should be good.
Just pulled the branch and I get a new faiilure that I didn't previously ...
[ctest] ERROR: test_style (FixedFunctionTransformTest.FixedFunctionTransformTest.test_style) [ctest] Test the setStyle() and getStyle() methods. [ctest] ---------------------------------------------------------------------- [ctest] Traceback (most recent call last): [ctest] File "C:\Users\kwheatle\source\OpenColorIO\tests\python\FixedFunctionTransformTest.py", line 64, in test_style [ctest] self.tr.setStyle(style) [ctest] PyOpenColorIO.PyOpenColorIO.Exception: 16known FixedFunction transform style:
I did some profiling on the CPU path and GPU timings and didn't find regression since earlier results.
Hi Remi, do you have any additional commits planned? When you're done, please go ahead and merge it. :)
Good to merge, I updated the branch against main, feel free to do it when the tests are completed!
Fantastic job Remi!