KHR_displaymapping_pq
AKA - "The calibration transfer function"
The purpose of this extension is to enable consistent material representation under highly varying lighting conditions, while at the same time retaining color representation (hue), in use cases that mix models and light setups from multiple glTF sources.
It is assumed that the light contribution in any given glTF scene can vary from low values to very high values. These internal values must then be matched to a range that is acceptable to displays. This is usually the [0.0 - 1.0] range at a precision of 8 bits, with HDR displays increasing the precision to 10 or 12 bits.
This extension is compatible with both HDR and SDR displays. The intended use case for this extension is any scenario where light contribution values go above 1.0, for instance when using KHR_lights_punctual, KHR_emissive_strength or KHR_environment_lights.
It can be seen as an HDR-enabling extension, but that is not the only intended use case.
It defines an increased output range. The glTF spec generalizes output as "1.0 equals a fully exposed pixel"; with this extension, 10000 (the PQ peak luminance, in cd/m2) equals a fully exposed pixel, reducing or removing the need for further mapping of values to the output range.
This makes the extension suitable for use as a calibration transfer function (or "tone mapping") extension whose purpose is to retain color values.
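For reference, a minimal sketch of the PQ encoding step (the SMPTE ST 2084 inverse EOTF) that this output range is built on; the function name and the explicit normalization to 10000 cd/m2 are my own framing:

```ts
// PQ (SMPTE ST 2084) inverse EOTF: absolute display luminance in cd/m2
// is mapped to a [0, 1] signal. Constants are the published ST 2084 values.
const m1 = 2610 / 16384;        // 0.1593017578125
const m2 = (2523 / 4096) * 128; // 78.84375
const c1 = 3424 / 4096;         // 0.8359375
const c2 = (2413 / 4096) * 32;  // 18.8515625
const c3 = (2392 / 4096) * 32;  // 18.6875

function pqEncode(luminance: number): number {
  // Normalize so that 10000 cd/m2 (a "fully exposed pixel") maps to 1.0.
  const y = Math.min(Math.max(luminance / 10000, 0), 1);
  const yM1 = Math.pow(y, m1);
  return Math.pow((c1 + c2 * yM1) / (1 + c3 * yM1), m2);
}
```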
Please note that it is possible to add a step with 'artistic intent' mapping before or after applying calibration.
The perceptual quantizer (PQ) was chosen to make this extension future-proof and compatible with existing HDR standards. HDR monitors are already on the market and support will only grow; HDR is, for instance, fully embraced by Windows 10.
Sample Viewer implementation - drag and drop your models as usual. Go into "Advanced controls" and enable "Force Displaymapping PQ". The "Compensatory light intensity" is one directional light with its intensity in lux (lumen/m2) as defined by KHR_lights_punctual.
http://gltf.ux3d.io/
Babylon playground 'hack' - note that this is not fully implemented: light values must be kept within 0 - 10 000 lumen/m2, and don't mix with IBLs as they will look totally wrong.
https://playground.babylonjs.com/#AHVICB#12
Several game engines use PQ in a similar way, for instance the Frostbite engine by EA: https://www.youtube.com/watch?v=7z_EIjNG0pQ&list=PL3Bn4v5NMqSsbgK4Crj9YBzBSmeiTAGNT&index=1
In the Call of Duty engine: https://www.youtube.com/watch?v=EN1Uk6vJqRw
In the Lumberyard game engine: https://www.youtube.com/watch?v=LQlJGUcDYy4&t=1488s
Destiny 2 engine: https://www.youtube.com/watch?v=9jvhM8B63ng
I see a lot of value in automatic exposure control in order to avoid overbright or overly dark areas. Overbright areas tend to have color shift because of the per-channel, per-pixel tone mapping operators in use.
I think that we should expect viewers to just use the proper color space for their displays. I am not sure we should put into a 3D file what its expected transfer function to the display is.
I guess we could put exposures into individual glTF files but that feels a little weird. What happens if you open up multiple glTF files that have conflicting exposures or tone mapping parameters?
My recommendation may be that this is a viewer certification issue and we come up with guidelines for handling automatic exposure, tone mapping and transfer functions. Tone mapping and automatic exposure controls and proper transfer functions should be part of viewers, rather than part of the individual glTF files.
I would love to see that happen!
Because putting it into a glTF file sort of assumes that viewers are going to be using an incorrect, less-than-optimal OOTF unless you use this extension. That is a bit weird.
To solve the questions of how to handle multiple loaded glTFs with or without this extension:
- Remove the option of specifying rangeExtension and gamma for the OOTF - they will always be the default values (see the sketch below this list).
- PQ output (HDR or SDR) is enabled for rendering when a model includes this extension in its root element.
- Move the sceneAperture setting to the scene object - this makes it possible to control sceneAperture at the scene level. If no sceneAperture is explicitly set, no adjustment is done to the light contribution before the PQ step. This also makes it natural to assume that any model brought into this scene will be displayed using the same sceneAperture values (as the new model is inserted into an existing scene).
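For context, a minimal sketch of the BT.2100 reference OOTF for PQ, which I take those default values to correspond to (my reading of ITU-R BT.2100, not normative text from this extension):

```ts
// BT.2100 reference OOTF for PQ: normalized scene-linear input E in [0, 1]
// -> absolute display luminance F_D in cd/m2 (E = 1 maps to ~10000 cd/m2).
// A scaled BT.709 OETF followed by a BT.1886 (gamma 2.4) EOTF.
function ootfPq(e: number): number {
  const ep = e <= 0.0003024
    ? 267.84 * e
    : 1.099 * Math.pow(59.5208 * e, 0.45) - 0.099;
  return 100 * Math.pow(ep, 2.4); // feed this into the PQ encode sketched above
}
```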
Note that PQ is used in game engines as the transfer function for HDR output (as dictated by HDR10 and Dolby Vision), it is not used as the "tone mapping" curve (that bit is done by the "S curve" in the Lumberyard diagram for instance). PQ is also sometimes used as a log encoding for 3D LUT lookup.
This is insanity. Surely other people have pointed out how this is an untenable mess?
Hi @sobotka and thanks for taking part in this discussion.
Before we go any further I would like to urge you to be respectful and provide constructive comments. Please have a look at the code of conduct: https://www.khronos.org/developers/code-of-conduct
Best regards /Richard
Hi @romainguy and thanks for your comments
Note that PQ is used in game engines as the transfer function for HDR output (as dictated by HDR10 and Dolby Vision), it is not used as the "tone mapping" curve (that bit is done by the "S curve" in the Lumberyard diagram for instance). PQ is also sometimes used as a log encoding for 3D LUT lookup.
Sure - at the moment this extension does not supply such a curve. A simplified version could be done by providing parameters to the OOTF - however, in order to avoid problems when multiple glTFs are loaded, I have excluded that support. Engines will still have a reasonable way of exposing an API that gives some control over this, by exposing the rangeExtension and gamma parameters; however, this is not declared by this extension.
It's still misleading to point to engines supporting PQ because they are not using it in the way you are defining here at all. They use PQ because that's the transfer function of Dolby Vision/HDR10 (some also use PQ as a log encoding function for lookup tables but that's unrelated).
Honestly I think at this stage a better solution to this problem is something I was discussing with @elalish: an extension that precisely defines an exposure scalar factor, and a 3D LUT to be applied by renderers to go from D65-Rec.709-Linear ("scene light") to D65-Rec.709-sRGB ("display light"), at least for SDR.
This way, content creators are free to tone map/color grade/gamut map/etc. as they see fit. This means you could use the OOTF, and others could use their preferred tone mapping solution. In addition, a 3D LUT would allow engines to support a full color grading pipeline and better interop with other tools (Photoshop, DaVinci Resolve, etc.).
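To illustrate what applying such a 3D LUT could look like, here is a minimal sketch (nearest-neighbour lookup for brevity; a real implementation would interpolate trilinearly or tetrahedrally, and the flat-array layout is an assumption of mine):

```ts
// Apply a size^3 RGB lookup table to a display-referred colour in [0, 1].
// lut is assumed to be a flat Float32Array of size*size*size RGB triples,
// indexed as (r + g * size + b * size * size) * 3.
function applyLut3D(
  lut: Float32Array,
  size: number,
  rgb: [number, number, number],
): [number, number, number] {
  const idx = (v: number) =>
    Math.min(size - 1, Math.max(0, Math.round(v * (size - 1))));
  const [ri, gi, bi] = [idx(rgb[0]), idx(rgb[1]), idx(rgb[2])];
  const o = (ri + gi * size + bi * size * size) * 3;
  return [lut[o], lut[o + 1], lut[o + 2]];
}
```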
It's still misleading to point to engines supporting PQ because they are not using it in the way you are defining here at all. They use PQ because that's the transfer function of Dolby Vision/HDR10 (some also use PQ as a log encoding function for lookup tables but that's unrelated).
Well, I disagree. I don't believe that "they are not using it in the way you are defining here at all" - it's used as a transfer function in this extension and in my opinion there are many similarities. After all, I started with this extension after investigating how HDR support was handled by the gaming community.
Honestly I think at this stage a better solution to this problem is something I was discussing with @elalish: an extension that precisely defines an exposure scalar factor, and a 3D LUT to be applied by renderers to go from D65-Rec.709-Linear ("scene light") to D65-Rec.709-sRGB ("display light"), at least for SDR.
In what sense and from what perspective would that be a "better solution"? What would be solved by such an extension that would not be solved by this?
I don't believe that "they are not using it in the way you are defining here at all" - it's used as a transfer function in this extension and in my opinion there are many similarities.
All of those games use “tone” curves; some form of open domain tristimulus to closed domain “compression”. Full stop. Speak with any of the pipeline developers to confirm this. If you would like, I am sure someone in this thread has contacts at those houses.
It is important to separate the open domain tristimulus values, which extend from zero to infinity, from the encoded down-the-wire signal passing of the closed domain, zero to 100% tristimulus encoding of a fully formed image.
That is, depending on the nature of the output medium, the input tristimulus must be compressed to it. This is a non-trivial act of image formation, and must consider the output medium’s capabilities^1.
Some foundational problems with the underlying assumptions should also be noted. Specifically, it may surprise folks that even if a medium could fully represent the stimulus in terms of colourimetric definitions, it has been researched and shown that 1:1 representations are deemed unacceptable^2. And this is assuming our display technology could even remotely achieve this level of replication fidelity, which it is completely incapable of. Again, even if possible, the idea that a replication would suffice is a myth.
You should note that in ITU-R BT.2390 their OOTF uses a demonstration of previously graded content. Meaning the incoming signal is already a fully formed image.

Bottom line, it is absolutely critical to form the image from the open domain tristimulus data prior to distorting a signal for down the wire delivery, or final distortions for something that complies with the previously formed down-the-wire signal for further compression. See EETF in ITU-R BT.2390 as an example.
^1 This action also spans chromaticity footprint issues, as well as intensity. Again, the depth and breadth of what this act is is massive, and no approach that assumes "down the wire" encoding constraints will suffice.
^2 Quality of Color Reproduction, David MacAdam, Kodak. And many others.
Well, I disagree. I don't believe that "they are not using it in the way you are defining here at all" - it's used as a transfer function in this extension and in my opinion there are many similarities. After all, I started with this extension after investigating how HDR support was handled by the gaming community.
Yes, the PQ transfer function is used by those game engines when outputting to HDR formats that call for a PQ transfer function. The PQ transfer function is not needed for SDR display (but you still need a transfer function, most likely sRGB's), and those engines don't make use of the OOTF. It's definitely valid to use PQ to output to HDR though, I'm not contesting that.
In what sense and from what perspective would that be a "better solution"? What would be solved by such an extension that would not be solved by this?
It would be a better solution because it would give content creators full control over image formation, including color grading and tone mapping. This would allow you for instance to use the PQ OOTF as your "contrast" curve, and others to use something different that suits their needs.
All of those games use “tone” curves; some form of open domain tristimulus to closed domain “compression”. Full stop. Speak with any of the pipeline developers to confirm this. If you would like, I am sure someone in this thread has contacts at those houses.
Hi @sobotka and thanks for your comment. Please note that the purpose of this extension is not to provide color grading (tone mapping curve), LUTs or any type of artistic intent. That is the expected behavior, full stop.
Of course games add artistic intent - that can be added to glTF as another extension - but it is not included in this proposal.
It would be a better solution because it would give content creators full control over image formation, including color grading and tone mapping.
That is not the purpose of this extension - the purpose of this extension is to get deterministic output, under varying lighting conditions, to an HDR or SDR display. Loading two glTFs with different 'artistic intent' curves will yield unwanted results, as one of those models is sure to be displayed in a way that was not intended.
So, from that perspective it is better to do the extension this way.
Artistic intent can be added as another extension.
The purpose of this extension is to enable consistent material representation under highly varying lighting conditions.
Hi @sobotka and thanks for your comment. Please note that the purpose of this extension is not to provide color grading (tone mapping curve), LUTs or any type of artistic intent.
These two statements cannot exist together. They are in a tautological opposition.
There is no need to redefine display colourimetry for BT.2100; it’s already done.
It would be unclear what design problem this attempts to rectify.
It is assumed that the light contribution in any given scene can vary from low values, maybe in the 100 lumen range, to values in the 100 000 lumen range. These internal values must then be matched to a range that is acceptable to displays. This is usually the 0 - 1 range.
Scaling the BT.2100 range to 100 nits would suggest that the range you are specifying is from casual desktop display to a mere 9.11 or so EV of range.
This is a perfectly suitable display output range, but unsuitable for the range of values in a scene.
It is confusing what this attempts to do.
That is not the purpose of this extension - the purpose of this extension is to get deterministic output, under varying lighting conditions, to an HDR or SDR display. Loading two glTFs with different 'artistic intent' curves will yield unwanted results, as one of those models is sure to be displayed in a way that was not intended.
So, from that perspective it is better to do the extension this way.
Artistic intent can be added as another extension.
I'm all for allowing the use of HDR displays but I don't think it belongs to an extension. Otherwise, an asset could technically not be displayed in HDR on an HDR-capable display if it didn't specify the extension. Not only that, but it means that an asset using this extension would mandate a PQ encoding output, which is incompatible with HLG for instance (you could always undo the PQ encoding and re-encode with the HLG curve). To do what you are intending to do we need two things:
- Exposure, which you have under sceneAperture (but again I strongly disagree with this term)
- A compression curve of some kind, which you could parameterize to hit a target output luminance (as long as you define the scale, and here you can get informed by what PQ does) - see the sketch after this list
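As a sketch of what such a parameterized compression curve could look like, here is an extended-Reinhard-style curve, where whitePoint is the scene luminance that should reach the target output peak (my choice of curve for illustration; nothing in this thread prescribes it):

```ts
// Extended Reinhard: compress open-domain scene luminance into [0, targetPeak].
// whitePoint = the scene luminance that should land exactly on the peak;
// both parameters are free knobs a hypothetical extension could expose.
function compress(l: number, whitePoint: number, targetPeak: number): number {
  const mapped = (l * (1 + l / (whitePoint * whitePoint))) / (1 + l);
  return mapped * targetPeak; // l == whitePoint maps to targetPeak exactly
}
```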
I would also add that you've already baked artistic intents in your proposal: "Without the reference OOTF the appearance is somewhat whitewashed"
These two statements cannot exist together. They are in a tautological opposition.
There is no need to redefine display colourimetry for BT.2100; it’s already done.
It would be unclear what design problem this attempts to rectify.
And yet, this is exactly what this extension sets out to provide a solution for. I would encourage you to make a best effort to try and understand the purpose of this extension - without it I fear we will be perpetually stuck, arguing from different vantage points for solutions that set out to solve different things.
As for the design problem this extension aims to solve:
In glTF, 1.0 is considered to be a fully exposed pixel. This leads to unwanted effects when the light contribution (intensity) goes above 1.0, resulting in complete white-out even at very low light intensities.
Here is an example of a scene with Macbeth-colored test spheres (from the 3D certification process) and a directional light with low brightness (50 lumen/m2). Rendered according to spec:
The hue shift (white-out) happens because the process of mapping values in a higher range to the display's range is not specified; values are simply clamped. Clearly not PBR, and not a wanted result if you are looking for an image that conveys the material in a realistic manner.
Using this extension, and a directional light with intensity 10 000, the result will be:
sceneAperture can be set up to scale light values up or down, meaning you could easily have a scene where your directional light is 100 000 instead of 10 000.
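To make the clamping problem concrete, a small numeric illustration (the values are my own, not from the certification scene): clamping per channel changes the ratio between channels, while scaling the whole triple does not.

```ts
// A red-ish scene-linear pixel lit above the display range.
const pixel = [2.0, 0.5, 0.25]; // R:G:B ratio 8:2:1

// Per-channel clamp (the unspecified default behaviour): the ratio becomes
// 4:2:1, i.e. the pixel shifts toward orange as the red channel saturates.
const clamped = pixel.map((c) => Math.min(c, 1.0)); // [1.0, 0.5, 0.25]

// Scaling all channels by the same factor (what a display mapping /
// sceneAperture step effectively does) preserves the 8:2:1 ratio.
const scale = 1.0 / Math.max(...pixel);
const scaled = pixel.map((c) => c * scale); // [1.0, 0.25, 0.125]
```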
I'm all for allowing the use of HDR displays but I don't think it belongs to an extension. Otherwise, an asset could technically not be displayed in HDR on an HDR-capable display if it didn't specify the extension. Not only that, but it means that an asset using this extension would mandate a PQ encoding output, which is incompatible with HLG for instance (you could always undo the PQ encoding and re-encode with the HLG curve).
I doubt this extension would prohibit display of glTFs on HDR-capable displays, simply because the spec does not say anything about output. If there is interest in HLG I can certainly look into updating the spec to allow a framebuffer with that color space. That can also be added at a later date. HDR10 (PQ) was chosen because it is the most widely adopted HDR format.
Using this extension, and a directional light with intensity 10 000, the result will be:
And again, what happens when a single channel escapes the display range?
Clearly not PBR, and not a wanted result if you are looking for an image that conveys the material in a realistic manner.
You desperately need to revisit basic concepts.
And yet, this is exactly what this extension sets out to provide a solution for. I would encourage you to make a best effort to try and understand the purpose of this extension - without it I fear we will be perpetually stuck, arguing from different vantage points for solutions that set out to solve different things.
As for the design problem this extension aims to solve:
In glTF, 1.0 is considered to be a fully exposed pixel. This leads to unwanted effects when the light contribution (intensity) goes above 1.0, resulting in complete white-out even at very low light intensities.
Here is an example of a scene with Macbeth-colored test spheres (from the 3D certification process) and a directional light with low brightness (50 lumen/m2). Rendered according to spec:
The hue shift (white-out) happens because the process of mapping values in a higher range to the display's range is not specified; values are simply clamped. Clearly not PBR, and not a wanted result if you are looking for an image that conveys the material in a realistic manner.
Using this extension, and a directional light with intensity 10 000, the result will be:
sceneAperture can be set up to scale light values up or down, meaning you could easily have a scene where your directional light is 100 000 instead of 10 000.
Those images are a good example of the problem that glTF needs to solve, but there are two independent steps that need to be taken:
- Image formation (this includes exposure, tone mapping/range compression, etc.)
- Display mapping
The reason I'm arguing against this extension is that it does both. If we want an extension to output to HDR-capable displays using PQ, then this extension could be that, but it should only define the OETF, output color space, etc. Exposure/scene light scaling should be part of an image formation extension, along with range compression (which does not necessarily have to be artistic).
@sobotka
And again, what happens when a single channel escapes the display range?
In what scenario might that be? Are you talking about a hypothetical glTF?
You desperately need to revisit basic concepts.
Please refrain from such broad generalizations and make sure to provide constructive feedback.
Here is a link to a page that describes how to write constructive code review comments, please have a look before we continue this discussion: https://www.michaelagreiler.com/respectful-constructive-code-review-feedback/
You desperately need to revisit basic concepts.
Please refrain from such broad generalizations and make sure to provide constructive feedback.
What part of the suggestion of revisiting basic concepts is subject to tone policing?
The basic concepts of colourimetry and image formation are missing here, and EOTFs are being conflated with those concepts.
BTW I probably won't endorse this extension as being an official Khronos extension. I believe much of it is just best practices for viewer implementations.
The exposure/aperture and maximum scene illumination isn't really something I think should be standardized in the fashion described in this extension. I would support a camera-specific exposure setting, but not in the fashion proposed/advocated in this current draft.
Hi Ben and thanks for your comment
The exposure/aperture and maximum scene illumination isn't really something I think should be standardized in the fashion described in this extension. I would support a camera-specific exposure setting, but not in the fashion proposed/advocated in this current draft.
Why do you feel that it should not be standardized in this fashion - what would you change and how? This extension explicitly sets out NOT to be a physical camera.
Why do you feel that it should not be standardized in this fashion - what would you change and how? This extension explicitly sets out NOT to be a physical camera.
Physical cameras, which I've implemented a few times, have a few different aspects to them.
One area that is very useful in VFX is when you need to match existing shots in terms of aperture, film size, focal length, etc. This also helps define the depth-of-field circle of confusion (CoC). For the real-time world, where the viewer's dimensions are often not fixed, these requirements do not appear often. So I think skipping that part of VFX physical cameras is fine for now.
Another part of physical cameras is their exposure controls. I think this is very useful. I think that we should have an exposure control on the camera and it should be adjustable via animations and other factors. I think we should be also able to set it to auto-mode, probably some smart sub-set of the features offered in Unreal Engine -- those are just amazing:
https://docs.unrealengine.com/4.27/en-US/RenderingAndGraphics/PostProcessEffects/AutomaticExposure/
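For a rough idea of what such auto-exposure amounts to, here is a minimal average-log-luminance sketch (a drastic simplification of what Unreal offers, not its actual algorithm):

```ts
// Average-log-luminance auto exposure: meter the frame, then return the
// factor that lands the geometric mean luminance on a chosen "key" value.
function autoExposureScale(luminances: Float32Array, key = 0.18): number {
  let sumLog = 0;
  for (const l of luminances) sumLog += Math.log(Math.max(l, 1e-6));
  const avg = Math.exp(sumLog / luminances.length); // geometric mean
  return key / avg; // multiply scene-linear RGB by this factor
}
```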
I think that this proposal's approach isn't reflective of accepted best practices when it comes to exposure controls. Rather, it uses "aperture" instead, does not tie it to the camera (which leads to conflicts over which data to use when), and deals with these weird lighting maximums in the scene. This is a solution, but I don't think it is the best one. I also think that if we conform to the current best practices in the industry, it will be easier to support whatever we decide to release, because the content will already exist in roughly that form.
Also, this extension deals with HDR content and various transforms in that content pipeline, which again I view as just best practices and a little bit orthogonal to, or at least separable from, the auto-exposure controls.
Thus I would separate this out into a KHR_camera_exposure extension that only does exposure and auto-exposure controls (do a survey of what is supported across the various main tools) and then see if there is anything else to do... I suspect there isn't anything else needed, except some discussion of best practices in terms of implementing color workflows.
I go back to the original motivation for this extension as explained in an email thread. You had hue shifts occurring in various images as a result of a tone mapping operator when you increased the illumination in the scene without balancing it with a corresponding exposure reduction. This is the core problem. I've run into it myself many times before. It can be fixed by simply adjusting the exposure down manually, or by standardizing an auto-exposure adjustment setting. This is the simplest solution and also an incredibly standard solution to the problem. I advocate doing that. It is how we solved it ourselves (although we've come up with some complex solutions where we can adjust exposure on a per-material basis, but that type of complexity can come later).
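For reference, a common formulation of such a standardized exposure control, following the EV100 convention used in physically based real-time renderers such as Frostbite (a sketch, not part of any glTF spec):

```ts
// Photometric exposure from camera settings:
//   EV100 = log2(N^2 / t * 100 / S)
// Scene-linear values are then multiplied by 1 / (1.2 * 2^EV100),
// where 1.2 is the usual reflected-light meter calibration constant.
function exposureFromEv100(
  aperture: number,    // N, f-number
  shutterTime: number, // t, seconds
  iso: number,         // S
): number {
  const ev100 = Math.log2(((aperture * aperture) / shutterTime) * (100 / iso));
  return 1 / (1.2 * Math.pow(2, ev100));
}
```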
Another part of physical cameras is their exposure controls. I think this is very useful. I think that we should have an exposure control on the camera and it should be adjustable via animations and other factors.
And this would be in a different extension, ie one that sets out to model a physical camera. This is not that extension.
I think that this proposal's approach isn't reflective of accepted best practices when it comes to exposure controls.
Exactly - it does so on purpose, since it does not set out to model a physical camera.
I go back to the original motivation for this extension as explained in an email thread. You had hue shifts occurring in various images as a result of a tone mapping operator when you increased the illumination in the scene without balancing it with a corresponding exposure reduction. This is the core problem. I've run into it myself many times before. It can be fixed by simply adjusting the exposure down manually, or by standardizing an auto-exposure adjustment setting.
One major issue is that glTF defines 1.0 as a fully exposed pixel. This means that a directional light with an intensity of 1 lumen/m2 will result in fully bright illumination - not PBR at all. Another issue is that the glTF camera does not have any exposure controls at all; all it has is a transform and a view frustum. It really is just an observer. The glTF spec does not even touch upon how scene-linear values end up at their target - which could be anything from a 3D printer to a printed photo.
This extension solves these problems without adding a physical camera or post-processing.
I don't see that having this as a best practice for viewers will work - there will surely be situations where you want to do your display mapping differently, even in a 3D commerce use case. In that case just drop this extension from the asset and go back to default behavior, or use some other extension.
One major issue is that glTF defines 1.0 as a fully exposed pixel. This means that a directional light with an intensity of 1 lumen/m2 will result in fully bright illumination - not PBR at all.
All major glTF viewers implement exposure controls (usually fixed, arbitrary right now) with tone mapping usually using the VFX ACES standard and then map to the screen using sRGB color space transforms. Thus your above statement isn't the practical reality.
Right now, because the web is limited to LDR content, yes, the brightest a pixel can be on screen after exposure adjustment + tone mapping + sRGB conversion is 1.0. But that is a current output limitation. Once you can output HDR content from a WebGL/WebGPU context, this pipeline will change according to the capabilities of the hardware, and glTF doesn't really need any extensions to enforce that.
(Also once P3 color spaces are supported, we can do a P3 color space transform rather than an sRGB color space transform on those devices.)
The only thing I feel should be added to glTF is exposure controls, and explicitly auto-exposure. Maybe the tone mapping operator as well -- there can be artistic choice here, but I am not sure it matters, as ACES is good enough for, I think, 90+% of cases. But the rest of the pipeline, after tone mapping to display output, has a well-defined, physically correct answer that shouldn't require any artistic choices -- it is either correct or incorrect for the display capabilities. Having to specify an extension here makes no sense.
The auto-exposure controls can be such that they adapt differently to an HDR display versus an LDR display. To me this area is very interesting -- how to formulate auto-exposure controls in a fashion that takes into account different output ranges that content can be experienced on. What does it mean for content to be properly exposed on an HDR display versus an LDR display?
I think it isn't necessary to convince me of the value of this extension as it is. I've invested time into presenting this feedback. You can choose to receive it or dismiss it. And we can move on.
All major glTF viewers implement exposure controls (usually fixed, arbitrary right now) with tone mapping usually using the VFX ACES standard and then map to the screen using sRGB color space transforms. Thus your above statement isn't the practical reality.
I would just caveat this by saying many (most?) implementations use an approximation of the ACES standard. The real implementation does have benefits over those approximations, but I've been steeped enough in this topic lately that I would recommend against using that standard (for several reasons that I'd be happy to explain if you are interested @bhouston).
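For context, the approximation most viewers actually ship is a per-channel curve fit such as Krzysztof Narkowicz's, shown here as an illustration of the gap being described: it is cheap, but applying it per channel is precisely what causes hue skews relative to the full ACES RRT + ODT.

```ts
// Narkowicz's widely used per-channel fit of the ACES filmic curve.
function acesFilmApprox(x: number): number {
  const a = 2.51, b = 0.03, c = 2.43, d = 0.59, e = 0.14;
  return Math.min(1, Math.max(0, (x * (a * x + b)) / (x * (c * x + d) + e)));
}
```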
All major glTF viewers implement exposure controls (usually fixed, arbitrary right now) with tone mapping usually using the VFX ACES standard and then map to the screen using sRGB color space transforms. Thus your above statement isn't the practical reality.
Sure, many glTF viewers may implement some sort of exposure and tone mapping - however, as this is not part of the glTF standard, there is no way of telling what's right and what's wrong. The practical reality is that the output process is not normative. This extension seeks to remedy that.
With regard to the ACES standard - I have made an active choice not to use any type of filmic or artistic-intent operators. The goal is to get the most neutral (or raw) output to a display.
The solution I have presented works with both a physical camera and post-processing:
[Image: overview of how implementations may choose to implement the extension]
[Image: design overview - the extension is compatible with implementations that wish to have a physical camera (exposure) and/or post-processing effects]
Looking over this PR, I can see a number of conversations "marked as resolved" in the GitHub sense, where the conversation itself doesn't appear to end with both parties reaching an agreement. For example the "Scene Aperture" terminology was called out as possibly not being a good fit for the use here, but the parties debating this did not converge. Several other examples are visible by expanding the previously resolved conversations here.
Also, this extension appears to have no parameters. Is this really a one-size-fits-all solution? What happens in a scene with a mix of glTF models, where some have this extension and some do not?
Only a few select extensions change material behavior by their mere presence, which is a practice that the PBR TSG is working to move away from. KHR_materials_unlit is one counterexample grandfathered in, where the presence of the extension on a material changes the whole material to a shadeless mode. For extensions without any parameters, this actually works OK when ported to an application-specific API, because the whole glTF extension can be represented by a single Boolean in the API, or for node-based systems a completely separate node graph can be used. But that doesn't appear to be the case with PQ here: This extension is placed at the root of the model, and its mere presence changes what, the whole output display mapping strategy for the scene?
This extension doesn't seem configurable, and doesn't appear to be well targeted at the thing it changes, and substantially changes behavior by the mere presence of an empty glTF extension container block. Also it seems like we're lacking a lot of convergence here.