KHR_meshopt_compression proposal
This PR proposes a new extension, KHR_meshopt_compression, which is a successor to the existing extension EXT_meshopt_compression.
## Motivation
KHR_meshopt_compression provides general functionality to compress bufferViews. This compression is tailored to the common types of data seen in glTF buffers, but it's specified independently and is transparent to the rest of the loading process - implementations only need to decompress compressed bufferViews and the accessors behave as usual after that. The compression is designed to make it possible for optimized implementations to reach decompression throughput of multiple gigabytes per second on modern desktop hardware (through native or WebAssembly code), to ensure that the compression is not a bottleneck even when the transmission throughput is high.
As a result, it's possible to compress mesh vertex and index data (including morph targets), point clouds, animation keyframes, instancing transforms, Gaussian splat data, and other types of binary data. Compression can be lossless, taking advantage of preprocessing to optimally order and optionally pre-quantize the data, or lossy with the use of filters, which allows an arbitrary degree of tradeoff between transmission size and quality for multiple types of common glTF data.
The compression process is versatile, and different data processing tools may make different tradeoffs in how to preprocess the data before compression, whether to use lossy compression and of what kind, etc. - by comparison, the decompression process is straightforward and fast. This is by design, and means that it's comparatively easier to implement this extension in renderers than in processing pipelines, which removes as many barriers to adoption as possible.
Compared to EXT_meshopt_compression, this extension uses an improved method of compressing attribute data, which delivers better compression ratios at the same decompression speeds across a variety of use cases, and incorporates special support for lossy color encoding, which makes it possible to reduce size further when color streams are a significant fraction of the asset (some 3D scanned meshes, point clouds, Gaussian splats). All existing use cases of EXT_meshopt_compression are supported well, and no significant performance compromises are made - as such, all existing users of EXT_meshopt_compression should be able to upgrade (pending ecosystem adoption) to KHR_meshopt_compression.
## Why a new extension?
EXT_meshopt_compression was completed ~4 years ago; it serves as a good and versatile compression scheme that transparently supports geometry, animation, and instancing data, and makes it possible to maintain maximum rendering efficiency and in-memory size while using additional compression during transfer, with decompression throughput in gigabytes/second on commodity hardware.
Since then, the underlying compression implementation for attributes in meshoptimizer has been revised to version 1 (from version 0, which is used in the EXT extension) for better compression; this is currently used outside of the glTF ecosystem by some native and web applications. Additionally, some use cases that benefit from better color compression, like point clouds and 3D scans, have emerged after the extension was initially standardized (color compression was considered for EXT but not included, since at the time the focus was more on "traditional" 3D geometry).
Because of the change in the underlying compression bitstream, the bitstream specification needs to be revised as implementations of EXT_meshopt_compression may not be able to decode the new format - thus, a new extension name is necessary.
## What changed?
KHR_meshopt_compression uses the same JSON structure as the EXT_ extension, keeps the same three filters, and keeps the two existing schemes for index compression as is. It upgrades attribute compression to use a more versatile encoding (v1), which supports enhanced bit specification for deltas and customizable per-channel delta modes that improve compression further for 16-bit as well as some types of 32-bit data. For compatibility, the v0 encoding is still supported. It also adds one new filter, COLOR, which applies an additional lossy transform to quantized RGBA color data to decorrelate input channels (similarly to OCTAHEDRAL encoding for vectors, this improves compression and provides more optionality wrt using a variable number of bits, without any changes needed to the renderer, as the filter unpacks data back into quantized normalized RGBA).
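For concreteness, since the JSON structure is unchanged from EXT, a compressed bufferView under the new extension would presumably look like the following (values are illustrative; as in EXT, the outer bufferView fields describe the uncompressed view, and the extension object points at the compressed data):

```json
{
  "bufferViews": [
    {
      "buffer": 0,
      "byteOffset": 0,
      "byteLength": 2048,
      "byteStride": 4,
      "extensions": {
        "KHR_meshopt_compression": {
          "buffer": 1,
          "byteOffset": 0,
          "byteLength": 812,
          "byteStride": 4,
          "count": 512,
          "mode": "ATTRIBUTES",
          "filter": "COLOR"
        }
      }
    }
  ]
}
```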
On typical geometry data, the enhanced attribute compression provides approximately 10% geomean reduction in vertex data size; the gains obviously depend highly on the content. On point clouds, as well as some 3D scanned models that use vertex colors, the new color filter together with the new attribute compression results in ~15% geomean reduction, with even stronger (20-25%) gains when non-aligned bit counts are used (e.g. 6-bit or 10-bit colors).
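As an intuition for the decorrelation step (the normative transform is defined in the extension text; the `y`/`co`/`cg` channel names used later in this thread suggest a YCoCg-style transform), a generic inverse YCoCg-R reconstruction looks roughly like this sketch:

```cpp
// Illustrative inverse YCoCg-R transform, NOT the normative COLOR filter decoding:
// decorrelated luma/chroma channels are mapped back to RGB with integer math;
// alpha would pass through unchanged since the filter outputs quantized RGBA.
void ycocg_r_to_rgb(int y, int co, int cg, int& r, int& g, int& b)
{
    int t = y - (cg >> 1);
    g = cg + t;        // green carries most of the luma
    b = t - (co >> 1);
    r = b + co;
}
```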
## Why KHR?
These improvements require a new extension, as they update the data format / bitstream as well as the JSON enum for the color filter. While it would be possible to specify this as a new EXT extension, like EXT_meshopt_compression2, it seemed better to promote to KHR:
- This matches existing compression formats, like Draco and Basis Universal, as well as some formats pending discussion like SPZ
- This makes the compression scheme, which has proven to be versatile and useful since it was standardized, more first-class, with associated ecosystem benefits in the coming years (more support)
Since a lot of different parts of the glTF ecosystem can be supported by meshopt compression, including core functionality and extensions like Gaussian splats or mesh instancing, specifying a KHR version provides a more comprehensive/coherent story wrt compression for the glTF ecosystem.
The minimal "upgrade" path from EXT to KHR would involve just changing the extension name, as the original bitstream should be fully compatible with the new bitstream. The ideal upgrade path would involve re-encoding the original data (helpful if COLOR filter is useful), or at least losslessly re-encoding the attribute data from v0 to v1 - this doesn't require parsing any accessor data, and merely requires decompressing buffer views and re-compressing them with new encoding.
## Implementations
Since this is a proposal that just got created, this extension obviously does not have implementations yet :) Having said that, because the JSON structure is exactly the same except for the addition of the COLOR filter, and most implementations of EXT_meshopt_compression use the meshoptimizer library, which supports the new additions (the color filter was released in 0.25), I'd expect that existing implementations for EXT_ can be made compatible with KHR_ with minimal effort:
- For loaders that currently implement support for `EXT_` (three.js, Babylon.JS), updating the meshoptimizer module and tweaking the JSON parsing code to recognize the `KHR_` extension should be only a few lines of changes and should be sufficient (for reference, full three.js implementation of the EXT variant);
- For data processors that currently implement support for `EXT_` (gltfpack, glTF-Transform), updating the meshoptimizer module and exposing a user option that would serialize the `KHR_` extension and encode attribute data using the new attribute encoding, plus optionally support the color filter, should be easy;
- For any loader that wants to implement this without relying on the meshoptimizer library for some reason, I've updated the reference decoder by following the updates made to this specification, so it should be comprehensive; while it's more changes than you'd need if you were just using the library, it's a manageable amount of extra complexity.
> it seemed better to promote to KHR
The KHR vs EXT prefix choice generally means different treatment of the extension's IP, besides perceived "ecosystem support". In particular, the following details should be clarified upfront (not legal advice, though):
- Khronos would be the extension text copyright holder
- The extension should not include 3rd-party trademarks or their use should be explicitly allowed (is "meshopt" registered in some way?)
- Any technology needed by the extension may become included in the Khronos IP framework (with some caveats)
Understood. I'm not a lawyer, but none of these seem like blockers to me (pending closer review). meshopt is not a registered trademark.
Some updates!
- There should not be any copyright/trademark/IP issues as far as I'm concerned.
- meshoptimizer 0.25, released last week, includes support for the color filter and the v1 vertex codec in the JS module, as well as an update to the reference decoder that supports both.
- While no tools exist yet that can produce files with this extension (I intentionally deferred official support in gltfpack), I have an implementation of the encoding that requires just a few small tweaks to the source code (to use the correct extension name), so it should be very easy to produce test assets if necessary
- I took a stab at seeing what it takes to upgrade an implementation of EXT_meshopt_compression to KHR; since the heavy-lifting is done by meshoptimizer library, the actual change should only require a few lines of JSON parsing changes. See https://github.com/mrdoob/three.js/compare/dev...zeux:three.js:khr-meshopt - the first commit there is just updating the meshopt_decoder to latest version.
It would be great to understand what the next steps could be here, as from my perspective all blockers have been resolved. I'm quite confident the actual implementations are going to be quick to finalize if there is agreement from Khronos in principle that the extension should be included. I'm obviously also happy to adjust the proposed text if corrections or clarifications are necessary.
The three.js update suggests that the decoder does not need to know whether the glTF asset uses the original EXT or the new KHR extension name. This implies that existing EXT (v0) files could be upgraded to KHR (v1) without re-encoding anything at all.
Since v0 support is not going anywhere (because the EXT extension has been ratified and tools are expected to accept such files indefinitely), I'd suggest allowing v0 in KHR as well (with a note about v1 benefits), given that the intention is to keep using the same extension name.
> Since v0 support is not going anywhere (because the EXT extension has been ratified and tools are expected to accept such files indefinitely), I'd suggest allowing v0 in KHR as well (with a note about v1 benefits), given that the intention is to keep using the same extension name.
That sounds good to me. I originally specified KHR as accepting just v1 in order to have the smallest/simplest possible specification. But indeed, this restriction is not strictly necessary - v0 support is never going away, and the format version is encoded in the first byte of the stream, so the meshoptimizer library can decode both. The reference decoder decodes both as well. I can change this; it would just require a more complicated bitstream description where specific additions are called out as only being present when the version is 1.
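A minimal sketch of that dispatch, assuming (as in the meshoptimizer vertex codec) that the first byte packs a magic value in the high bits and the version in the low bits:

```cpp
// Returns the vertex stream version, or -1 if the stream is too short or the
// version is unknown. The exact header layout is normative in the bitstream spec;
// the low-bits-carry-version split here is an assumption for illustration.
int vertex_stream_version(const unsigned char* data, size_t size)
{
    if (size == 0)
        return -1;
    int version = data[0] & 0x0f; // high bits assumed to hold a magic value
    return (version == 0 || version == 1) ? version : -1;
}
```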
Separately: should the filter decoding be bit-exact? As far as I can tell, there are parts of the glTF specification that do not require bit-exact decoding - for example, normalized values mentioned above use an a / (2^N - 1) formula or equivalent, but hardware decoding is unlikely to use a full division and is more likely to use a multiply-by-reciprocal; these will produce different values. As another example, anywhere glTF specifies matrix math, in practice a lot of implementations will end up using fma to contract parts of the equations, again resulting in different behaviors. I haven't checked the Basis specification, but I will be very surprised if the transcoding to BCn blocks is specified to be bit-exact as well, because some parts of it require full block re-encoding, if memory serves.
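A minimal demonstration of that divergence: dividing by 2^N - 1 versus multiplying by a precomputed (rounded) reciprocal yields different floats for some inputs.

```cpp
#include <cstdio>

int main()
{
    const float rcp = 1.0f / 65535.0f; // rounded reciprocal of 2^16 - 1
    int mismatches = 0;
    for (unsigned a = 0; a <= 65535; ++a)
    {
        float divided = float(a) / 65535.0f; // exact IEEE-754 division
        float scaled = float(a) * rcp;       // typical hardware-style normalization
        mismatches += (divided != scaled);
    }
    printf("%d of 65536 values differ\n", mismatches);
}
```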
Update: this seems resolved per the comments above; we want the specification to be precise even if real implementations are not always precise wrt floating-point math.
Worth noting is that the filters defined here always imply some amount of lossiness on encoding, so minor differences should not be consequential in practice. I also don't think there are currently any specific implementation issues with portability of filter decoding in practice; it's just that this is an area where I'm used to GPU API specifications allowing some amount of tolerance to permit multiple possible implementations, and not requiring IEEE-754 exactness in all places.
Updated the spec as suggested, including adding v0 back to the extension text. The bitstream reference now specifies cases where aspects of the encoding depend on the version (different tail encoding, channel modes for multi-byte deltas, control bits for more flexible bit packing including 1-bit deltas). With this change, any asset with EXT_meshopt_compression can be upgraded to KHR_meshopt_compression by simply changing the extension name - so the KHR extension becomes a full superset of the EXT extension.
I've drafted optional support for this in gltfpack as well (making it trivial to produce test assets that use the full set of features, not just a renamed EXT -> KHR extension), linked above.
I don't have any outstanding spec changes that are known to be necessary. Please let me know what the next steps here are; I believe what happened with the EXT version years ago is that it was submitted as a draft, which then made it easier to contribute support to a few projects within the ecosystem. I can contribute support for this to cgltf, three.js, and Babylon.JS - per the notes above, it's quite straightforward.
Here are a couple of test assets converted with the PR above; they aren't necessarily super representative, but they make it easy to test renderer support.
| Asset | Size (bytes) | Attribute size (bytes) | Source |
|---|---|---|---|
| brainstem_ext.glb | 370464 | 258176 | glTF-Sample-Assets/BrainStem |
| brainstem_khr.glb | 345856 (-6.6%) | 233575 (-9.6%) | -- |
| flower_ext.glb | 2840840 | 2839091 | https://github.com/KhronosGroup/glTF-Sample-Assets/issues/31 |
| flower_khr.glb | 2579496 (-9.2%) | 2577738 (-9.2%) | -- |
We now have a few outstanding draft implementation PRs (gltfpack https://github.com/zeux/meshoptimizer/pull/966, glTF-Transform https://github.com/donmccurdy/glTF-Transform/pull/1745, three.js https://github.com/mrdoob/three.js/pull/32163). For some of them to get merged, it would be great if the status of this extension were clearer. As far as I know the extension text is in good shape, but of course I'd be happy to address any further feedback.
Can someone outline the next steps here? For example, can this PR get merged with the extension marked as a draft? Or can the list of extensions be updated to officially mark this as "Review Draft", linking to the PR? Or some other variant of this would be great. Essentially I'm looking for some signal along the lines of "this extension looks reasonable and we'll review this for the inclusion", so that dependent projects can feel safer in merging the implementation PRs with the assumption (but not a promise, of course) that the extension will be added in the future.
The 3D Formats WG discussed this proposal and generally agreed to move it forward. The main concerns are related to the extension lifecycle and its effect on EXT_meshopt_compression. In particular, this extension should provide sound (although non-normative) guidance both for artists and developers. For example (feel free to reword/expand):
- Tools that already support `EXT_meshopt_compression` should keep supporting it to be able to read pre-existing assets.
- DCC tools should give users a choice to use either variant, likely indefinitely. The default option should eventually be switched to the KHR variant.
- Existing assets that use the EXT variant can be losslessly converted to KHR, if needed, by changing the extension strings inside the glTF JSON.
A higher-level question: is it safe to assume that there will be no v2 scheme any time soon?
@lexaknyazev Thanks!
I agree with this list of recommendations; they match my plans for the meshoptimizer library (indefinite transparent support for decoding v0/v1 and indefinite support for encoding v0/v1 based on the application's choice) as well as gltfpack (explicit support for choosing the encoding version, initial default EXT, eventual plan to switch to KHR by default). I will add these as a separate non-normative section.
If a DCC tool or processor doesn't support either extension today, then I could see it implementing only KHR at some point in the future (e.g. if it only implements encoding in 2027, it might not make sense to implement EXT export). I'll word the second point as "For maximum compatibility, ..." because today this recommendation should apply as written.
A v2 scheme should not happen any time soon. Between no further improvements being on my roadmap, the new design mostly filling preexisting gaps in the old design, and the design being constrained by its requirements (generic input data, very fast decoding, compressibility with Deflate-like schemes on top of the output), I don't see directions for further evolution right now, and it's unclear whether any will emerge in the future.
@javagl
> (I wonder whether there even are "two independent implementations" for the EXT version of this, so there don't seem to be concerns here either...)
The meshoptimizer library contains a separate JS-only decoder implementation (mostly for reference) in addition to the main C++ source. It should support both EXT and KHR variants.
I just recently dug a bit through the repo (while trying to get that EXT support into my libs), and have seen that "reference" implementation. I think it is also used internally for the unit tests.
But this implementation is certainly not independent of the other. Both are still written by the same author.
The bitstream spec looks very detailed, but it's nearly impossible to say whether someone could create a decoder implementation based on that, from scratch. You covered many questions that would come up. But ... even you cannot be 100% sure that there isn't something missing. (You're setting the bar higher than most people could imagine, but still...)
(All this, of course, does not affect the approval of this PR)
> But this implementation is certainly not independent of the other. Both are still written by the same author.
That's not entirely correct; the original version was written from the (EXT) spec text by a different author, who is credited in the reference implementation. The updates for the KHR version plus some minor refactoring from this year are mine. I would thus call the reference implementation "mostly independent". I don't know if this is very relevant in the context of glTF (it's unclear to me what an "implementation" means; I've never seen an alternative implementation for Draco, for example, and implementations in various loaders quite rightfully rely on the same underlying C++ code).
> (it's unclear to me what an "implementation" means; I've never seen an alternative implementation for Draco, for example, and implementations in various loaders quite rightfully rely on the same underlying C++ code)
It's unclear to me as well. And maybe I shouldn't have mentioned it, because it could open a "can of worms", with endless room for discussion that is not directly relevant for this PR, and certainly not specific to this PR.
But here's an attempt at summarizing my thoughts (which are pretty much in line with what you said): I recently brought up that exact question of what an "implementation" means. And it's hard to give a precise (and sensible) meaning to this term for (some) glTF extensions. I also occasionally mentioned that 'challenge': "Write a Draco decoder, from scratch, based on the specification". Sure, a trillion-dollar company can throw some big $$$ at a bunch of nerds and lock them in the basement for 3 months with the simple task "Get it done!". But from a practical perspective, everybody has to use the (one and only) reference implementation. I think that a broader discussion about the (seemingly philosophical) question of the "two implementations" clause is warranted, but this will have to happen elsewhere.
I don't have much familiarity with it, but this appears to be another independent implementation (Rust) of the EXT compression spec: https://github.com/yzsolt/meshopt-rs
@javagl In the context of this extension, there are two quite independent pieces of "an implementation":
- Parsing JSON properties, understanding how to use meshopt-compressed data in the context of a glTF asset.
- Decoding the compressed bitstreams.
The first piece is engine-specific and therefore, for example, glTF-Transform, three.js, Babylon.js, and Khronos glTF-Sample-Viewer would be four separate implementations should they support the extension.
The second piece is engine-agnostic, so different viewers can technically use the same decoding library. That said, the bitstream spec must be complete and unambiguous enough to enable anyone to implement a decoder from scratch without referring to any source code besides spec-inlined snippets. Given that this bitstream spec has two implementations written in two different languages, I think they count as two.
The bitstream specification here looks very detailed. It should be possible to implement a decoder (and maybe even an encoder?) from it, and there actually are things that can count as "two implementations". So to not further pollute this PR with off-topic discussions, I moved that to https://github.com/KhronosGroup/glTF/issues/2542
@zeux We'd need sample assets covering all JSON properties of this extension to ensure that engines correctly pass them to decoders. In particular:
- An asset with a non-zero `byteOffset`
- An asset with a fallback buffer
- Assets covering all modes
  - For the `attributes` mode
    - All filters, including `NONE` and undefined
    - Both v0 and v1 streams
      - As a special case, a v0 stream used with the `color` filter
  - For the `triangles` mode
    - Both `byteStride` options
  - For the `indices` mode
    - Both `byteStride` options
It would also be great to eventually have sample streams exhaustively covering the bitstream spec to ensure that decoders handle it correctly. Note that exhaustive bitstream coverage is mostly a "nice-to-have" thing for now, but it's required to ensure long-term sustainability (think potential inclusion of meshopt in the ISO version of glTF). Some of the test cases may be merged together when that makes sense. Logistically, they may be organized as glTF assets containing both compressed and uncompressed data, asserting that decompressing the compressed blocks yields the uncompressed blocks exactly (when filters aren't used) or within a reasonable epsilon.
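A sketch of that assertion logic under those assumptions (names are illustrative; filtered output is treated as floats for simplicity, although some filters output quantized integers):

```cpp
#include <cmath>
#include <cstring>

// Compares a decompressed block against its uncompressed reference:
// byte-exact when no filter is involved, within an epsilon otherwise
// (filters are lossy on encode, so exact equality isn't expected).
bool blocks_match(const unsigned char* decoded, const unsigned char* reference,
                  size_t size, bool filtered, float epsilon)
{
    if (!filtered)
        return memcmp(decoded, reference, size) == 0;

    for (size_t i = 0; i + sizeof(float) <= size; i += sizeof(float))
    {
        float a, b;
        memcpy(&a, decoded + i, sizeof(float));
        memcpy(&b, reference + i, sizeof(float));
        if (fabsf(a - b) > epsilon)
            return false;
    }
    return true;
}
```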
Here's a list of test cases based on my understanding of the spec.
- For the `attributes` mode:
  - different byte stride values resulting in different `maxBlockElements` values; not sure if the full set of block sizes (256, 224, 192, 176, 160, 144, 128, 112, 96, 80, 64, 48, 32) needs to be covered;
  - a number of elements not divisible by 16;
  - all control modes, all delta encoding modes, and all channel modes;
  - for the octahedral filter:
    - both `byteStride` options;
    - negative `x` and `y` values;
    - boundary K values for each valid byte stride, additionally the case when K=8 and N=16;
    - values that need clamping for negative hemisphere;
    - fourth component with bit width exceeding K (to ensure that it's not clamped/masked);
  - for the quaternion filter:
    - boundary K values (4 and 16);
    - negative `x`, `y`, and `z` values;
    - clamping of negative values during `w` calculation;
  - for the exponential filter:
    - positive and negative exponent values;
    - positive and negative mantissa values;
    - boundary exponent values;
    - exactness of the decoding;
  - for the color filter:
    - both `byteStride` options;
    - boundary K values for each valid byte stride, additionally the case when K=8 and N=16;
    - valid `y`, `co`, and `cg` values that cause the intermediate RGB values to not fit into the original N-bit representation.
- For the `triangles` mode:
  - 32-bit values used with `byteStride: 2`;
  - data relying on `next` and `last` wraparounds;
  - data using all 16 elements of the FIFOs;
  - data containing varint-7 values above 4294967295 (see the decoder sketch after this list);
  - data covering all `0xXY` branches including sub-branches (`Z = 0` and `W = 0`);
  - data using all 16 elements of the `codeaux` block.
- For the `indices` mode:
  - 32-bit values used with `byteStride: 2`;
  - data relying on `last` wraparound;
  - data using both baseline index values;
  - data containing varint-7 values above 4294967295.
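For reference, "varint-7" above denotes a 7-bits-per-byte variable-length integer where the high bit of each byte signals continuation; a generic decoder sketch (not the normative spec code) follows.

```cpp
// Generic varint-7 decoder sketch; the normative encoding lives in the
// bitstream spec. The overflow test case above probes what decoders do
// when the encoded value does not fit the 32-bit result.
unsigned int decode_varint7(const unsigned char*& data)
{
    unsigned char lead = *data++;
    if (lead < 128)
        return lead; // fast path: single-byte value

    unsigned int result = lead & 127;
    unsigned int shift = 7;
    for (int i = 0; i < 4; ++i) // up to 4 continuation bytes for 32-bit values
    {
        unsigned char group = *data++;
        result |= (unsigned int)(group & 127) << shift;
        shift += 7;
        if (group < 128)
            break;
    }
    return result;
}
```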
My understanding is that this would be independent of this PR, and would belong in glTF-Sample-Assets?
That repository has a couple of EXT_ assets, and it should be easy to replicate these for KHR (see also https://github.com/KhronosGroup/glTF/pull/2517#issuecomment-3440081285) - that is not combinatoric coverage, of course.
In addition, I think perhaps just two assets should be enough to cover the JSON variations. A single indexed sphere mesh with normals and vertex colors can be used to test all 4 encoding combinations (attributes could be encoded using v0 & v1 for different spheres; indices can be encoded using TRIANGLES or INDICES for different spheres) and 3 of the 4 filters (e.g. exponential for float3 positions, octahedral for quantized normals, color for quantized colors), with varying byte strides. To test the quaternion filter you need an animation or EXT_mesh_gpu_instancing; the BrainStem asset has that, but a basic animation could also be added to the same test asset, e.g. a small grid of spheres plus a rotating cube. I guess we could use instancing instead and rely on the vertex color pattern rotating visually, as the sphere geometry is symmetric, or use a more complex base mesh instead of a sphere. Then the entire asset could be duplicated with a version with a fallback buffer, in which the extension would be optional, and the expectation would be that rendering matches between:
- asset with fallback, for renderers that don't support the extension;
- asset with fallback, for renderers that support the extension;
- asset without fallback, for renderers that support the extension.
The bitstream test assets are more difficult to produce, just because of the sheer number and the requirement to have custom code that produces them, as these include suboptimal decisions that my current encoder never makes. It's definitely possible to do in the future; I'm also not sure where these assets would belong, as they would ideally need to be part of some programmatic framework instead of a renderer (e.g. quaternion filter testing would need to produce a mostly incoherent quaternion stream; attempts to use that for animations would be difficult to validate visually).
> My understanding is that this would be independent of this PR, and would belong in glTF-Sample-Assets?
Right, but that's one of the conditions for marking an extension a Release Candidate.
> we could use instancing instead
That would create a dependency on the instancing extension, which would be suboptimal for a meshopt-focused test asset.
There are some early ideas on creating dedicated collections of low-level test assets, bitstream tests would eventually go there.
> Right, but that's one of the conditions for marking an extension a Release Candidate.
The sequencing here isn't super clear to me; what status is this extension in right now?
A simpler pair of test assets for BrainStem & the flower vase (linked in the comment above) tests all 4 filters (although not all of them with a full set of byte strides) and two modes out of three (no INDICES coverage), using v1. The flower asset is a point cloud, though, so including it in glTF-Sample-Assets is probably complicated; it's from the issue I filed a few years ago: https://github.com/KhronosGroup/glTF-Sample-Assets/issues/31. Neither uses the fallback; these are good for basic testing of the existing implementations, but not as comprehensive long term. So probably the most direct route is a composite asset along the lines I've sketched above... maybe it's easier to make an asset with several cubes here, as that can then include animations for rotations, which would be easy to visually distinguish.
For the existing EXT_meshopt assets in glTF-Sample-Assets repository, we could either "upgrade" them, or come up with a folder naming scheme that incorporates both variants as they are currently placed in glTF-Meshopt folders.
> The sequencing here isn't super clear to me; what status is this extension in right now?
Admittedly the formal extension development lifecycle is not fully defined yet. This extension should be a Review Draft as of now.
To move it forward (to RC, which implies merging the PR), we'd need sample assets and at least one glTF viewer/loader implementation. It seems that the latter is trivial and almost complete (although not merged).
I "wrote" a script that generates a synthetic test asset that might be helpful here ("wrote" is quoted because the code is generated by GPT Codex, I've lightly reviewed the code and the generated asset, and I also confirmed that "breaking" individual parts of the decoder manually breaks the asset in "correct" ways).
The asset is a 5x5 cube grid, where the columns of the grid vary the geometry encoding from the point of view of the glTF spec (different bit counts for normals/colors/indices, interleaved vs. not; the last column is an animated uncolored cube):
https://gist.github.com/zeux/d26340c53dd70d19ae18045e79d065df#file-layout-md
The generated file normally renders like this (subject to lighting conditions etc.; the file itself doesn't carry a light setup):
The rows of the grid use different ways to encode the data. The top row is uncompressed; then we have v0 compression with INDICES & TRIANGLES compression for the index buffer without filters, then v0 with filters & TRIANGLES, and the last row is v1 with filters & TRIANGLES. Note that interleaved data (first column) always has no filters applied, so v1 with no filters is implicitly tested here too.
For example, if I break octahedral filter decoding by returning 0 instead of decoding the data, I get this (affects last two rows with filters; doesn't affect animated or interleaved cubes because these are not using filters):
If I break INDICES decoding, I get this (doesn't affect animated cube because it's not compressed wrt geometry, doesn't affect other rows because first row is not compressed and subsequent rows are using TRIANGLES):
The script can also be used to generate data with a fallback buffer, which can then be loaded in a viewer that doesn't support this extension to begin with.
The linked Gist also has a small three.js viewer that uses the three.js GLTFLoader extensibility to substitute an implementation for the KHR extension, but this should not be necessary for three.js once the "real" PR gets merged. That PR was waiting for the extension to get to Review Draft, so it sounds like it could be merged now - I'll ping folks in that PR separately.
If this implementation style is acceptable then I can generate the two variants (with/without fallback buffer) and that could be a new MeshoptCubeGrid test asset or thereabouts. The "normal" test assets are still valuable I think but they are much easier to generate as it's a matter of re-running gltfpack on the source assets, pending the note about which folder to place those in.
The asset looks fine, one request for adding it to the Sample-Assets repo: consider adding in-scene labels using textured planes to the rows and columns (see, for example, DispersionTest).