Gaussian Splatting in glTF
Co-author: @keyboardspecialist
Gaussian Splatting has been a hot topic within the 3D graphics industry for some time now, and many organizations have been exploring it. Here at Cesium, we've been experimenting with adding support for Gaussian Splatting to glTF, and we wanted to share this with the community. We know others in the industry are also exploring this, and our goal here is one of collaboration, not competition. As mentioned in the disclaimer at the start of the summary of our approach below, we are not beholden to our current approach and are looking to work with the industry to find the right solution for the ecosystem, whatever that may be.
Gaussian Splatting in the Market
Gaussian Splatting is a natural fit for photogrammetry. It provides a relatively fast workflow from raw images to a 3D representation, which is beneficial for everyone from a hobbyist with a drone or a phone to enterprise organizations. Services like Niantic's Scaniverse and Polycam allow any user with a phone to scan objects anywhere in the world, geolocate them, and have a digital twin stored on a server that can be shared with other users. Where meshes were once generated, splatting can now be used instead. Splatting has gained strong interest from the AEC sector for its higher fidelity and ability to maintain finer details on structures such as radio towers or power substations. Gaussian Splatting has also been embraced by organizations for digital heritage preservation, which has educational implications as well, bringing real-world locations to life in new ways. In addition, splatting continues to grow in the AR/VR space.
Despite this integration with enterprise-level software, organizations using Gaussian Splatting are still largely dealing with PLY files. PlayCanvas is one of the exceptions, with its gsplat format and compressed PLY support in its SuperSplat editor.
Reference Links
- https://digitalheritagelab.com/index.php/2024/03/10/gaussian-splatting/
- https://scaniverse.com/
- https://aecmag.com/visualisation/v-ray-7-to-get-support-for-gaussian-splats/
- https://radiancefields.com/gaussian-splatting-brings-art-exhibitions-online-with-yulei
- https://radiancefields.com/storysplat-bringing-3dgs-into-educational-experiences
- https://playcanvas.com/supersplat/editor
Why use glTF?
glTF is an open standard with wide interoperability. It’s an efficient format and provides a base structure from which we can build a Gaussian Splatting standard for today and for the future. PLY, the currently ubiquitous format for splatting, is over 30 years old and was designed as a simple, general format. This is great for ease of implementation, such as in a research paper, but for actual market usage and interoperability it falls short. It’s a very loose, unstandardized format that does not provide a solid path forward for how splats should be represented and transmitted. Utilizing glTF also means that we can easily begin integrating splats with other assets for seamless transmission and rendering. Its openness as a standard allows us to engage with the community to garner feedback on which direction we should take.
A current problem with splats is how to render them efficiently, whether the issue is a lack of LoD or the sheer amount of data to be rendered on screen. Because 3D Tiles is built on top of glTF, it is a perfect opportunity to solve part of this problem. Full LoD remains an issue to solve, but tiling the data allows for efficient transmission and rendering.
Cesium's current approach
Disclaimer
We are not beholden to our current approach, but are including it here in part to help facilitate discussion. Ultimately, we want to do the right thing for the community, and are open to any and all approaches at this time. There may be other approaches, such as mirroring the .SPZ or .gsplat formats within glTF, that may be better. We’d like to invite the authors of other approaches to share theirs in this issue as well.
Summary
We currently have a draft glTF extension to support Gaussian Splatting. Our approach is straightforward and emphasizes using the facilities provided by glTF as much as possible. We extend a mesh point primitive with new attributes carrying the requisite data to render splats. Gaussian splats are defined by position, rotation, scale, opacity, and spherical harmonics. Position maps directly to the glTF POSITION attribute. We map the zeroth-order spherical harmonic for diffuse color, along with opacity, to COLOR_0. Two additional attributes are added, _ROTATION and _SCALE, for their respective splat properties. Our approach is mostly a restructuring of source PLY data into glTF, which gives flexibility in how it’s processed and rendered at runtime.
This is not a required extension, which means renderers without support can seamlessly fall back to point cloud rendering. Another benefit is that we support meshopt compression out of the box, like any other mesh in glTF. Our current limitation is that we only support the zeroth-order harmonic for diffuse color. The higher-order specular harmonics are size prohibitive, and we have questions about the best long-term strategy for storing and processing them.
https://github.com/CesiumGS/glTF/tree/gaussian-splatting-ext/extensions/2.0/Khronos/KHR_gaussian_splatting
Basic Example
The extension is added directly to the primitive itself. If the position data is quantized, you may define a quantizedPositionScale value, depending on how the quantization was performed.
```json
{
  "accessors": [
    {
      "type": "VEC3",
      "componentType": 5126
    },
    {
      "type": "VEC4",
      "componentType": 5121,
      "normalized": true
    },
    {
      "type": "VEC4",
      "componentType": 5126
    },
    {
      "type": "VEC3",
      "componentType": 5126
    }
  ],
  "meshes": [
    {
      "primitives": [
        {
          "mode": 0,
          "attributes": {
            "POSITION": 0,
            "COLOR_0": 1,
            "_ROTATION": 2,
            "_SCALE": 3
          },
          "extensions": {
            "KHR_gaussian_splatting": {
              "quantizedPositionScale": 1.0
            }
          }
        }
      ]
    }
  ]
}
```
We are working on hosting a public example using CesiumJS Sandcastle that we plan to share later this week.
Implementation
Loading
Loading glTF files containing Gaussian Splats is the same as loading any other glTF. Support for the KHR_gaussian_splatting extension may need to be added first. Given that it is a point primitive, if no support is found it will fall back to rendering as any other point cloud. As part of this, meshopt decoding should happen automatically. If the data is quantized, it will need to be further processed at runtime.
Sorting
Because splats are blended rather than opaque, and complex patterns and shapes are built through the layering of many Gaussians, they must be sorted back-to-front by distance from the current camera position. In the suboptimal case, this means resubmitting vertex data to the GPU on every sort. In the worst case, this would be every frame. However, if the scene does not change, no sorting needs to occur.
Further optimization can be had by generating textures from the splat data and submitting those to the GPU once. Sorting then just becomes an update to the indexes into those textures. Part of this optimization can be precomputing the 3D covariance from the scale and rotation.
Radix sorts generally provide good performance, whether GPU-accelerated or not.
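As a rough illustration (not the actual CesiumJS code), the sketch below shows a counting-sort pass in the spirit of the radix approach: the splat buffers and any derived textures stay resident on the GPU, and only an index ordering is rebuilt when the camera moves. The function name and the distance-based key are assumptions for illustration.

```ts
// Illustrative sketch only: sort splat indices back-to-front with a single
// counting-sort pass over quantized camera distances. The per-splat data (or
// the textures built from it) is uploaded once; only this index order changes.
function sortSplatsBackToFront(
  positions: Float32Array,          // xyz per splat
  camera: [number, number, number], // camera position in the same space
  buckets = 1 << 16
): Uint32Array {
  const count = positions.length / 3;
  const dist2 = new Float32Array(count);
  let min = Infinity;
  let max = -Infinity;
  for (let i = 0; i < count; i++) {
    const dx = positions[3 * i] - camera[0];
    const dy = positions[3 * i + 1] - camera[1];
    const dz = positions[3 * i + 2] - camera[2];
    dist2[i] = dx * dx + dy * dy + dz * dz;
    if (dist2[i] < min) min = dist2[i];
    if (dist2[i] > max) max = dist2[i];
  }
  // Quantize distances into buckets and build a histogram.
  const scale = (buckets - 1) / Math.max(max - min, 1e-12);
  const keys = new Uint32Array(count);
  const histogram = new Uint32Array(buckets);
  for (let i = 0; i < count; i++) {
    keys[i] = ((dist2[i] - min) * scale) | 0;
    histogram[keys[i]]++;
  }
  // Exclusive prefix sum from the far end, so larger distances come first.
  let offset = 0;
  for (let b = buckets - 1; b >= 0; b--) {
    const c = histogram[b];
    histogram[b] = offset;
    offset += c;
  }
  // Scatter indices into back-to-front order.
  const order = new Uint32Array(count);
  for (let i = 0; i < count; i++) {
    order[histogram[keys[i]]++] = i;
  }
  return order; // farthest splat first
}
```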
Rasterization
Our current implementation is in CesiumJS with WebGL, so the process below reflects that. We don’t currently have access to WebGPU compute shaders, which would offer more opportunities to optimize parts of the rendering process. One consequence of these limitations is that we have to render splats as quads. Finding an approach that does not need any extra generated vertices could be beneficial to performance. Native implementations generally use CUDA to accelerate both sorting and rasterizing through GPU radix sorts, tiling for higher parallelism, etc.
Vertex Shader
Most of the work is done here:
- If it hasn’t been precomputed, compute the 3D covariance from the scale and rotation.
- The 3D covariance is then projected into 2D space to obtain the 2D covariance.
- The 2D covariance is decomposed into its eigenvectors.
- Using those, we calculate the final vertex and clip-space positions (see the sketch below).
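As an illustration of the math in those steps (a minimal TypeScript sketch, assuming linear scale values and a standard 3DGS-style covariance construction; this is not the CesiumJS shader code, and the projection of the 3D covariance through the view transform and perspective Jacobian is omitted for brevity):

```ts
// Sketch of building a splat's 3D covariance from its _SCALE and _ROTATION
// values, and decomposing a projected 2D covariance into the half-axes of the
// screen-space quad. Names and the 3-sigma cutoff are illustrative choices.
type Vec3 = [number, number, number];
type Quat = [number, number, number, number]; // x, y, z, w

// Sigma = R * S * S^T * R^T, returned as the 6 unique symmetric entries
// [xx, xy, xz, yy, yz, zz].
function covariance3D(scale: Vec3, q: Quat): number[] {
  const [x, y, z, w] = q;
  const R = [
    [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
    [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
    [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
  ];
  // M = R * S, where S = diag(scale).
  const M = R.map((row) => row.map((v, j) => v * scale[j]));
  const dot = (a: number[], b: number[]) => a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
  return [dot(M[0], M[0]), dot(M[0], M[1]), dot(M[0], M[2]),
          dot(M[1], M[1]), dot(M[1], M[2]), dot(M[2], M[2])];
}

// Eigen-decompose the projected, symmetric 2x2 covariance [[a, b], [b, d]]
// into the major and minor half-axes of the quad (cut off at ~3 sigma).
function quadAxes(a: number, b: number, d: number) {
  const mid = 0.5 * (a + d);
  const radius = Math.sqrt(Math.max(mid * mid - (a * d - b * b), 0));
  const lambda1 = mid + radius;
  const lambda2 = Math.max(mid - radius, 0);
  const len = Math.hypot(b, lambda1 - a);
  const dir: [number, number] = len > 1e-6 ? [b / len, (lambda1 - a) / len] : [1, 0];
  const k = 3.0;
  return {
    major: [dir[0] * k * Math.sqrt(lambda1), dir[1] * k * Math.sqrt(lambda1)],
    minor: [-dir[1] * k * Math.sqrt(lambda2), dir[0] * k * Math.sqrt(lambda2)],
  };
}
```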
Fragment Shader
The fragment shader is very straightforward. We use the squared distance from the center of the splat to calculate an exponential decay combined with the opacity to generate the final splat raster. The final output is the diffuse RGB color premultiplied with the calculated decay alpha value.
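For illustration, the per-fragment math described above amounts to something like this (a hedged sketch, not the actual shader; `offset` is assumed to be the fragment's position relative to the splat center, expressed in standard deviations):

```ts
// Illustrative sketch of the Gaussian falloff and premultiplied-alpha output.
function splatFragment(
  offset: [number, number],      // fragment offset from the splat center, in sigmas
  rgb: [number, number, number], // diffuse color from COLOR_0
  opacity: number                // opacity from COLOR_0's alpha channel
): [number, number, number, number] {
  const r2 = offset[0] * offset[0] + offset[1] * offset[1]; // squared distance
  const alpha = opacity * Math.exp(-0.5 * r2);              // exponential decay
  // Premultiply the diffuse color by the computed alpha.
  return [rgb[0] * alpha, rgb[1] * alpha, rgb[2] * alpha, alpha];
}
```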
Spherical Harmonics and Specular
We consciously decided not to support the higher-order spherical harmonics used for specular highlights, for a few reasons:
- Size: they constitute the majority of the splat size. Excluding the zeroth-order diffuse terms, that's 45 32-bit floats, or 180 bytes per splat.
- We don’t want to impose lossy compression on our users by default
- Question of benefit versus complexity.
- Specular does add dynamism to the scene but is the cost of storage and computation worth it in all cases?
- Leave room for a future method of storing or calculating them
- Computational complexity increases quadratically with each degree (see the sketch after this list). Calculating this per splat per frame is prohibitively expensive on platforms without highly parallelizable compute. Is first-degree specular enough for most cases?
- Degree 0 - 3 muls
- Degree 1 - 18 ops: 9 muls and 9 adds
- Degree 2 - 40 ops: 25 muls and 15 adds
- Degree 3 - 80 ops: 50 muls and 30 adds
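For context on those op counts, here is a sketch of evaluating the degree-0 and degree-1 terms for one splat, using the real SH basis constants from the reference 3DGS implementation (the function shape and naming are illustrative, and conventions for the view direction vary):

```ts
// Illustrative sketch: evaluate degree-0 (diffuse) plus degree-1 spherical
// harmonics for one splat. sh[0] is the DC term, sh[1..3] are the degree-1
// coefficients, each an RGB triple.
const SH_C0 = 0.28209479177387814;
const SH_C1 = 0.4886025119029199;

function evalSHDegree1(
  sh: [number, number, number][],
  dir: [number, number, number] // normalized direction from camera to splat
): [number, number, number] {
  const [x, y, z] = dir;
  const out: [number, number, number] = [0, 0, 0];
  for (let c = 0; c < 3; c++) {
    out[c] =
      0.5 +
      SH_C0 * sh[0][c] +                                     // degree 0
      SH_C1 * (-y * sh[1][c] + z * sh[2][c] - x * sh[3][c]); // degree 1
  }
  return out;
}
```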
We believe it makes more sense to propose this as one or more separate extensions which can deal with this complexity on their own. These extensions should answer questions such as:
- What does storage in glTF look like?
- How is it represented on the GPU?
- Are there other novel ways to represent specular without using spherical harmonics?
- Can we leverage other parts of glTF here?
- How do we compress the data for both lossy and non-lossy use cases?
https://github.com/adobe/USD-Fileformat-plugins fyi
One concern raised in the MrNerf Discord server is that standardizing on a format this early in the game might either stifle innovation or result in an explosion of competing standards, especially in a field as actively researched as this.
New methods that may need to store additional information would be forced to forego early standardizations like this.
On top of that, most whitepaper implementations likely simply won't care to support anything beyond what they need.
While the creation of standards is inevitable, whether they can succeed at this point remains to be seen.
After some further discussion, we instead worked out a super early(!) draft for a high-level container format.
To be clear: This would have no direct implications on your work as it operates at a different level entirely, but it felt worth mentioning it here since it's closely related and adds a different perspective, and now is technically ongoing work in parallel.
An initial early version of this draft is posted here, for reference: https://gist.github.com/SharkWipf/a02a2616424d0a2ab69af2d3ad8c1829
Concerns about standardizing too early are certainly valid. I want to emphasize that our goal is to start discussions, not to push for ratification of some standard as quickly as possible. We don't expect glTF to be the only format within the splatting ecosystem. Rather, this is an attempt to get splats into glTF for the "last mile" where glTF thrives. While we’d love to see white papers use glTF for splats, that isn't really the goal here. Bleeding-edge research is the perfect place for highly flexible and bespoke formats. That said, I don't think it's reasonable to forgo any discussion of standardization over what might or might not appear in the future. When we look at the market today, it is largely (all?) using the original reference implementation. So while flexibility is important and something we hope to incorporate here, we also want to be pragmatic about what is essential to support. A solution that reduces fragmentation for the early “last mile” production use cases of Gaussian Splats is our ultimate goal.
There is ongoing discussion in the Metaverse Standards Forum, and we’d like to extend the invitation to you and others to contribute. Niantic will be presenting their work on the SPZ format in the near future.
A high level container format is interesting. After reading the spec overview, I see both of these coexisting rather than competing. Seems like it would be perfectly reasonable that it could contain a glb.
Finally, we're interested in joining the discussion on the MrNerf discord server. How can we join the server?
Pretty much agreed on all points. To name one specific example, though, of something that I believe isn't currently accounted for in any of the proposed formats: ScaffoldGS was, last I checked, SOTA or close to SOTA in quality. But in order to accurately represent them in viewers, I believe (I may be wrong here though) you need to store both the MLPs and the anchors. I don't think any of the proposed formats account for this kind of scenario.
A high level container format is interesting. After reading the spec overview, I see both of these coexisting rather than competing. Seems like it would be perfectly reasonable that it could contain a glb.
That was the intention, yeah. To be able to handle the different formats that are inevitably going to be popping up in a somewhat organized manner.
Finally, we're interested in joining the discussion on the MrNerf discord server. How can we join the server?
The invite link: https://discord.com/invite/NqwTqVYVmj As taken from MrNerf's twitter profile.
I really like this topic and I will be very happy to try my best submitting PRs for loading an early version of GS in glTF to SuperSplat, where I am a little active. Regarding standardization, I still think that Gaussian Splats are not yet the final, ideal way of shaping the data. I cannot stop emphasizing that I have the impression that spherical harmonics do a bad job while eating up all the memory :)
If I either had the time to learn it or the knowledge to just do it, I would try to experiment with splats that have a viewing direction and disappear off-axis.
In January 2025, during the Metaverse Standards Forum Town Hall, the community expressed positive interest in standardizing, compressing, and optimizing Gaussian Splats, with direct interest from Niantic to integrate SPZ with glTF. Based on this feedback, we are exploring the use of SPZ in glTF and 3D Tiles as a better container for Gaussian Splats.
We’ll share our early results and findings with the community to help build consensus around the best path forward for the ecosystem. If you have used SPZ or have interest in collaborating, please don’t hesitate to reach out.
we are exploring the use of SPZ in glTF and 3D Tiles as a better container for Gaussian Splats
@weegeekps @keyboardspecialist this is really fantastic! A few thoughts:
- Could you please share links to any work-in-progress branches; notes on potential schema; etc. here?
- @nbutko is there anyone(s) that @weegeekps @keyboardspecialist should collaborate closely with?
This is so exciting and my sense it will gain a lot of traction.
Hi, @pjcozzi. We are currently working out the details of the schema. We are using the KHR_draco_mesh_compression extension as inspiration for the pattern we will follow. We are planning to have a more thorough write-up sometime in the middle of next week. As we develop and implement this, we will be actively sharing any work in progress code.
We're looking forward to collaborating with others on this.
@keyboardspecialist cool, for loading SPZ in JavaScript, perhaps this webasm project would be of interest: spz-loader
@keyboardspecialist, @lilleyse, and I have been brainstorming what bringing SPZ into glTF may look like. Using the Draco extension for inspiration, we think that it should be straightforward to make a similar extension for SPZ. If you're unfamiliar with the KHR_draco_mesh_compression extension, it stores data in a compressed buffer which at runtime is decompressed and mapped to placeholder attributes for the primitive.
The mapping then for SPZ would be very similar to what is being done with our first pass at this extension, and we could continue to use familiar semantics. Many of these are fairly obvious: POSITION, _ROTATION, and _SCALE can be used to refer to the position, rotation, and scale for a Gaussian.
Some semantics are less obvious, such as how to handle spherical harmonics and opacity. Our first-pass attempt at Cesium has been to use COLOR_0 for the opacity and lowest-order spherical harmonic data, and we could probably continue that pattern. COLOR_n (n > 0) could be used to refer to individual channels for the spherical harmonics. This is similar to what @ebeaufay is doing in ULTRA_splats, as mentioned in a discussion for the .splat universal format. A value defining what level of spherical harmonics is available would be required, but it's overall a slick approach.
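For readers unfamiliar with the Draco pattern, here is a purely hypothetical sketch of what such a primitive could look like, written as a TypeScript object literal. The extension schema shown (a bufferView plus an attribute mapping) simply mirrors KHR_draco_mesh_compression for illustration; it is not the actual proposal.

```ts
// Hypothetical only: an SPZ-compressed splat primitive following the
// KHR_draco_mesh_compression pattern. The top-level attributes reference
// placeholder accessors; the extension says how to fill them from the
// SPZ-compressed buffer view at load time.
const primitive = {
  mode: 0, // POINTS
  attributes: { POSITION: 0, COLOR_0: 1, _ROTATION: 2, _SCALE: 3 },
  extensions: {
    KHR_spz_gaussian_splats_compression: {
      bufferView: 0, // SPZ-compressed splat data
      attributes: { POSITION: 0, COLOR_0: 1, _ROTATION: 2, _SCALE: 3 },
    },
  },
};
```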
Do others feel that this is the right way to handle spherical harmonics? One additional thought we've had is that perhaps an additional extension is needed to define semantics for spherical harmonics?
I'd strongly suggest resolving #2111 before standardizing any extension that touches glTF vertex attributes. /cc @javagl
Just adding @RenaudKeriven to this thread as he would like to stay up-to-date. @RenaudKeriven has presented at the last two Metaverse Standards Forum gaussian splat town halls with @keyboardspecialist. He is working on gaussian splat generation at Bentley.
A little earlier today I opened https://github.com/KhronosGroup/glTF/pull/2490 containing a draft proposal for an extension we are calling KHR_spz_gaussian_splats_compression. This is the extension my colleagues @keyboardspecialist and @RenaudKeriven have talked about in the MSF town halls on 3D Gaussian splatting over the past few months. It uses SPZ for efficient compression and storage of 3DGS data within a glTF buffer.
We're excited to hear the community's feedback about our proposal and look forward to discussing further.
It's really positive to see such progress towards standardizing glTF KHR extensions (first KHR_gaussian_splatting, then KHR_spz_gaussian_splats_compression) and being public about it, reusing the benefits of SPZ compression and moving the implementation forward in the CesiumJS renderer, potentially using OGC 3D Tiles where each glTF could encapsulate a gsplat tile.
Probably relevant: https://trianglesplatting.github.io/
^ Not having to implement a special renderer could be a huge plus.
And I like how the first sentence on the GitHub abstract sounds like it's taken from the 1970's:
Our work represents a significant advancement in [...] rendering by introducing 3D triangles as rendering primitive.
The code should be available soon. If this works without any client-side, per-frame sorting (which, I think, it claims to), then this could really be a huge simplification.
Love the efforts by @weegeekps here and with Cesium for Gaussian Splatting (hat tip to @keyboardspecialist).
Having seen hundreds of these kinds of neural 3D reconstruction projects, I’ll just say:
- They frequently come & go
- Industry united around Gaussian Splatting today
- Stay hungry, stay foolish
then this could really be a huge simplification.
Might still leverage meshlets or something. We'll see.
It could also be used to feed all kind of polygon soup based workloads.
It could probably leverage DRACO too.
introducing 3D triangles as rendering primitive.
We've come full circle 😅
But it might be seen as off topic for that very reason. I just felt it was worth mentioning.
Thank you for sharing that paper @JMLX42. Definitely worth mentioning.
I took a brief look at the preprint, and it seems to have a similar issue to some other recently published methods. An issue that we've seen with many of these methods is that if you have an existing field of splats, you can't go from that to the special data structure, in this case triangles, without the source images. That seems like a major limitation on flexibility for a standardized splatting format. Ideally, the best method would allow you to either process straight from source photos or convert from existing 3DGS files, such as PLY.
I've just been told about this extension, and I would like to give some criticism which I hope is taken constructively.
To my understanding, Gaussian Splatting, and by extension everything related to photogrammetry, is essentially a single, static, big blob of data which, from the point of view of rendering, can be considered a single "mesh".
It does not have animation, or any kind of scene graph, or transforms. Just a mesh... so from my point of view Gaussian Splatting is closer to formats like STL or even Wavefront OBJ than to glTF, and I think it would make more sense to define a standardised format specifically designed for Gaussian Splatting "meshes".
Now, if glTF infrastructure is required for scene graphs with multiple meshes, data annotation, etc, what I would do is an extension that allows a Node to reference an "external mesh" file, which can be a Gaussian Splatting, an STL, or whatever format that represents a single mesh.
My personal opinion is that integrating Gaussian Splatting within the architecture of glTF feels like another case of overengineering.
I also have some doubts and concerns about Gaussian Splats in glTF. But these are low-level and very specific, and will hopefully be discussed and iterated on elsewhere.
Regarding the concern of "Does this belong into glTF?", some higher-level points:
Gaussian splats are receiving a tremendous amount of attention right now. New methods for creating them, optimizing them, and storing them are developed left and right. There's a whole survey comparing the compression methods, for example. So when you say...
I think it would make more sense to define a standardised format specifically designed for Gaussian Splatting "meshes".
... then the response is: This is happening. One could even be tempted to say: Too much of this is happening - at least from the perspective of someone who would like to ~"just render Gaussian Splats", and doesn't know which of the dozens of formats that renderer should support. And nobody knows which format will "win", eventually.
Now, if glTF infrastructure is required for scene graphs with multiple meshes,
Not really. In fact, quite the contrary. There are some open questions about how to cleanly integrate all this into glTF: Will node transforms affect the splats? How do they align with the PBR material model? What happens when all this is animated? Can gaussians be stored in sparse accessors? Can morphing be applied to them? (And many more...)
what I would do is an extension that allows a Node to reference an "external mesh" file, ...
This already has caveats. People would like to have their single, self-contained .glb file. External references can break and can be clumsy to handle. But even if this was solved cleanly...:
... which can be a Gaussian Splatting, an STL, or whatever format that represents a single mesh.
This "whatever format" is the crux here. Which one should it be? STL, PLY, GSPLAT, SPZ, or maybe something that is texture-based...?
Maybe one way of summarizing the intention here could be:
Gaussian Splats are considered to be integrated into glTF in a way that follows the spirit of glTF. Namely, to have them in a form that is suitable for "last mile delivery": Reasonably compact, and easy to render (i.e. easy to upload to the GPU and to use in a shader). This should be possible without having to manually parse some obscure, 1990's file format that just happened to be used for splats for lack of better options until recently. And regardless of which of the options that are currently being developed "wins": It should always be possible to bring them into a glTF in a form that is agreed upon and easy to render, so that they can be rendered by every client that decided to support the glTF Splat representation.
That said: I share some of your concerns. I wouldn't even call it "overengineering", but rather the risk of picking a representation or an approach that turns out to be "wrong" in one way or another. But I think that a common, agreed-upon, renderable representation of splats could allow much broader adoption for practical applications.
... which can be a Gaussian Splatting, an STL, or whatever format that represents a single mesh.
Not the 3D file format nightmare again. That's the whole purpose of glTF: define meshes with respect to how 3D graphics APIs do it.
There are some open questions about how to cleanly integrate all this into glTF: Will node transforms affect the splats?
IMHO pretty easy to support either way.
How do they align with the PBR material model?
Could be not at all. Could be something like this: https://nju-3dv.github.io/projects/Relightable3DGaussian/
What happens when all this is animated?
Can morphing be applied to them? (And many more...)
I think that no support for morphing/animation is pretty OK considering the content of GS.
I'd still like to better understand the connection between the discussion that is happening here, the pull request at https://github.com/KhronosGroup/glTF/pull/2490 , and the extension (draft/proposal?) at https://github.com/CesiumGS/glTF/tree/gaussian-splatting-ext/extensions/2.0/Khronos/KHR_gaussian_splatting (for which apparently no PR exists yet).
This is a bit of brainstorming now, and I'm throwing it out so that people can tell me where and why it doesn't make sense:
The concept of Gaussian Splats inside glTF could be relatively simple: The actual data that the renderer needs seems to be something like this:
- POSITION (VEC3, float)
- COLOR_0 (VEC4, unsigned byte or float)
- _ROTATION (VEC4, float)
- _SCALE (VEC3, float)
- _SPHERICAL_HARMONICS_<i>_<j> (VEC3, float, optional, up to 15 of these...)
As it is written here, this is plain and uncompressed, and could be sent to the GPU exactly like that.
Now, the PR at https://github.com/KhronosGroup/glTF/pull/2490 just describes one (special) source for (exactly) this data, namely SPZ.
There could be other extensions that define other sources for the splat data. Which accessors/attribute names would they define? Yeah, maybe something like
- POSITION (VEC3, float)
- _SPLAT_COLOR (VEC3, unsigned byte or float)
- _SPLAT_OPACITY (SCALAR, float)
- _ROTATION (VEC4, unsigned short normalized)
- _SCALE (VEC3, unsigned byte normalized)
- _SH_<i> (VEC3, unsigned byte normalized, optional, up to 15 of these...)
How would renderers support that? Well ... write a decoder and implement it, with all the subtle differences to the other representation.
The brainstorming
The issue at https://github.com/KhronosGroup/glTF/issues/2111 is not addressed yet, but it should be resolved before any form of Gaussian Splats makes its way into glTF. And I think that one way to handle this could be the following:
1. There is an extension KHR_gaussian_splatting that defines the attribute names and types.
This extension defines the attributes in their plain, unpacked, agreed-upon form, as a bunch of (mostly) VECn/float accessors. The value and purpose of this extension would be to define the attribute names. And this would include the disambiguation. This means that this extension would establish the contract for renderers: when you find this extension in a mesh primitive, then it is guaranteed to contain the attributes
- KHR_gaussian_splatting_POSITION (VEC3, float)
- KHR_gaussian_splatting_COLOR (VEC4, unsigned byte or float)
- KHR_gaussian_splatting_ROTATION (VEC4, float)
- KHR_gaussian_splatting_SCALE (VEC3, float)
- KHR_gaussian_splatting_SPHERICAL_HARMONICS_<i>_<j> (VEC3, float, optional, up to 15 of these...)
(Yes, with these obscure prefixes. That's the whole point!)
This extension does not say anything about the representation (beyond the data types). The extension would basically allow them to be stored, as-they-are, uncompressed, in buffer views.
2. There may be dozens of different extensions that are built upon that one.
One of them would be KHR_spz_gaussian_splats_compression. This one defines that these accessors are not "filled" from standard buffer views, but from SPZ data. Another extension could be KHR_gaussian_splats_meshopt, which defines these attributes to be filled from meshopt data, but also requires them to have these names.
The whole point of this would be to decouple the concept of "Gaussian Splats in glTF" from "The storage of Gaussian splats". Renderers could support Gaussian splats generically, and rely on the fact that the loader provides them all the KHR_gaussian_splatting_* attributes. The loader is then responsible for ~"unpacking input data from different extensions", if the data is not stored directly in its plain, uncompressed form.
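A compact sketch of that decoupling from a loader's perspective (TypeScript; the KHR_gaussian_splats_meshopt name, the decoder signatures, and everything else here are hypothetical illustrations of the idea, not proposed APIs):

```ts
// Hypothetical sketch: whichever storage extension is present, the loader
// resolves it into the same set of splat attributes, and the renderer only
// ever consumes that common set.
interface SplatAttributes {
  position: Float32Array; // KHR_gaussian_splatting_POSITION
  color: Float32Array;    // KHR_gaussian_splatting_COLOR
  rotation: Float32Array; // KHR_gaussian_splatting_ROTATION
  scale: Float32Array;    // KHR_gaussian_splatting_SCALE
}

type Decoder = (gltf: unknown, source: unknown) => SplatAttributes;

function loadSplatPrimitive(
  primitive: { attributes: Record<string, number>; extensions?: Record<string, unknown> },
  gltf: unknown,
  decoders: { spz: Decoder; meshopt: Decoder; plain: Decoder }
): SplatAttributes {
  const ext = primitive.extensions ?? {};
  if (ext["KHR_spz_gaussian_splats_compression"]) {
    // Fill the agreed-upon attributes by decoding the SPZ payload.
    return decoders.spz(gltf, ext["KHR_spz_gaussian_splats_compression"]);
  }
  if (ext["KHR_gaussian_splats_meshopt"]) {
    return decoders.meshopt(gltf, ext["KHR_gaussian_splats_meshopt"]);
  }
  // Base case: attributes stored plainly in buffer views.
  return decoders.plain(gltf, primitive.attributes);
}
```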
Does that make sense?
Thanks, @javagl. Our currently opened PR https://github.com/KhronosGroup/glTF/pull/2490 is the most recent proposed extension for handling compressed 3D Gaussian splats within glTF. The KHR_gaussian_splatting extension is our original take at this for uncompressed splats. In essence, it's been abandoned in favor of our latest, but I'm thinking it's time to revive it.
Earlier on, we had this very idea of splitting out a base extension and then a series of compression extensions. We shied away from this some to avoid overcomplicating things early on for implementors and the community. After a weekend of some retrospection, I now feel that was a mistake. @lilleyse, @keyboardspecialist, and I have repeatedly grappled with the idea of having a base extension and then building additional compression extensions on top of that. I am now beginning to see the ideal world as something like this:
This gives us flexibility that we currently do not. In the event that a use case is unable to be properly handled by a particular compression format, this would allow for that avenue to be open. It also allows for an easier path forward when newer and better methods of compressing or storing splats become available. The risk is fragmentation: if we as a community don't agree to a few focused forms of compression, then this could easily become unwieldy.
SPZ seems like a no-brainer to me for an early adopter format. It is easy to implement everywhere thanks to its ecosystem, and I suspect that for a majority of applications it will be the ideal choice. It offers a great balance between ease of use, precision and file size and will only get better with time. Meshopt then seems like a solid option as the other early alternative, for applications where implementors are more concerned with precision over file size. Given meshopt's general ubiquity within the glTF ecosystem, it should be equally easy to implement (in some ways easier) than SPZ.
The overall downside to this is that to properly support splats in a generalized renderer or engine, implementors now need to implement three potential pathways: uncompressed, SPZ-compressed, or meshopt-compressed. This additional complexity is why we shied away at first, but hopefully this isn't too problematic for implementors. Of course, developers of applications with custom renderers could implement only the pathways they need.
Thoughts?
The risk is fragmentation: if we as a community don't agree to a few focused forms of compression, then this could easily become unwieldy.
I see the point of possible fragmentation. But the same could be said if we didn't introduce such a "base extension". If 10 new compression methods are proposed in the next year, then these could be 10 different extensions with hardly anything in common. If they are all defined on top of the base extension, then they could at least offer the same "interface" for renderers, in terms of the agreed-upon attribute structure and names. (So that could alleviate some of the problems that come with fragmentation.)
And as I said: I think it could be a reasonable approach for addressing the issue of attribute name disambiguation in the broader context of "Splats in glTF". Of course, this could also be addressed in each splat extension individually, with names like KHR_spz_gaussian_splats_compression_spherical_harmonics_2_1 or so. But from my understanding, the splats themselves will always have the same structure, and the differences are only in how they are compressed. As such, this "base extension" would be a new 'kind' of extension (compared to the existing ones), in that it only defines a set of mesh primitive attributes and their names.
EDIT
SPZ seems like a no-brainer to me for an early adopter format. It is easy to implement everywhere thanks to its ecosystem, and I suspect that for a majority of applications it will be the ideal choice.
From what I've seen, this is very likely the case. It seems to offer a good trade-off between all the factors that come into play, and is really simple. (One could argue that it is not very flexible, but that is the reason for its simplicity, and if that simple solution is satisfactory for 95% of all cases, that sounds reasonable)
I agree that having a base extension and keeping compression separate is likely the right long-term approach. One benefit to our first extension was that we didn't have to do anything special to handle meshopt. It just worked. So we really wouldn't need any new extension specifying meshopt splats. It highlighted why it was a good approach, and I talked about it a bit in the first MSF Town Hall presentation.
That said, SPZ fits right in with that as well. You'd need a new extension in this case, but it would interact with the base extension in the same way. We can have SPZ encoded buffers that decode into our KHR_gaussian_splatting defined attributes, and it should just work. It's an easy win supporting a nice format.
There's also the angle of in-memory compression from CPU to GPU. We want the flexibility to allow for buffers that can remain compact throughout both transmission and runtime rendering.