
Implement Mesh streaming

Open reduz opened this issue 2 years ago • 50 comments

Describe the project you are working on

Godot

Describe the problem or limitation you are having in your project

For large scenes, loading meshes (3D models) consumes a lot of video memory and takes a long time.

Describe the feature / enhancement and how it helps to overcome the problem or limitation

Mesh streaming aims to solve this. A fixed amount of memory is reserved for streaming and then meshes are streamed in (higher detail) and out (lower detail) depending on proximity to the camera.

Describe how your proposal will work, with code, pseudo-code, mock-ups, and/or diagrams

Overview

The first thing that needs to be understood is that efficient, modern mesh streaming has several requirements that need to be met:

  • Ability to split the mesh content into pages that can be loaded in and out in a fixed amount of video memory (to avoid fragmentation and defragmentation).
  • Ability to be culled entirely on the GPU, as we intend to draw a massive amount of objects.
  • Ability to use existing Godot materials.

To comply with the first requirement, this generally means using triangle strips and a fixed format that covers the most common use case (vertex, normal, tangent, uv). As such, colors, uv2, bones, weights, indices, etc. will not be supported.

This means that, with common encoding, a vertex will take 28 bytes. A common chunk of mesh can be 64 vertices, so a chunk takes 1792 bytes. If we want to be able to "cull" those chunks (so that an object hitting the camera only has drawn the chunks that pass frustum and occlusion culling), each chunk's triangle strip should be self-contained; otherwise the strip can just continue until the end of the mesh.
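The arithmetic above can be sketched as follows. Note that the 28-byte split into position/normal/tangent/UV is an assumed layout for illustration; the proposal only fixes the total size per vertex and per chunk.

```python
# Hypothetical 28-byte vertex layout; the field encoding is an assumption
# for illustration, only the totals come from the proposal text.
VERTEX_POSITION_BYTES = 12  # e.g. 3 x 32-bit float
VERTEX_NORMAL_BYTES = 4     # e.g. 2 x 16-bit octahedral
VERTEX_TANGENT_BYTES = 4
VERTEX_UV_BYTES = 8         # e.g. 2 x 32-bit float

BYTES_PER_VERTEX = (VERTEX_POSITION_BYTES + VERTEX_NORMAL_BYTES +
                    VERTEX_TANGENT_BYTES + VERTEX_UV_BYTES)   # 28
VERTICES_PER_CHUNK = 64
BYTES_PER_CHUNK = BYTES_PER_VERTEX * VERTICES_PER_CHUNK       # 1792

def chunks_in_budget(budget_mib: int) -> int:
    """Number of fixed-size chunks that fit in a streaming pool of budget_mib MiB."""
    return (budget_mib * 1024 * 1024) // BYTES_PER_CHUNK
```

With a 256 MiB pool, for instance, roughly 150,000 chunks (about 9.6 million vertices) can be resident at once, which shows why a fixed chunk size avoids fragmentation: any chunk can be loaded into any free slot.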

In practice, this means this needs to be a separate type of mesh, likely StreamedMesh, registered in Godot separately from standard meshes.

As these chunks would need to be streamed from low detail to high detail, different LOD versions would need to be stored separately so they can be streamed in and out (no index-based LOD).

Rendering

Remember that our main goal is to still be able to use Godot materials, otherwise the workflow would get too complicated. To achieve this, the base algorithm would be more or less like this:

  • Cull all the instances of StreamedMesh using a compute shader, marking them visible or invisible (occlusion culling can be added later, e.g. two-pass occlusion culling).
  • Count how many instances are visible for each material and how many chunks for each mesh with a compute shader.
  • Copy the indices of the visible chunks for all objects to a very large array.
  • Store the first index for every material.
  • Call a drawing primitive (draw arrays indirect) for every material using the proper start vertex offset.
  • In the shader, fetch the right vertex based on the gl_VertexIndex.
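A CPU-side sketch of the count / prefix-sum / compaction steps above might look like this. On the GPU these would be compute shader passes writing into an indirect draw buffer; the names and data layout here are illustrative only.

```python
# CPU emulation of the GPU compaction steps described in the proposal.
from dataclasses import dataclass

@dataclass
class Instance:
    material_id: int
    chunk_ids: list      # chunks of this instance's current LOD
    visible: bool = True # result of the frustum/occlusion culling pass

def build_draw_lists(instances, num_materials):
    # 1. Count visible chunks per material.
    counts = [0] * num_materials
    for inst in instances:
        if inst.visible:
            counts[inst.material_id] += len(inst.chunk_ids)
    # 2. Exclusive prefix sum: first index into the big chunk array per material.
    first = [0] * num_materials
    running = 0
    for m in range(num_materials):
        first[m] = running
        running += counts[m]
    # 3. Scatter visible chunk indices into one large array, grouped by material.
    chunk_array = [0] * running
    cursor = first[:]
    for inst in instances:
        if inst.visible:
            for c in inst.chunk_ids:
                chunk_array[cursor[inst.material_id]] = c
                cursor[inst.material_id] += 1
    # One (start_vertex, vertex_count) pair per material, at 64 vertices/chunk.
    draws = [(first[m] * 64, counts[m] * 64) for m in range(num_materials)]
    return chunk_array, draws
```

Each (start, count) pair then maps to one draw-arrays-indirect call per material, and the vertex shader fetches the right chunk data from gl_VertexIndex as in the last step above.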

Of course, there are more things that need to be taken care of:

  • Finding the right LOD level
  • Determining when LOD levels need to be streamed-in or out
  • Culling objects that intersect the frustum so only chunks that intersect are visible.
  • Culling objects against the depth buffer if using two pass occlusion culling.
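As a rough illustration of the first two points, LOD selection with a stream-in/out decision could look like the sketch below. The distance thresholds and hysteresis margin are invented for the example; a real implementation would likely use screen-space error rather than raw distance.

```python
# Hedged sketch: pick a LOD level from camera distance, with a hysteresis
# margin so chunks are not streamed in and out every frame near a threshold.
LOD_DISTANCES = [10.0, 30.0, 90.0]  # LOD 0 below 10 m, LOD 1 below 30 m, ...
HYSTERESIS = 1.25                    # require 25% past the threshold to demote

def target_lod(distance: float) -> int:
    for lod, limit in enumerate(LOD_DISTANCES):
        if distance < limit:
            return lod
    return len(LOD_DISTANCES)  # lowest detail / fully streamed out

def next_lod(current_lod: int, distance: float) -> int:
    """Only demote when clearly past a threshold, to avoid thrashing the pool."""
    target = target_lod(distance)
    if target > current_lod:
        # Moving away: demote only once we are past the hysteresis margin.
        if target_lod(distance / HYSTERESIS) > current_lod:
            return target
        return current_lod
    return target  # moving closer: promote (stream in) immediately
```

When `next_lod` returns a different level than the current one, the corresponding chunks are queued for streaming in, and the old level's chunks are released back to the fixed pool.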

As should be obvious, this render pass is separate from the regular geometry render pass in Godot.

Q: Is this like Nanite? A: No, Nanite is more complex since it has a LOD hierarchy (which means a single large object can have multiple levels of detail). This just aims to be a good enough solution for most use cases that is not as complex to implement, but it could eventually be extended into something more akin to Nanite if there is more demand in the future.

If this enhancement will not be used often, can it be worked around with a few lines of script?

N/A

Is there a reason why this should be core and not an add-on in the asset library?

N/A

reduz avatar Jan 16 '23 21:01 reduz

I wonder, what if we had an option to generate impostors for the most distant 3D static meshes, then start using LODs as they approach the camera? Wdyt?

ps: if this is not related, please remove comment!

nonunknown avatar Jan 16 '23 22:01 nonunknown

As such, colors, uv2, bones, weights, indices, etc. will not be supported.

Couldn't this just be a project setting?

robbertzzz avatar Jan 17 '23 00:01 robbertzzz

I have some meshes with one LOD with Godot generating lower lods. I have other meshes with artist created lods, and my own system to manually switch lods, and shadow impostors. Both are embedded in their own glb. If I'm going to use this system, I'd need to be able to plug into it for both generated and artist created lods.

If by "stored separately", you mean lods stored in separate files, that's not going to happen on my project. I've already manually imported a thousand assets three times due to devs breaking the arraymesh format. But that shouldn't be necessary. RAM is abundant and unimportant. Streaming and unloading for VRAM is the only real need IMO. You should be able to load all lods from the same physical file, but stream them to the video card on demand.

TokisanGames avatar Jan 17 '23 06:01 TokisanGames

I'm not a graphics programmer by any means but I just love reading through the internet. There might be something here.

1. Visibility Buffer by Wolfgang Engel https://diaryofagraphicsprogrammer.blogspot.com/2018/03/triangle-visibility-buffer.html http://filmicworlds.com/blog/visibility-buffer-rendering-with-material-graphs/

2. GPU-Driven Rendering Pipelines by Sebastian Aaltonen This includes Mesh Cluster Rendering and Occlusion Depth Generation.

3. Rendering The Hellscape Of Doom Eternal

4. A hierarchical Framework For Large 3D Mesh Streaming On Mobile Systems

5. Streaming Meshes - UNC Computer Science

Qubrick avatar Jan 17 '23 07:01 Qubrick

Perhaps worth checking out 3D Tiles? It's an open standard for streaming massive 3D content, including buildings, trees, point clouds, and vector data. I just saw that they are coming out with a new feature soon to use custom glTF 2.0 assets as tile content. Also, coincidentally, I saw @reduz was on a podcast with the CEO of Cesium / creator of 3D Tiles. Source: https://github.com/CesiumGS/3d-tiles#specification

Examples: https://sandcastle.cesium.com/?src=Clamp%20to%203D%20Tiles.html

madjin avatar Jan 17 '23 21:01 madjin

As such, colors, uv2, bones, weights, indices, etc. will not be supported

Doesn't that mean that lightmaps won't work with this system?

Also, not allowing vertex colors to work with this kind of denser mesh system sounds like a missed opportunity. Using vertex colors as mask values for AO or grunge and wear is a nice way to save texture memory. This part of a presentation on Nanite demonstrates what such a workflow would look like.

and-rad avatar Jan 17 '23 23:01 and-rad

Doesn't that mean that lightmaps won't work with this system?

There's generally an assumption that in games with large open worlds, you can't use lightmaps as they'd require very large files and would take a long time to bake. Instead, SDFGI is the preferred solution (or multiple baked VoxelGIs, as its file size doesn't depend on mesh complexity and is much faster to bake).

Calinou avatar Jan 17 '23 23:01 Calinou

As the person who requested custom uv1-8, I would like that to work, but I understand that this is a bit packing problem.

I also expect that people will attempt to use this system with something like bones, or they will hack it in via vertex animations with hierarchy for the field-of-grass-and-trees use case. The Godot skeletal animation is a variation of the vertex animation compute shader approach, so I don't see much difference.

fire avatar Jan 18 '23 01:01 fire

I think you guys should definitely check out "Nvidia MicroMesh". It was unveiled by Nvidia during the launch of the RTX 40 series graphics cards in late 2022.

Benefits

  • it's FREE and open source
  • cross platform and cross vendor
  • has support for Displacement
  • supports opacity
  • better than nanite in UE5
  • supports hardware acceleration
  • built from the ground up for ray tracing
  • unmatched ray tracing performance
  • and more.

Learn more: https://developer.nvidia.com/rtx/ray-tracing/micro-mesh

Demo: This realtime tech Demo from Nvidia uses MicroMesh™

https://youtu.be/AsykNkUMoNU

ClinToch avatar Jan 19 '23 09:01 ClinToch

Really interesting, but isn't that just some super efficient tessellation and displacement (with fancy opacity rendering)? At least, that is what I understood from their article. While this is really cool tech, I don't think it is a replacement for mesh streaming or cascading LODs (e.g. Nanite).

WrobotGames avatar Jan 19 '23 14:01 WrobotGames

I think you guys should definitely Checkout "Nvidia MicroMesh" [...]

  • it's FREE and open source

[...]

it's FREE and open source, but I believe it will be under the NVIDIA RTX SDKs LICENSE like their other SDKs (ex.: https://github.com/NVIDIAGameWorks/Opacity-MicroMap-SDK/blob/main/LICENSE.txt)

Not sure that it's compatible with Godot's license.

silverkorn avatar Jan 19 '23 14:01 silverkorn

Not sure that it's compatible with Godot's license.

That license is indeed proprietary, and therefore not suitable for inclusion with Godot.

Calinou avatar Jan 19 '23 17:01 Calinou

There's generally an assumption that in games with large open worlds, you can't use lightmaps as they'd require very large files and would take a long time to bake. Instead, SDFGI is the preferred solution (or multiple baked VoxelGIs, as its file size doesn't depend on mesh complexity and is much faster to bake).

That's true, but it might be worth keeping in mind that "large scene" doesn't have to mean big open world, it can also mean confined spaces with lots of detail going on. If I was making a game that takes place on a space ship and the engine had proper mesh streaming support, I'd greeble the hell out of my scenes. Lightmapping might still be desired in a case like that.

Personally, I would love nothing more than to never unwrap a lightmap again in my life.

and-rad avatar Jan 19 '23 23:01 and-rad

Alternatively, a much simpler way to implement mesh streaming would be to extend visibility ranges to emit signals on visibility changes. This would allow more flexible streaming behavior from the GDScript side.

3.5 allows specifying a maximum distance in VisibilityNotifier, but this isn't implemented in 4.x yet.

Calinou avatar Jan 24 '23 10:01 Calinou

For example, leaf nodes in an HLOD tree could simply be placeholder lowpoly meshes that defer loading their full mesh data (attributes, LODs, shaders, etc) until the camera is close enough to trigger the visibility range.

That's actually what I used for one of my projects, and it worked really well despite having mesh priority issues, because I couldn't be bothered to implement more complex code.

atirut-w avatar Feb 02 '23 14:02 atirut-w

Reading your proposal, it sounds very similar to what you would need for mesh shader rendering, but using different terms (e.g. "chunk" instead of "meshlet"). You also mention doing the culling in a compute shader instead of in a task shader.

Are you intentionally not using the established terms in your proposal, or was this an oversight? Assuming it was intentional, I am guessing you want to emulate mesh shaders without actually using them (and thus without requiring modern hardware), but I have to wonder if it is worth it. Any project large enough to really need these kinds of optimizations will probably already require recent-ish hardware, and at that point we might as well use a mesh pipeline.

When it comes to the actual implementation it should also be pointed out that it is usually highly preferable to do this over multiple passes. You start with the largest & closest objects with the least vertices and draw them first before you downsample the resulting depth so that you can compare it against bounding spheres of the meshlets in the next pass. This way you can cull an increasingly larger percentage of the meshlets with every pass before even entering the mesh stage.
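The depth downsampling described above is essentially a hierarchical-Z (Hi-Z) test. A small CPU sketch of the idea follows; a square power-of-two depth buffer with 1.0 = far plane is assumed, and on the GPU both steps would be compute passes.

```python
# CPU sketch of a Hi-Z occlusion test: downsample the depth buffer by taking
# the FARTHEST depth of each 2x2 block, then test a meshlet's conservative
# nearest depth against every covered texel of the mip.
def downsample_depth(depth, size):
    """depth: row-major list of size*size floats (1.0 = far). Returns next mip."""
    half = size // 2
    mip = [0.0] * (half * half)
    for y in range(half):
        for x in range(half):
            mip[y * half + x] = max(
                depth[(2 * y) * size + 2 * x],
                depth[(2 * y) * size + 2 * x + 1],
                depth[(2 * y + 1) * size + 2 * x],
                depth[(2 * y + 1) * size + 2 * x + 1],
            )
    return mip

def occluded(mip, mip_size, rect, nearest_depth):
    """rect = (x0, y0, x1, y1) in mip texel coords, inclusive.
    Occluded if even the farthest stored depth over the whole rectangle is
    nearer than the meshlet's conservative nearest point."""
    x0, y0, x1, y1 = rect
    for y in range(y0, y1 + 1):
        for x in range(x0, x1 + 1):
            if mip[y * mip_size + x] >= nearest_depth:
                return False  # something here is behind it: potentially visible
    return True
```

Using the max (farthest) reduction keeps the test conservative: a meshlet is only rejected when it is provably behind already-drawn geometry everywhere it could appear.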

Finally I would like to point out that at least one custom vector per vertex might be necessary. A skilled technical artist could really use that for animations. All the usecases that drove me to investigate mesh shading would require it. (Mostly vegetation stuff, and yes, I am aware that this will also make the culling less efficient)

Ansraer avatar Feb 03 '23 16:02 Ansraer

I was assuming reduz was redefining terms to give his own meaning to common words.

https://github.com/zeux/meshoptimizer#mesh-shading which is already in godot 4 has meshlet creation support. I am also investigating doing a coarse grid division using @lawnjelly 's code at https://github.com/v-sekai/godot-splerger

fire avatar Feb 03 '23 18:02 fire

Are you intentionally not using the correct terms in your proppsal or was this an oversight? Assuming this was done intentionally it I am guessing you want to emulate mesh shaders without actually using them (and thus requiring modern hardware), but I have to wonder if it is worth it. Any project large enough to really need these kinds of optimizations will probably already require recent-ish hardware, and at that point we might as well use a mesh pipeline.

Even if we do end up using mesh shaders, I think we'll need an emulation path as a fallback regardless. Today's AAA games can still run on Pascal/RDNA1 or even Maxwell/Polaris GPUs after all. Support for mesh shaders on integrated graphics may also be less than stellar.

Calinou avatar Feb 03 '23 18:02 Calinou

Could it, if it's similar to Nanite, eventually use a programmable rasterizer for things like foliage and deformation? https://docs.unrealengine.com/5.1/en-US/nanite-virtualized-geometry-in-unreal-engine/#supportedfeaturesofnanit

Edit: I saw a good article that covers some Nanite stuff, hope it helps: https://www.reddit.com/r/hardware/comments/gkcd9b/pixels_triangles_whats_the_difference_how_i_think/

Also, about the LOD thing: I honestly think that if it doesn't make it harder to run on less powerful GPUs, HLOD could be added from the start like Nanite, as apparently (according to some ChatGPT-assisted research) it gives 3x the performance of traditional LODs.


Saul2022 avatar Feb 12 '23 17:02 Saul2022

Also about the lod thing, i honestly think that if it doesn't make it harder to run on less powerful gpu, hlod could be added at first like nanite, as apparently thanks to chatgpt research it gives 3x perfomance compared to traditional lod's

@Saul2022 Godot already supports HLODs in the form of visibility ranges, so we should be good to go there.

I know; I meant just having them automatically generated with mesh streaming instead of traditional LODs.

Even if we do end up using mesh shaders, I think we'll need an emulation path as a fallback regardless. Today's AAA games can still run on Pascal/RDNA1 or even Maxwell/Polaris GPUs after all. Support for mesh shaders on integrated graphics may also be less than stellar.

The general recommendation is to store geometry data as meshlets, and convert back to individual triangles (standard vertex+index buffers) on hardware that doesn't support mesh shaders.
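That fallback can be sketched as below. The meshlet layout (a global vertex remap plus flat local triangle indices) follows the common meshoptimizer-style convention, but the details here are assumptions.

```python
# Sketch of the fallback path: expanding meshlets back into a standard global
# index buffer for hardware without mesh shader support.
from dataclasses import dataclass

@dataclass
class Meshlet:
    vertices: list   # indices into the mesh's global vertex buffer
    triangles: list  # flat local indices, 3 per triangle, into `vertices`

def meshlets_to_index_buffer(meshlets):
    """Flatten meshlets into one index buffer usable with a plain indexed draw."""
    index_buffer = []
    for m in meshlets:
        for local in m.triangles:
            index_buffer.append(m.vertices[local])
    return index_buffer
```

Since the expansion is a pure remap, it can be done once at load time, so the same on-disk meshlet data serves both the mesh shader path and the fallback.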

Then the GTX 1080 is RDNA2; I saw some videos with Nanite running the Valley of the Ancient demo on that GPU and it seemed to work.

I would strongly suggest splitting this into two separate proposals, one for streaming Resources, and the other for meshlets+mesh shaders, rather than combining them into a single system that streams at the meshlet level. Focusing on them separately allows them both to be made much more broadly applicable and useful for many more Godot users, while still being able to cover Nanite-like tasks of streaming and rendering highpoly meshes.

While i agree i think that for sake of easy to use there should be an option for streaming to add what is stated in the proposal while having the meshlet-mesh shader implementation independent.

Saul2022 avatar Feb 21 '23 09:02 Saul2022

Then the GTX 1080 is RDNA2

No, it's a Pascal GPU :slightly_smiling_face:

Nanite works on older GPUs because it has fallbacks for GPUs not supporting mesh shaders.

Calinou avatar Feb 21 '23 15:02 Calinou

@myaaaaaaaaa

[...]

I like the sound of this. I like the idea of not only trying to replicate a system like Nanite but instead trying to build on it and the things we've learned and coming up with a solution that avoids some of its pitfalls.

and-rad avatar Feb 22 '23 12:02 and-rad

This is one of the things that has always made me skeptical of the stance most open-source engines have taken that components such as terrain, terrain editor, foliage etc. are not / should not be part of the core.

I think for large worlds to work properly, there's a number of components that have to work together. First you need to be able to stream textures and meshes in and out of memory. Second you need to be able to break huge worlds up into multiple chunks - you can't just have one huge scene file that's tens of GB in size, so you need to have scenes that are composed of multiple sub-scenes that are streamed in and out as you move around. Your scene format and editor needs to understand all that. Next your terrain system, foliage etc. needs to work together with that, so that chunks of terrain/foliage are stored in different world chunks. Finally, your editor needs to understand it all - so it doesn't try to load the whole world at once and lag the editor. It needs to be able to stream the world in and out as you move around, and save the trees and grass you paint into the world in the right chunks. It also needs to generate impostors for your trees etc. at import time and manage them along with mesh LODs.

It's all very inter-related I think, and the reason all the big AAA engines have these things as core components.

darrylryan avatar Mar 28 '23 10:03 darrylryan

I think for large worlds to work properly, there's a number of components that have to work together. [...]

Wouldn't a sort of generic asset streaming system be more useful? It just has to allow streaming of any resource, so it shouldn't be too complex.

atirut-w avatar Mar 29 '23 01:03 atirut-w

After some research into expanding the capabilities of this mesh streaming to have some sort of support for skeletal meshes, the idea is to run the skinning once in memory and reuse the same result for the rest. This is discussed in this thread by Seb Aaltonen on Twitter (the guy who presented the GPU-driven renderer for Assassin's Creed Unity, and senior director of Unity's hybrid renderer):
https://twitter.com/SebAaltonen/status/1403044403007078403

https://twitter.com/SebAaltonen/status/1403044018079080454

I recommend checking his tweets on GPU-driven rendering and Nanite, as they are interesting and can give some ideas; this mesh streaming solution shares some similarities. Edit: Also, for small clusters you could use some of the techniques MM's Dreams used for merging them, although the problem may come when there are overlapping objects.

Saul2022 avatar Apr 14 '23 14:04 Saul2022

It's a good idea to keep in mind that those tweets are almost two years old though. The alpha mask limitation is gone by now, although I don't know if they will ever support skinned meshes for Nanite. I don't even know if I would want them to. Rigging and skinning a mesh with a million verts sounds like an absolute nightmare to me.

and-rad avatar Apr 14 '23 15:04 and-rad

I know that they are gone and that it's kind of old, but I mentioned that tweet as a reference for Godot's mesh streaming solution, and it could apply to both Godot's mesh streaming and Nanite (as Juan said, the major difference is that Nanite's LODs are computed per piece, not per object). Also, skinning support doesn't just mean animation on millions of polygons; it means you can use way more characters, and it also frees the CPU to do more behavior things. This makes large crowds easier to do, although proposals like swarms or just using C++ can help. It was mentioned in the Godot for AAA games blog post.

Saul2022 avatar Apr 14 '23 16:04 Saul2022

  • The aforementioned visibility notifier

Remember that the mesh shader path should have a fallback for non-RDNA2 hardware. I don't think it should be supported atm, as only a few would benefit from it; even if not as good, it gives a massive performance boost compared to auto LOD, because of the cluster and two-pass occlusion culling.

Saul2022 avatar May 23 '23 12:05 Saul2022

Non RDNA2+, Turing+ and Alchemist+, So AMD RX 6xxx+, Nvidia GTX 16xx+ and Intel Arc

mrjustaguy avatar Jun 18 '23 10:06 mrjustaguy

I have some questions: will the initial approach support foliage that uses alpha, like leaves, bushes, etc.? And what will happen if there are not enough chunks for the mesh to stream, or the higher-poly mesh doesn't have them? Will it disappear, or will it have the extra cost? For example, with a 1k-triangle sphere, could it use mesh streaming?

Saul2022 avatar Aug 11 '23 17:08 Saul2022