Virtual shadowmap
In general, I'm looking for a new, more sophisticated solution to replace the projective shadowmap.
Goals
- scaling with resolution (faster on low-res rendering)
- single solution for any GPU (unlike ray-tracing)
- high (pixel-perfect) quality
Progress
- [x] DirectLighting (sun)
- [x] Planetary occlusion, for sun
- [x] Fog
- [ ] Optimized fog
- [x] Point lights
- [x] Refactor/cleanup: rename `*cluster_task` into something readable
- [ ] Overall optimization
- [ ] small-core (512 threadgroup) support
interesting to experiment:
- [ ] software-rt (with meshlets binned by pages)
- [ ] software pre-trace, to complement culling. Similar to RT, but without ray-triangle test.
Known solutions
Not a lot actually...
- Nanite (page 119): https://advances.realtimerendering.com/s2021/Karis_Nanite_SIGGRAPH_Advances_2021_final.pdf
- Assassin's Creed (page 55): https://advances.realtimerendering.com/s2015/aaltonenhaar_siggraph2015_combined_final_footer_220dpi.pdf
- https://ktstephano.github.io/rendering/stratusgfx/svsm
- https://www.cse.chalmers.se/~uffe/ClusteredWithShadows.pdf
- https://www.gamedevs.org/uploads/efficient-shadows-from-many-lights.pdf
not vsm, but close enough in concept: http://lukaskalbertodt.github.io/2023/11/18/tiled-soft-shadow-volumes.html
Very first WIP
Some page-data to illustrate
Initial implementation
Decided to start with common parameters for now:
- only sun-light for now
- 4k*4k page atlas
- 128x128 page size: up to 1024 pages
- Clipmaps + page table 64x64x32
- Meshlets are duplicated once for every page they overlap
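A quick sanity check of these parameters, as a minimal sketch. The constant names are made up for illustration, and reading the last page-table axis as "number of clipmap levels" is my assumption, not something stated above.

```python
# Parameters quoted in the list above; names are hypothetical.
ATLAS_SIZE = 4096          # "4k*4k page atlas", texels per side
PAGE_SIZE = 128            # "128x128 page size", texels per side
PAGE_TABLE = (64, 64, 32)  # "page table 64x64x32"; last axis assumed to be clipmap levels

# Physical pages that fit into the atlas.
pages_per_side = ATLAS_SIZE // PAGE_SIZE
physical_pages = pages_per_side * pages_per_side

# Virtual pages addressable through the page table.
virtual_pages = PAGE_TABLE[0] * PAGE_TABLE[1] * PAGE_TABLE[2]

# Texels per side of one clipmap level if every page of it were resident.
level_resolution = PAGE_TABLE[0] * PAGE_SIZE

# Fraction of the virtual space that can be resident at the same time.
resident_fraction = physical_pages / virtual_pages

print(physical_pages)     # 1024, matches "up to 1024 pages"
print(virtual_pages)      # 131072
print(level_resolution)   # 8192
print(resident_fraction)  # 0.0078125
```

Under these assumptions less than 1% of the addressable pages can be backed by memory at once, so page allocation has to be driven by what is actually visible.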
Current considerations:
- Cluster culling: culling all clusters versus all pages is expensive
  - 1.1 Coarse culling is not possible: need to output the exact `pageId` for visible meshlets
  - 1.2 Output size is not deterministic, and there is no good way to react to out-of-memory
- Culling versus clip-map (HiZ-like) is easier than versus each page individually
  - 2.1 Won't be able to use the hw-rasterizer to output data (need to use image-atomics + image-less rendering)
  - 2.2 Image-less rendering is limited by `maxViewportDimensions = 4k`
- Software renderer?!
  - 3.1 An immediate one still requires atomics, and won't be better than a render-pass based one
  - 3.2 A tile-based one can be an interesting take, but is not valid without bindless
- Requires some complementary solution to work with volumetrics
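To make points 1.1 and 1.2 concrete, here is a CPU-side sketch of per-page meshlet binning. It is not the engine's code: `pages_overlapped`, `bin_meshlets`, the rectangle format and the fixed `capacity` are all hypothetical. It only shows why the output has to carry an exact page id per entry, and why its size depends on the scene.

```python
from typing import List, Set, Tuple

PAGE_SIZE = 128  # texels per page side, as in the parameter list

Rect = Tuple[int, int, int, int]  # (min_x, min_y, max_x, max_y) in shadow-map texels, inclusive
Page = Tuple[int, int]            # (page_x, page_y)

def pages_overlapped(bounds: Rect) -> List[Page]:
    """All pages touched by a meshlet's light-space bounding rectangle."""
    x0, y0, x1, y1 = bounds
    return [(px, py)
            for py in range(y0 // PAGE_SIZE, y1 // PAGE_SIZE + 1)
            for px in range(x0 // PAGE_SIZE, x1 // PAGE_SIZE + 1)]

def bin_meshlets(meshlets: List[Rect], resident: Set[Page], capacity: int):
    """Emit one (meshlet_id, page) entry per overlapped resident page.

    The buffer has a fixed capacity, like a GPU buffer would; entries
    that do not fit are only counted, which is the out-of-memory case.
    """
    out: List[Tuple[int, Page]] = []
    dropped = 0
    for meshlet_id, bounds in enumerate(meshlets):
        for page in pages_overlapped(bounds):
            if page not in resident:
                continue  # page was never requested: nothing to draw there
            if len(out) < capacity:
                out.append((meshlet_id, page))
            else:
                dropped += 1
    return out, dropped

meshlets = [(0, 0, 100, 100), (100, 100, 300, 140), (500, 500, 510, 510)]
resident = {(0, 0), (1, 0), (2, 0), (0, 1), (1, 1), (2, 1)}

entries, dropped = bin_meshlets(meshlets, resident, capacity=5)
print(len(entries), dropped)  # 5 2
```

The second meshlet alone produces six entries because it straddles a 3x2 block of pages, so three input meshlets turn into seven draws and the five-entry buffer overflows.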
@Try, does this mean OpenGothic will work on weaker graphics cards?
@YALdysse by 'weaker' graphics card you mean weaker than what?)
OpenGothic consumes an average of 95% of my graphics card - AMD Radeon RX Vega 7 (CPU - AMD Ryzen 5500U).
Unfortunately, we are not talking about stable 60 FPS, but I can play.
OK, I would like to avoid setting any expectations, as virtual-shadow is pretty-much experimental tech.
> AMD Radeon RX Vega 7
With Vega there are 2 major issues:
- It's way too different from any other GPU, and it's hard to reason about what is going on in this black box
- It has no mesh shader, so the engine has to fall back to non-indexed draws, which are about 3x more expensive
Some numbers [RTX3070]:
Projective shadowmap (current solution):
- `7836` meshlets on tested scene (see starting post)
- 0.16ms

Virtual shadow
- `408` pages
- `43392` meshlets (shortened shadow range; meshlets are duplicated for each page)
- Meshlets in page: `111` = `27` of paladin with gear + landscape
- Render time of single page: 0.12ms
  - most of rendering time is `PROP` (pre-rasterization work in HW)
Numbers for one of smaller pages, close to the camera:
clip-distance helps with fragment workload, bringing adequate FPS (still slower than regular SM)
Some rendering examples:
City:
Ship:
Even with large pages, the result doesn't look good:
vsm.header.pageCount 604
vsm.header.meshletCount 91656 // total to draw for VSM
vsm.header.counterM 10787 // total amount of unique meshlets for VSM, meaning duplication factor is ~x9
vsm.header.counterV 8733 // meshlets that are drawn in case of good-old shadowmap
non-empty shadow mips: 8
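The ratios behind those counters, worked out as a small sketch. The numbers are copied from the dump above; the dictionary and variable names are only for illustration.

```python
# Counters copied from the dump above.
header = {
    "pageCount": 604,
    "meshletCount": 91656,  # total meshlets to draw for VSM
    "counterM": 10787,      # unique meshlets for VSM
    "counterV": 8733,       # meshlets drawn by the regular shadowmap
}

# How many times an average meshlet is drawn (once per overlapped page).
duplication = header["meshletCount"] / header["counterM"]

# VSM draw count relative to the regular shadowmap.
vs_projective = header["meshletCount"] / header["counterV"]

# Average amount of meshlets per page.
per_page = header["meshletCount"] / header["pageCount"]

print(round(duplication, 1))    # 8.5
print(round(vs_projective, 1))  # 10.5
print(round(per_page))          # 152
```

So the "~x9" above is a rounded-up 8.5, and at this stage VSM draws about ten times more meshlets than the regular shadowmap.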
Testing area:
Unrelated, just cool screenshots:
Testing is now enabled by command line: -vsm 1
some data on culling:
// baseline
vsm.header.pageCount 546
vsm.header.meshletCount 38316
vsm.header.counterM 13753
vsm.header.counterV 8733
// cull dummy tiles in larger page
vsm.header.pageCount 560
vsm.header.meshletCount 34514
vsm.header.counterM 0
vsm.header.counterV 8733
// hiz
vsm.header.pageCount 558
vsm.header.meshletCount 29547
vsm.header.counterM 0
vsm.header.counterV 8733
About 23% reduction in meshlet count. Now runtime is about 1ms for rendering.
`vsm.header.counterV 8733` is the amount of meshlets for the projective shadow-map, so we are still drawing roughly 3.4x as many meshlets as the projective SM (29547 vs 8733).
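The culling numbers can be re-derived from the three dumps above. A minimal sketch, with hypothetical names, using only the `meshletCount` and `counterV` values already listed:

```python
# meshletCount for each culling step, copied from the dumps above.
meshlet_count = {
    "baseline": 38316,
    "cull dummy tiles": 34514,
    "hiz": 29547,
}
projective = 8733  # counterV, meshlets drawn by the projective shadowmap

def reduction(before: int, after: int) -> float:
    """Relative reduction in percent."""
    return 100.0 * (before - after) / before

base = meshlet_count["baseline"]
for name, count in meshlet_count.items():
    print(f"{name}: -{reduction(base, count):.1f}% vs baseline, "
          f"{count / projective:.1f}x projective")
```

With these inputs the tile culling alone removes about 10% of the meshlets and HiZ brings the total to about 23%, while the ratio to the projective shadowmap goes from about 4.4x to about 3.4x.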
fog + vsm doesn't quite work (non-resident pages):
experimenting with vsm-fog:
Some frame case-study on draw-amount. (note: I do prefer to measure amount of meshlets, instead of draw-time as it doesn't depend on gpu-model/temperature/etc)
projectiveSM: 8.7k virtualSM: 20.6k
Now breakdown per clipmap:
mip: all, land, obj
0 1041 590 455
1 1516 784 732
2 1263 603 660
-
3 2152 996 1156
4 2516 1238 1282
5 4731 1016 3715
6 3900 865 3035
-
7 3148 740 2408
8 156 82 78
Fog adds roughly another 5k to mip5 (and a bit to others)
Fog is back to reasonable timings: 1.51ms -> 0.57ms. However, it comes at the cost of some flicker that needs to be worked on.
L'Hiver. Nice details on individual rocks, ~30% gpu load - similar to vanilla game.
Some progress on volumetric-fog. I wouldn't call it solved, yet pretty good already:
Undersampling artifacts at shafts far from camera:
Omnidirectional lights shadowing in progress
Experimenting with software ray-tracing:
Basic idea:
- Hardware (rasterization) based shadowmap has way too many bottlenecks, especially on old hardware or tile-based GPUs
- on MacOS and Vega we have to emulate mesh-shader with 3x more polygons
- There are only a few triangles per pixel to test - just need to find them well.
- Software renderer can do irregular sampling easily, achieving RT-like quality
- Meshlet support is also trivial
- With only a few tweaks, can support alpha-blended materials such as trees and smoke
- painted glass is a stretch goal
Cons: my implementation requires position-buffer (similar to geometry-cache in some games) and we can not have dynamic memory. Hopefully 32mb is enough for everyone :)
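A toy CPU version of the idea, to show what "find them well" and "irregular sampling" mean here. This is a sketch under my own assumptions, not the actual implementation: triangles are already in light space as `(x, y, depth)`, bins are a plain 2D grid with a made-up `TILE` size, and the function names are hypothetical.

```python
from collections import defaultdict

TILE = 8.0  # bin size in light-space units; hypothetical value

def bin_triangles(tris):
    """Put each triangle index into every tile its 2D bounding box touches."""
    bins = defaultdict(list)
    for i, tri in enumerate(tris):
        xs = [v[0] for v in tri]
        ys = [v[1] for v in tri]
        for ty in range(int(min(ys) // TILE), int(max(ys) // TILE) + 1):
            for tx in range(int(min(xs) // TILE), int(max(xs) // TILE) + 1):
                bins[(tx, ty)].append(i)
    return bins

def occluder_depth(tri, x, y):
    """Depth of the triangle at (x, y), or None if the point is outside."""
    (x0, y0, z0), (x1, y1, z1), (x2, y2, z2) = tri
    area = (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)
    if area == 0:
        return None  # degenerate triangle
    # Barycentric weights of vertices 1 and 2; vertex 0 gets the rest.
    w1 = ((x - x0) * (y2 - y0) - (x2 - x0) * (y - y0)) / area
    w2 = ((x1 - x0) * (y - y0) - (x - x0) * (y1 - y0)) / area
    w0 = 1.0 - w1 - w2
    if w0 < 0 or w1 < 0 or w2 < 0:
        return None
    return w0 * z0 + w1 * z1 + w2 * z2

def in_shadow(bins, tris, sample, bias=1e-3):
    """Test one receiver sample against the triangles of its tile only."""
    x, y, z = sample
    for i in bins.get((int(x // TILE), int(y // TILE)), []):
        d = occluder_depth(tris[i], x, y)
        if d is not None and d < z - bias:
            return True  # something is closer to the light than the receiver
    return False

tris = [((0.0, 0.0, 1.0), (4.0, 0.0, 1.0), (0.0, 4.0, 1.0))]
bins = bin_triangles(tris)
print(in_shadow(bins, tris, (1.0, 1.0, 5.0)))  # True: behind the triangle
print(in_shadow(bins, tris, (3.5, 3.5, 5.0)))  # False: outside the triangle
print(in_shadow(bins, tris, (1.0, 1.0, 0.5)))  # False: in front of the triangle
```

The sample position is arbitrary rather than snapped to a texel grid, which is where the RT-like quality comes from; the cost is fully determined by how many triangles end up in each bin.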
Pebbles 2.0
Debug view of primitive bins:
Depth splices are real killers: 4k triangles instead of 60-ish
First screenshot of point-light shadow:
Many-lights for point-lights:
Unfortunately, meshlet culling is pretty much useless for point lights, making the primitive phase too expensive
Runs generally well on PC now; however, on MacOS occupancy is close to zero: