
Virtual shadowmap

Open Try opened this issue 1 year ago • 27 comments

In general, I'm looking for a new, more sophisticated solution to replace the projective shadowmap.

Goals

  • scaling with resolution (faster on low-res rendering)
  • single solution for any GPU (unlike ray-tracing)
  • high (pixel-perfect) quality

Progress

  • [x] DirectLighting (sun)
    • [x] Planetary occlusion, for sun
  • [x] Fog
    • [ ] Optimized fog
  • [x] Point lights
  • [x] Refactor/cleanup: rename *cluster_task into something readable
  • [ ] Overall optimization
    • [ ] small-core (512 threadgroup) support
Interesting to experiment:
  • [ ] software-rt (with meshlets binned by pages)
  • [ ] software pre-trace, to complement culling. Similar to RT, but without ray-triangle test.

Known solutions

Not a lot actually...

  • Nanite (page 119): https://advances.realtimerendering.com/s2021/Karis_Nanite_SIGGRAPH_Advances_2021_final.pdf
  • Assassin's Creed (page 55): https://advances.realtimerendering.com/s2015/aaltonenhaar_siggraph2015_combined_final_footer_220dpi.pdf
  • https://ktstephano.github.io/rendering/stratusgfx/svsm
  • https://www.cse.chalmers.se/~uffe/ClusteredWithShadows.pdf
  • https://www.gamedevs.org/uploads/efficient-shadows-from-many-lights.pdf

not vsm, but close enough in concept: http://lukaskalbertodt.github.io/2023/11/18/tiled-soft-shadow-volumes.html

Very first WIP

Image Image

Some page-data to illustrate: Image

Initial implementation

Decided to start with common parameters for now:

  • only sun-light for now
  • 4k*4k page atlas
  • 128x128 page size: up to 1024 pages
  • Clipmaps + page table 64x64x32
  • Meshlets are duplicated once for every page they overlap
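The parameters above pin down the sizes involved. Here is a minimal sketch of that arithmetic. It assumes a square atlas and a row-major numbering of physical pages, and it reads 64x64x32 as 64x64 entries for each of 32 clipmap levels. The numbering scheme, that reading, and all function names are illustrative assumptions, not taken from OpenGothic.

```python
ATLAS_SIZE = 4096          # 4k*4k page atlas, in texels
PAGE_SIZE = 128            # 128x128 texels per page
PAGE_TABLE = (64, 64, 32)  # assumed meaning: 64x64 entries per level, 32 levels


def pages_per_row():
    """Number of physical pages that fit along one side of the atlas."""
    return ATLAS_SIZE // PAGE_SIZE


def max_resident_pages():
    """Upper bound on pages that can be resident in the atlas at once."""
    return pages_per_row() ** 2


def page_table_entries():
    """Total number of virtual pages addressable through the page table."""
    x, y, levels = PAGE_TABLE
    return x * y * levels


def atlas_offset(page_id):
    """Top-left texel of a physical page (row-major numbering is an assumption)."""
    if not 0 <= page_id < max_resident_pages():
        raise ValueError("page_id outside of the atlas")
    row, col = divmod(page_id, pages_per_row())
    return col * PAGE_SIZE, row * PAGE_SIZE
```

With these numbers the page table addresses far more virtual pages than the atlas can hold, so only a small fraction of entries can be resident at any time.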

Current considerations:

  1. Cluster culling: culling all clusters versus all pages is expensive
     1.1 Coarse culling is not possible - need to output the exact pageId for visible meshlets
     1.2 Output size is not deterministic, and there is no good way to react to out-of-memory
  2. Culling versus clip-map (HiZ-like) is easier than versus each page individually
     2.1 Won't be able to use the hw-rasterizer to output data (need to use image-atomics + image-less rendering)
     2.2 Image-less rendering is limited by maxViewportDimensions = 4k
  3. Software renderer?!
     3.1 An immediate one still requires atomics, and won't be better than a render-pass based one
     3.2 A tile-based one can be an interesting take, but is not viable without bindless
  4. Requires some complementary solution to work with volumetrics
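To illustrate why the output size in point 1.2 is hard to bound, here is a rough CPU-side sketch of per-page meshlet duplication. It assumes each meshlet has already been projected to a half-open texel rectangle in the virtual shadow space of one clipmap level. The rectangle representation and both function names are hypothetical, and a real implementation would run on the GPU.

```python
PAGE_SIZE = 128  # texels per page side, as in the parameters above


def overlapped_pages(x0, y0, x1, y1, page_size=PAGE_SIZE):
    """Page coordinates touched by the half-open texel rect [x0,x1) x [y0,y1)."""
    if x1 <= x0 or y1 <= y0:
        return []
    px0, py0 = x0 // page_size, y0 // page_size
    px1, py1 = (x1 - 1) // page_size, (y1 - 1) // page_size
    return [(px, py)
            for py in range(py0, py1 + 1)
            for px in range(px0, px1 + 1)]


def build_draw_list(meshlets, resident_pages):
    """Emit one (meshlet_id, page) pair per resident page a meshlet overlaps.

    meshlets: dict of meshlet_id -> (x0, y0, x1, y1) texel rect.
    resident_pages: set of (px, py) pages that were actually requested.
    """
    draws = []
    for meshlet_id, rect in meshlets.items():
        for page in overlapped_pages(*rect):
            if page in resident_pages:
                draws.append((meshlet_id, page))
    return draws
```

The length of the returned list depends on how meshlets happen to straddle page borders, so it cannot be known before culling runs. That is the non-deterministic output size mentioned above.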

Try avatar Sep 05 '24 21:09 Try

@Try, does this mean OpenGothic will work on weaker graphics cards?

YALdysse avatar Sep 06 '24 04:09 YALdysse

@YALdysse by 'weaker' graphics card you mean weaker than what?)

Try avatar Sep 06 '24 15:09 Try

@YALdysse by 'weaker' graphics card you mean weaker than what?)

OpenGothic consumes an average of 95% of my graphics card - AMD Radeon RX Vega 7 (CPU - AMD Ryzen 5500U).

YALdysse avatar Sep 06 '24 15:09 YALdysse

OpenGothic consumes an average of 95% of my graphics card - AMD Radeon RX Vega 7 (CPU - AMD Ryzen 5500U).

Unfortunately, we are not talking about stable 60 FPS, but I can play.

YALdysse avatar Sep 06 '24 15:09 YALdysse

OK, I would like to avoid setting any expectations, as virtual-shadow is pretty-much experimental tech.

AMD Radeon RX Vega 7

With Vega there are 2 major issues:

  • It's way too different from any other GPU, and it's hard to reason about what is going on in this black box
  • It has no mesh shader, so the engine has to fall back to non-indexed draws, which are about 3x more expensive

Try avatar Sep 06 '24 19:09 Try

Some numbers [RTX3070]:

Projective shadowmap (current solution):

  • 7836 meshlets on tested scene (see starting post)
  • 0.16ms

Virtual shadow

  • 408 pages
  • 43392 meshlets (shortened shadow range; meshlets are duplicated for each page)
  • Meshlets in page: 111 = 27 of paladin with gear + landscape
  • Render time of single page: 0.12ms
    • most of rendering time is PROP (pre-rasterization work in HW)

Numbers for one of the smaller pages, close to the camera: Image

Try avatar Sep 09 '24 20:09 Try

clip-distance helps with fragment workload, bringing adequate FPS (still slower than regular SM)

Image

Try avatar Sep 10 '24 19:09 Try

Some rendering examples: Image Image Image

Try avatar Sep 10 '24 19:09 Try

City: Image

Ship: Image

Try avatar Sep 10 '24 22:09 Try

Even with large pages, things do not look good:

vsm.header.pageCount	604
vsm.header.meshletCount	91656 // total to draw for VSM
vsm.header.counterM	10787 // total amount of unique meshlets for VSM, meaning duplication factor is ~x9
vsm.header.counterV	 8733 // meshlets that are drawn in case of good-old shadowmap

non-empty shadow mips:  8
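As a quick check of the counters above, the ratios can be recomputed directly. This is only arithmetic on the posted numbers, and the variable and function names are mine.

```python
page_count = 604       # vsm.header.pageCount
meshlet_count = 91656  # vsm.header.meshletCount, total to draw for VSM
counter_m = 10787      # vsm.header.counterM, unique meshlets for VSM
counter_v = 8733       # vsm.header.counterV, meshlets for the old shadowmap


def duplication_factor(total_draws, unique_meshlets):
    """How many times, on average, each unique meshlet is drawn."""
    return total_draws / unique_meshlets


dup = duplication_factor(meshlet_count, counter_m)
meshlets_per_page = meshlet_count / page_count
vs_projective = meshlet_count / counter_v

print(f"duplication: x{dup:.1f}")
print(f"meshlets per page: {meshlets_per_page:.0f}")
print(f"draws relative to projective shadowmap: x{vs_projective:.1f}")
```

The duplication factor comes out at about 8.5, which matches the "~x9" in the comment after rounding up.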

Testing area: Image

Unrelated, just cool screenshots: Image Image Image

Try avatar Sep 15 '24 22:09 Try

Testing is now enabled by command line: -vsm 1

Try avatar Sep 15 '24 22:09 Try

Some data on culling:

// baseline
vsm.header.pageCount	546
vsm.header.meshletCount	38316
vsm.header.counterM	13753
vsm.header.counterV	8733

// cull dummy tiles in larger page
vsm.header.pageCount	560
vsm.header.meshletCount	34514
vsm.header.counterM	0
vsm.header.counterV	8733

// hiz
vsm.header.pageCount	558
vsm.header.meshletCount	29547
vsm.header.counterM	0
vsm.header.counterV	8733

About 23% reduction in meshlet count. Now runtime is about 1ms for rendering.

vsm.header.counterV 8733

Number of meshlets for the projective shadowmap. So we are still using roughly 3.4x as many meshlets (29547 vs 8733) relative to the projective shadowmap.
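The effect of each culling step can be recomputed from the three dumps above. A small sketch, using only the posted meshletCount and counterV values (the dictionary keys and function names are mine):

```python
PROJECTIVE_MESHLETS = 8733  # vsm.header.counterV

# vsm.header.meshletCount for each configuration, in the order posted
runs = {
    "baseline": 38316,
    "cull dummy tiles": 34514,
    "hiz": 29547,
}


def reduction(before, after):
    """Fraction of meshlets removed when going from `before` to `after`."""
    return (before - after) / before


def ratio_to_projective(meshlets):
    """Meshlet count relative to the projective shadowmap."""
    return meshlets / PROJECTIVE_MESHLETS


for name, count in runs.items():
    saved = reduction(runs["baseline"], count) * 100
    print(f"{name:18s} {count:6d}  -{saved:4.1f}%  x{ratio_to_projective(count):.2f}")
```

Both steps together remove about 23% of the baseline meshlets, and the ratio to the projective shadowmap drops from about 4.4x to about 3.4x.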

Try avatar Oct 01 '24 20:10 Try

fog + vsm doesn't quite work (non-resident pages): Image

Try avatar Oct 02 '24 18:10 Try

experimenting with vsm-fog: Image

Try avatar Oct 04 '24 00:10 Try

A frame case study on draw amount. (Note: I prefer to measure the number of meshlets instead of draw time, as it doesn't depend on GPU model, temperature, etc.)

projectiveSM: 8.7k
virtualSM: 20.6k

Now breakdown per clipmap:

mip:  all, land, obj
 0  1041   590   455
 1  1516   784   732
 2  1263   603   660
 -
 3  2152   996  1156
 4  2516  1238  1282
 5  4731  1016  3715
 6  3900   865  3035
 -
 7  3148   740  2408
 8   156    82    78

Image (chart)

Fog adds roughly another 5k to mip5 (and a bit to others)
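To see where the meshlets go, the per-clipmap table can be summed up. The sketch below copies the rows exactly as posted (in a few rows land + obj differs from all by 4, which is left as is). The names are mine.

```python
# (mip, all, land, obj), copied from the table above
rows = [
    (0, 1041, 590, 455),
    (1, 1516, 784, 732),
    (2, 1263, 603, 660),
    (3, 2152, 996, 1156),
    (4, 2516, 1238, 1282),
    (5, 4731, 1016, 3715),
    (6, 3900, 865, 3035),
    (7, 3148, 740, 2408),
    (8, 156, 82, 78),
]

total = sum(all_ for _, all_, _, _ in rows)
heaviest_mip, heaviest_count, _, heaviest_obj = max(rows, key=lambda r: r[1])
obj_share = heaviest_obj / heaviest_count

print(f"total meshlets over all mips: {total}")
print(f"heaviest mip: {heaviest_mip} ({heaviest_count} meshlets, "
      f"{obj_share:.0%} of them objects)")
```

The total is about 20.4k, close to the 20.6k quoted for virtualSM. Mips 5 and 6 dominate, and most of their meshlets are objects rather than landscape.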

Try avatar Oct 15 '24 20:10 Try

vsm: epipolar fog in progress

Fog is back to reasonable timings: 1.51ms -> 0.57ms. However, it comes at the cost of some flicker that needs to be worked on.

Try avatar Nov 04 '24 19:11 Try

L'Hiver. Nice details on individual rocks, ~30% gpu load - similar to vanilla game.

Image

Try avatar Nov 10 '24 18:11 Try

Some progress on volumetric-fog. I wouldn't call it solved, yet it is pretty good already: Image

Image

Undersampling artifacts at shafts far from camera: Image

Try avatar Nov 25 '24 17:11 Try

Omnidirectional light shadowing in progress: Image Image

Try avatar Dec 05 '24 02:12 Try

Experimenting with software ray-tracing:

Image

Basic idea:

  • Hardware (rasterization) based shadowmap has way too many bottlenecks, especially on old hardware or tile-based GPUs
    • on macOS and Vega we have to emulate the mesh-shader with 3x more polygons
  • There are only a few triangles per pixel to test - just need to find them well.
  • Software renderer can do irregular sampling easily, achieving RT-like quality
    • Meshlet support is also trivial
  • With only a few tweaks, it can support alpha-blended materials such as trees and smoke
    • painted glass is a stretch goal

Cons: my implementation requires a position buffer (similar to the geometry cache in some games), and we cannot have dynamic memory. Hopefully 32 MB is enough for everyone :)
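To make the "few triangles per pixel" idea concrete, here is a minimal CPU sketch of a shadow test against a small candidate list. It uses the standard Möller–Trumbore ray-triangle intersection, which is my choice for illustration; the thread does not say which test the actual implementation uses. The candidate list stands in for whatever bin the pixel falls into, and all names are hypothetical.

```python
EPS = 1e-7


def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def ray_hits_triangle(origin, direction, v0, v1, v2, t_max=float("inf")):
    """Moller-Trumbore test: True if the ray hits the triangle at EPS < t < t_max."""
    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(direction, e2)
    det = dot(e1, p)
    if abs(det) < EPS:            # ray is parallel to the triangle plane
        return False
    inv_det = 1.0 / det
    s = sub(origin, v0)
    u = dot(s, p) * inv_det       # first barycentric coordinate
    if u < 0.0 or u > 1.0:
        return False
    q = cross(s, e1)
    v = dot(direction, q) * inv_det  # second barycentric coordinate
    if v < 0.0 or u + v > 1.0:
        return False
    t = dot(e2, q) * inv_det      # distance along the ray
    return EPS < t < t_max


def in_shadow(point, to_light, candidates):
    """True if any candidate triangle blocks the ray from point towards the light."""
    return any(ray_hits_triangle(point, to_light, *tri) for tri in candidates)
```

Because each pixel shoots its own ray from its own position, sampling is irregular by construction. The cost is dominated by how short the candidate list can be kept.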

Try avatar Feb 24 '25 21:02 Try

Image

Image

Try avatar Feb 24 '25 23:02 Try

Image

Try avatar Feb 25 '25 23:02 Try

Pebbles 2.0

Image

Try avatar Mar 01 '25 15:03 Try

Debug view of primitive bins:

Image

Depth splices are quite a killer: 4k triangles instead of 60-ish

Try avatar Mar 25 '25 20:03 Try

First screenshot of point-light shadow: Image

Image

Try avatar May 13 '25 22:05 Try

Many-lights for point-lights: Image

Unfortunately, meshlet culling is pretty much useless for point lights, making the primitive phase too expensive

Try avatar May 25 '25 23:05 Try

Runs generally well on PC now; however, on macOS occupancy is close to zero:

Image

Try avatar Jun 01 '25 21:06 Try