Low Raytracing performance on AMD
Noted on a stream here: https://youtu.be/PMP53yaC7n0?t=9370
First idea is to use VK_GEOMETRY_INSTANCE_FORCE_OPAQUE_BIT_KHR (and the DX12 equivalent) for opaque instances, yet the exact reason why the RX6600 is so much slower than the RTX3070 is unknown.
Marking as help-wanted: need someone with an RX6600 (or other RT-capable AMD GPU) to help with testing.
I have RX 6700 XT and could do the test. I don't really understand what's the issue in stream, though, is it about going indoors?
Hi, @Nindaleth , nice that you have RX6700 :)
Yes, basically RT performance is dramatically low. Since raytracing is used mostly for point-lights, indoor scenes take the biggest hit. My current suspicion is that the AMD traversal algorithm is way slower for alpha-tested geometry.
Can you confirm that on your machine the game also runs slower in indoor areas? If yes, then here is a patch for Tempest:
diff --git a/Engine/gapi/vulkan/vaccelerationstructure.cpp b/Engine/gapi/vulkan/vaccelerationstructure.cpp
index cd4944e..c4a3399 100644
--- a/Engine/gapi/vulkan/vaccelerationstructure.cpp
+++ b/Engine/gapi/vulkan/vaccelerationstructure.cpp
@@ -163,7 +163,7 @@ VTopAccelerationStructure::VTopAccelerationStructure(VDevice& dx, const RtInstan
objInstance.instanceCustomIndex = inst[i].id;
objInstance.mask = 0xFF;
objInstance.instanceShaderBindingTableRecordOffset = 0;
- objInstance.flags = 0; // none
+ objInstance.flags = VK_GEOMETRY_INSTANCE_FORCE_OPAQUE_BIT_KHR;
objInstance.accelerationStructureReference = blas->toDeviceAddress(dx);
pBuf.handler->update(&objInstance,i, 1,sizeof(objInstance), sizeof(objInstance));
^ This will make every object opaque, and will help to estimate the potential FPS gain.
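For a non-test version, the flag would presumably be chosen per instance rather than forced everywhere. A minimal sketch of that idea, assuming a hypothetical `InstanceDesc` with an `alphaTested` field (not the engine's `RtInstance`); the constant is a local stand-in that mirrors the value of `VK_GEOMETRY_INSTANCE_FORCE_OPAQUE_BIT_KHR` so the snippet compiles without the Vulkan SDK:

```cpp
#include <cstdint>
#include <vector>

// Local stand-in so the sketch compiles without the Vulkan SDK.
// The value mirrors VK_GEOMETRY_INSTANCE_FORCE_OPAQUE_BIT_KHR from the Vulkan headers.
constexpr uint32_t kForceOpaqueBit = 0x00000004;

// Hypothetical per-instance description; NOT the engine's RtInstance.
struct InstanceDesc {
  uint32_t id          = 0;
  bool     alphaTested = false; // e.g. grass, foliage, fences
};

// Opaque instances get FORCE_OPAQUE; alpha-tested ones keep flags = 0,
// so the shader can still run its alpha test on candidate hits.
inline uint32_t instanceFlags(const InstanceDesc& d) {
  return d.alphaTested ? 0u : kForceOpaqueBit;
}

// Computes the flags for a whole list of instances, in order.
inline std::vector<uint32_t> buildFlags(const std::vector<InstanceDesc>& inst) {
  std::vector<uint32_t> out;
  out.reserve(inst.size());
  for(const auto& i : inst)
    out.push_back(instanceFlags(i));
  return out;
}
```

The result of `instanceFlags` would go into `objInstance.flags` in place of the constant from the test patch.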
Hi @Try :-) Actually yes, I've been getting the indoors slowdown for a very long time and just assumed everybody gets that so it's known :-D I never got around to measure it properly to make the issue detailed enough.
For a quick test of the patch: standing in front of the Khorinis hotel I get around 72 FPS, going straight in until I stop at Hannah's desk I then get around 24 FPS. After applying the patch, unfortunately I don't see any difference in FPS nor in the visuals. Should something look much different? All Extended Configuration options are enabled, there are no special switches used when running the game.
Should something look much different?
Alpha-tested geometry should break in RT-shaders. You can verify this by testing grass against a torch.
All Extended Configuration options are enabled
RT is not a menu option. The game simply enables it for discrete GPUs with RT support, assuming that they are capable of simple shadows.
To test RT on/off you can use the command line flag -rt 0
Also, if you are interested in a deep dive into the shader: in shader/lighting/light.frag there is a function bool isShadow(...) that implements a simple ray-test loop.
Actually... gl_RayFlagsNoOpaqueEXT is wrong there, need to remove this as well, in order to make the patch useful. Sorry, found it as I was writing this post :D
RT is not a menu option.
You're right, sorry, I mentioned the extended configuration in case the raytracing worked specially bad with some of the settings, performance-wise.
Actually... gl_RayFlagsNoOpaqueEXT is wrong there, need to remove this as well, in order to make the patch useful. Sorry, found it as I was writing this post :D
Alright, I'm glad my lack of difference is expected :-D
Retested with the light.frag changed too. The torch+grass now looks bad so it works, but I still get the same 72 -> 24 FPS drop in Hanna's hotel test.
72 -> 24 FPS drop
Eh, that's not good at all. 72 is RT+outdoor, right? What is the FPS in the non-RT case?
PS Decided to commit e5a5fee with proper RT-flags. At least now engine matches AMD guideline...
Thanks, I've pulled the new changes. If you'd like me to test any patches on top of the current master, let me know what to try.
As for the testing, this is the outdoor part
And this is the indoor part
On non-RT the FPS is around the same and there's no change when moving indoors. Roughly:
- RT+outdoor = ~70 FPS
- RT+indoor = ~25 FPS
- -rt 0 + outdoor = ~70 FPS
- -rt 0 + indoor = ~70 FPS
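To compare these numbers it helps to convert FPS into frame time, since RT cost adds up in milliseconds, not in frames. A small sketch of that arithmetic (helper names are made up for illustration):

```cpp
#include <cstdio>

// Frame time in milliseconds for a given frame rate.
constexpr double frameTimeMs(double fps) { return 1000.0 / fps; }

// Extra per-frame cost of the slower measurement, in milliseconds.
constexpr double extraCostMs(double fpsBase, double fpsSlow) {
  return frameTimeMs(fpsSlow) - frameTimeMs(fpsBase);
}

// Prints the indoor RT cost for the rough numbers from the list above.
inline void printRtCost() {
  const double indoorRt = 25.0, indoorNoRt = 70.0;
  std::printf("indoor RT cost: ~%.1f ms per frame\n", extraCostMs(indoorNoRt, indoorRt));
}
```

With the rough values above, the indoor frame goes from about 14.3 ms to 40 ms, so raytracing costs roughly 25.7 ms per frame indoors and close to nothing outdoors.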
Did you consider running a GoFundMe or similar so that you're able to purchase your own RT-supporting Radeon and debug this? :D
Did you consider running a GoFundMe or similar so that you're able to purchase your own RT-supporting Radeon and debug this?
This is not about the money :D I have 3 laptops, 2 of them are useless. Do not want to buy yet-another-one)
Oh OK, I see and agree. I prefer a desktop so it would be a question of swapping just the GPU, no further purchases needed, but laptop form factor makes that part harder.
Just for the record:
Polygon complexity is pretty much non-existent, everything is opaque.
For some reason NSight shows that none of the acceleration structures are FAST_TRACE (despite that being the only mode the engine actually supports), also none are FAST_BUILD. Maybe an application bug, maybe an NVidia bug here.
The amount of rays in the scene is very high: a ton of light sources, on both floors of the building.
Here's an interesting part on a blogpost about the raytracing in RADV driver (what I use as a Radeon Vulkan driver on Linux):
How well do ray queries run? Pretty competitive with AMDVLK/the AMD Windows drivers! You’ll generally see similar, if not better, performance on RADV.
How well do pipelines run? Not well (expect significantly less performance compared to AMDVLK/Windows drivers). This is being worked on.
Are these RT pipelines something that's used in OpenGothic? It would explain the less-than-expected performance.
Are these RT pipelines something that's used in OpenGothic?
No, only rayQuery. There is a better resource explaining how they do things: https://gpuopen.com/learn/improving-rt-perf-with-rra/
Overall, while the exact algorithm that the driver uses for traversal is not specified, it's probably some sort of tree with rope-nodes or similar. That is quite bad for games, as such an algorithm is set up to work well on a few 3D models with high poly count, but not on a large amount of low-poly ones.
In theory, it's possible to merge all static objects into one giant mesh, to get some extra performance. Or we can wait for an RTX-2.0 API, and see if it fixes anything
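A rough sketch of what such a merge could look like on the CPU side, assuming hypothetical `Mesh`/`StaticObject` types and a row-major 3x4 object-to-world matrix (none of these are the engine's actual types): each object's transform is baked into its vertices, and indices are rebased so that everything can go into a single BLAS.

```cpp
#include <array>
#include <cstdint>
#include <vector>

struct Vec3 { float x, y, z; };

// Row-major 3x4 affine transform (rotation/scale + translation); hypothetical type.
using Mat3x4 = std::array<float, 12>;

struct Mesh {
  std::vector<Vec3>     vertices;
  std::vector<uint32_t> indices;
};

// Hypothetical static object: a shared mesh plus its placement in the world.
struct StaticObject {
  const Mesh* mesh;
  Mat3x4      objToWorld;
};

inline Vec3 transformPoint(const Mat3x4& m, const Vec3& p) {
  return {
    m[0]*p.x + m[1]*p.y + m[2]*p.z  + m[3],
    m[4]*p.x + m[5]*p.y + m[6]*p.z  + m[7],
    m[8]*p.x + m[9]*p.y + m[10]*p.z + m[11],
  };
}

// Bakes every object's transform into its vertices and appends them to one mesh,
// rebasing indices by the number of vertices already emitted.
inline Mesh mergeStatic(const std::vector<StaticObject>& objects) {
  Mesh out;
  for(const auto& obj : objects) {
    const auto base = static_cast<uint32_t>(out.vertices.size());
    for(const auto& v : obj.mesh->vertices)
      out.vertices.push_back(transformPoint(obj.objToWorld, v));
    for(uint32_t i : obj.mesh->indices)
      out.indices.push_back(base + i);
  }
  return out;
}
```

The trade-off is that instancing is lost: a mesh placed 100 times is stored 100 times in the merged buffer, and any object that moves has to stay out of the merge.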
Mac M1
Well... could be worse :D
Well... could be worse :D
Could be 2 FPS less still :D What FPS do you get on your RTX in the hotel?
Actually not that great: 41 fps. And RTCORE throughput is dominating frame:
Also, from the XCode profiler (fragment overdraw) one can see that the frame uses ~11 rays per pixel, which is not a lot, to be fair.
OK, that's at least already getting fluid. I just thought about testing in 1080p. Those 72/24 FPS values on RX 6700 XT are in 1440p, in 1080p I get around 80/32 FPS. I guess except for the most powerful cards the FSR2 or another kind of upscaling is still needed with RT-powered content in high resolutions.
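For a sense of scale, here is a back-of-the-envelope ray budget per resolution. It assumes the ~11 rays per pixel figure from the Mac M1 capture also holds on other hardware and resolutions, which is only a guess; the helper names are made up for illustration.

```cpp
#include <cstdint>

// Number of pixels in a w x h frame.
constexpr uint64_t pixelCount(uint32_t w, uint32_t h) { return uint64_t(w) * h; }

// Rays per frame, assuming a constant number of rays per pixel.
constexpr uint64_t raysPerFrame(uint32_t w, uint32_t h, uint32_t raysPerPixel) {
  return pixelCount(w, h) * raysPerPixel;
}

// How many times more pixels the first resolution has than the second.
constexpr double pixelRatio(uint32_t w1, uint32_t h1, uint32_t w2, uint32_t h2) {
  return double(pixelCount(w1, h1)) / double(pixelCount(w2, h2));
}

// 2560x1440 vs 1920x1080, at the assumed 11 rays per pixel.
constexpr uint64_t kRays1440p = raysPerFrame(2560, 1440, 11); // 40,550,400
constexpr uint64_t kRays1080p = raysPerFrame(1920, 1080, 11); // 22,809,600
```

1440p has about 1.78x the pixels of 1080p, while the indoor FPS only changes from 24 to 32 (about 1.33x), which hints that not all of the cost scales with pixel count.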
Few ideas:
1. Remove some light-sources from RT, based on some criteria:
--- a/game/world/worldlight.cpp
+++ b/game/world/worldlight.cpp
@@ -4,7 +4,8 @@
WorldLight::WorldLight(Vob* parent, World& world, const phoenix::vobs::light& vob, Flags flags)
: Vob(parent,world,vob,flags) {
- light = LightGroup::Light(world, vob);
+ if(vob.quality!=phoenix::light_quality::medium)
+ light = LightGroup::Light(world, vob);
}
void WorldLight::moveEvent() {
The patch above results in 60+ fps; unfortunately the quality criterion is semi-random.
2. Merge multiple BLAS into one big super-BLAS.
cons: ugly; relies on how RT might be implemented on AMD/NVidia; no way to predict gains
3. Virtual shadow map
cons: one of the hardest raster techniques to implement; no MacOS
Not sure what's the specific cause, but on the latest (74497771) I'm getting something like 70/30 FPS (instead of 72/24 previously).
I wanted to report getting even 45 FPS in the hotel indoors instead of 30 FPS, but it turns out the new resolution slider was set to 75% :D
Current measurement of the same scene on latest cc35d6df (still the same hardware, still 1440p and 100% render resolution, all extended graphics features enabled): 80/35 FPS. For comparison, I get 105/57 FPS in 1080p. Not sure how it could improve so significantly, but I'm happy about the improvements.
Hm, there were some optimization commits in recent time:
- 390490c0
- 8bbf61c1
- 8064c7c1
- de8fcb97
- 1173a3b8
Still a bit too much to explain the difference... or maybe a magic AMD driver update :D
I wish I was making this up, today I tested an RT optimization that landed a few days ago in Mesa. :smile:
From September's 70/30 to November's 80/35 to January's 93/40 FPS, the RT perf is continuously improving thanks to the work of all of you. That's on 1440p, on 1080p the increase from November's 105/57 is to 127/63 FPS.