OpenGothic

Low Raytracing performance on AMD

Open Try opened this issue 2 years ago • 23 comments

Noted on a stream here: https://youtu.be/PMP53yaC7n0?t=9370

The first idea is to use VK_GEOMETRY_INSTANCE_FORCE_OPAQUE_BIT_KHR (and the DX12 equivalent) for opaque instances, but the exact reason why the RX6600 is so much slower than the RTX3070 is still unknown.

Try avatar May 08 '23 13:05 Try

Marking as help-wanted: I need someone with an RX6600 (or another RT-capable AMD GPU) to help with testing.

Try avatar May 13 '23 18:05 Try

I have RX 6700 XT and could do the test. I don't really understand what's the issue in stream, though, is it about going indoors?

Nindaleth avatar May 15 '23 11:05 Nindaleth

Hi @Nindaleth, nice that you have an RX6700 :)

Yes, basically RT performance is dramatically bad. Since raytracing is used mostly for point lights, indoor scenes take the biggest hit. My current suspicion is that the AMD traversal algorithm is much slower for alpha-tested geometry.

Can you confirm that the game also runs slower in indoor areas on your machine? If yes, then here is a patch for Tempest:

diff --git a/Engine/gapi/vulkan/vaccelerationstructure.cpp b/Engine/gapi/vulkan/vaccelerationstructure.cpp
index cd4944e..c4a3399 100644
--- a/Engine/gapi/vulkan/vaccelerationstructure.cpp
+++ b/Engine/gapi/vulkan/vaccelerationstructure.cpp
@@ -163,7 +163,7 @@ VTopAccelerationStructure::VTopAccelerationStructure(VDevice& dx, const RtInstan
     objInstance.instanceCustomIndex                    = inst[i].id;
     objInstance.mask                                   = 0xFF;
     objInstance.instanceShaderBindingTableRecordOffset = 0;
-    objInstance.flags                                  = 0; // none
+    objInstance.flags                                  = VK_GEOMETRY_INSTANCE_FORCE_OPAQUE_BIT_KHR;
     objInstance.accelerationStructureReference         = blas->toDeviceAddress(dx);

     pBuf.handler->update(&objInstance,i, 1,sizeof(objInstance), sizeof(objInstance));

^ This makes every object opaque, which helps to estimate the potential FPS gain.
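The diff above forces every instance opaque, which is only meant as an experiment. A minimal sketch of the selective variant from the first post, assuming the engine can tell per instance whether its material is alpha-tested. `MaterialInfo` and `instanceFlagsFor` are hypothetical names, not Tempest API, and the constant only mirrors the Vulkan header value so that the snippet builds without the SDK:

```cpp
#include <cassert>
#include <cstdint>

// Mirrors the value of VK_GEOMETRY_INSTANCE_FORCE_OPAQUE_BIT_KHR from the
// Vulkan headers, so this sketch compiles without the Vulkan SDK.
constexpr std::uint32_t kForceOpaqueBit = 0x00000004u;

// Hypothetical material description; the real engine types differ.
struct MaterialInfo {
  bool alphaTested;
};

// Force-opaque for everything that does not need alpha testing,
// no instance flags for alpha-tested materials.
constexpr std::uint32_t instanceFlagsFor(const MaterialInfo& m) {
  return m.alphaTested ? 0u : kForceOpaqueBit;
}
```

In the constructor from the diff, the result would be assigned to `objInstance.flags` instead of a hard-coded value.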

Try avatar May 15 '23 17:05 Try

Hi @Try :-) Actually yes, I've been getting the indoors slowdown for a very long time and just assumed everybody gets that, so it's known :-D I never got around to measuring it properly to make the issue detailed enough.

For a quick test of the patch: standing in front of the Khorinis hotel I get around 72 FPS; going straight in until I stop at Hanna's desk, I get around 24 FPS. After applying the patch, unfortunately, I don't see any difference in FPS or in the visuals. Should something look much different? All Extended Configuration options are enabled, and there are no special switches used when running the game.

Nindaleth avatar May 15 '23 19:05 Nindaleth

Should something look much different?

Alpha-tested geometry should break in the RT shaders. You can verify this by checking how grass looks next to a torch.

All Extended Configuration options are enabled

RT is not a menu option. The game simply enables it for discrete GPUs with RT support, assuming that they are capable of simple shadows. To compare RT on/off, you can use the command-line flag -rt 0.

Also, if you are interested in a deep dive into the shader: in shader/lighting/light.frag there is a function bool isShadow(...) that implements a simple ray-test loop.

Actually... gl_RayFlagsNoOpaqueEXT is wrong there and needs to be removed as well, in order to make the patch useful. Sorry, I only found it while writing this post :D
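Why the instance flag alone changes nothing: in Vulkan, ray flags override instance flags, which in turn override the per-geometry opaque flag. A small CPU-side model of that precedence, as a sketch only. The function name is made up for illustration, and the constants just mirror the Vulkan and GLSL values so the snippet is self-contained:

```cpp
#include <cassert>
#include <cstdint>

// Values mirror the Vulkan / GLSL ray-tracing constants.
constexpr std::uint32_t kInstanceForceOpaque   = 0x4u; // VK_GEOMETRY_INSTANCE_FORCE_OPAQUE_BIT_KHR
constexpr std::uint32_t kInstanceForceNoOpaque = 0x8u; // VK_GEOMETRY_INSTANCE_FORCE_NO_OPAQUE_BIT_KHR
constexpr std::uint32_t kRayOpaque             = 0x1u; // gl_RayFlagsOpaqueEXT
constexpr std::uint32_t kRayNoOpaque           = 0x2u; // gl_RayFlagsNoOpaqueEXT

// Hypothetical helper: decides whether a hit is treated as opaque.
// Precedence: geometry flag < instance flags < ray flags.
inline bool isTreatedAsOpaque(bool geometryOpaque,
                              std::uint32_t instanceFlags,
                              std::uint32_t rayFlags) {
  // Setting both "opaque" and "no opaque" on the same level is invalid usage.
  assert(!((instanceFlags & kInstanceForceOpaque) && (instanceFlags & kInstanceForceNoOpaque)));
  assert(!((rayFlags & kRayOpaque) && (rayFlags & kRayNoOpaque)));

  bool opaque = geometryOpaque;
  if(instanceFlags & kInstanceForceOpaque)   opaque = true;
  if(instanceFlags & kInstanceForceNoOpaque) opaque = false;
  if(rayFlags & kRayOpaque)                  opaque = true;
  if(rayFlags & kRayNoOpaque)                opaque = false;
  return opaque;
}
```

With `kRayNoOpaque` set by the shader, `isTreatedAsOpaque(false, kInstanceForceOpaque, kRayNoOpaque)` is still `false`, which would match the "no difference" result of the first test.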

Try avatar May 15 '23 20:05 Try

RT is not a menu option.

You're right, sorry, I mentioned the extended configuration in case the raytracing worked specially bad with some of the settings, performance-wise.

Actually... gl_RayFlagsNoOpaqueEXT is wrong there and needs to be removed as well, in order to make the patch useful. Sorry, I only found it while writing this post :D

Alright, I'm glad my lack of difference is expected :-D

Retested with the light.frag changed too. The torch+grass now looks bad so it works, but I still get the same 72 -> 24 FPS drop in Hanna's hotel test.

Nindaleth avatar May 16 '23 20:05 Nindaleth

72 -> 24 FPS drop

Eh, that's not good at all. 72 is RT + outdoor, right? What is the FPS in the non-RT case?

PS: Decided to commit e5a5fee with proper RT flags. At least now the engine matches the AMD guidelines...

Try avatar May 16 '23 20:05 Try

Thanks, I've pulled the new changes. If you'd like me to test any patches on top of the current master, let me know what to try.

As for the testing, this is the outdoor part:

[image: test_part1_rton]

And this is the indoor part:

[image: test_part2_rton]

On non-RT the FPS is around the same and there's no change when moving indoors. Roughly:

  • RT+outdoor = ~70 FPS
  • RT+indoor = ~25 FPS
  • -rt 0 + outdoor = ~70 FPS
  • -rt 0 + indoor = ~70 FPS
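FPS numbers hide how large the cost is, so it may help to convert them to frame time. A quick sketch (the helper names are only for illustration) using the rough figures from the list above: ~70 FPS is about 14.3 ms per frame and ~25 FPS is 40 ms, so RT indoors adds roughly 25.7 ms per frame on this card.

```cpp
#include <cassert>

// Converts a frame rate into milliseconds per frame.
inline double frameTimeMs(double fps) {
  assert(fps > 0.0);
  return 1000.0 / fps;
}

// Extra milliseconds per frame between a baseline run and a run with
// the feature (here: raytracing) enabled.
inline double addedCostMs(double fpsBaseline, double fpsWithFeature) {
  return frameTimeMs(fpsWithFeature) - frameTimeMs(fpsBaseline);
}
```

For example, `addedCostMs(70.0, 25.0)` gives the indoor RT cost, while `addedCostMs(70.0, 70.0)` is zero for the outdoor case.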

Nindaleth avatar May 16 '23 21:05 Nindaleth

Did you consider running a GoFundMe or similar so that you're able to purchase your own RT-supporting Radeon and debug this? :D

Nindaleth avatar Jun 22 '23 10:06 Nindaleth

Did you consider running a GoFundMe or similar so that you're able to purchase your own RT-supporting Radeon and debug this?

This is not about the money :D I have 3 laptops, 2 of them are useless. I do not want to buy yet another one :)

Try avatar Jun 22 '23 16:06 Try

Oh OK, I see and agree. I prefer a desktop so it would be a question of swapping just the GPU, no further purchases needed, but laptop form factor makes that part harder.

Nindaleth avatar Jun 22 '23 20:06 Nindaleth

Just for the record: [image]

Polygon complexity is pretty much non-existent, and everything is opaque. For some reason NSight shows that none of the acceleration structures are FAST_TRACE (even though that is the only mode the engine actually supports), and none are FAST_BUILD either. Maybe an application bug, maybe an NVidia bug.

The number of rays in the scene is very high: tons of light sources, on both floors of the building.

Try avatar Jun 24 '23 14:06 Try

Here's an interesting part on a blogpost about the raytracing in RADV driver (what I use as a Radeon Vulkan driver on Linux):

How well do ray queries run? Pretty competitive with AMDVLK/the AMD Windows drivers! You’ll generally see similar, if not better, performance on RADV.

How well do pipelines run? Not well (expect significantly less performance compared to AMDVLK/Windows drivers). This is being worked on.

Are these RT pipelines something that's used in OpenGothic? That would explain the lower-than-expected performance.

Nindaleth avatar Jun 24 '23 19:06 Nindaleth

Are these RT pipelines something that's used in OpenGothic?

No, only rayQuery. There is a better resource explaining how they do things: https://gpuopen.com/learn/improving-rt-perf-with-rra/

Overall, while the exact traversal algorithm the driver uses is not specified, it is probably some sort of tree with rope nodes or similar. That is quite bad for games, as such an algorithm is tuned to work well on a few 3D models with a high polygon count, but not on a large number of low-poly ones.

In theory, it's possible to merge all static objects into one giant mesh to get some extra performance. Or we can wait for the RTX-2.0 API and see if it fixes anything.
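A rough sketch of what such a merge could look like on the CPU side, before building a single BLAS from the result. All types here (`Vec3`, `Mat3x4`, `Mesh`, `StaticObject`) are hypothetical stand-ins, not engine types; the 3x4 row-major matrix layout is chosen to match VkTransformMatrixKHR:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Vec3 { float x, y, z; };

// Row-major 3x4 object-to-world matrix (same layout as VkTransformMatrixKHR).
struct Mat3x4 { float m[3][4]; };

// Hypothetical mesh and instance types, for illustration only.
struct Mesh {
  std::vector<Vec3>          vertices;
  std::vector<std::uint32_t> indices;
};

struct StaticObject {
  const Mesh* mesh;
  Mat3x4      objectToWorld;
};

inline Vec3 transformPoint(const Mat3x4& t, const Vec3& p) {
  return Vec3{
    t.m[0][0]*p.x + t.m[0][1]*p.y + t.m[0][2]*p.z + t.m[0][3],
    t.m[1][0]*p.x + t.m[1][1]*p.y + t.m[1][2]*p.z + t.m[1][3],
    t.m[2][0]*p.x + t.m[2][1]*p.y + t.m[2][2]*p.z + t.m[2][3]};
}

// Bakes every object's transform into its vertices and concatenates all
// meshes into one, rebasing indices by the running vertex count.
inline Mesh mergeStatic(const std::vector<StaticObject>& objects) {
  Mesh out;
  for(const StaticObject& obj : objects) {
    assert(obj.mesh != nullptr);
    const std::uint32_t base = static_cast<std::uint32_t>(out.vertices.size());
    for(const Vec3& v : obj.mesh->vertices)
      out.vertices.push_back(transformPoint(obj.objectToWorld, v));
    for(std::uint32_t i : obj.mesh->indices)
      out.indices.push_back(base + i);
  }
  return out;
}
```

The trade-off is that the merged mesh loses per-instance ids and cannot share a BLAS between repeated objects, so whether this is a net win would have to be measured.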

Try avatar Jun 24 '23 20:06 Try

Mac M1: [screenshot, 2023-07-03 20:26:25]

Well... could be worse :D

Try avatar Jul 03 '23 18:07 Try

Well... could be worse :D

Could be 2 FPS less still :D What FPS do you get on your RTX in the hotel?

Nindaleth avatar Jul 03 '23 18:07 Nindaleth

Actually not that great: 41 FPS. And RTCORE throughput is dominating the frame: [image]

Also, from the Xcode profiler (fragment overdraw) I can see that the frame uses ~11 rays per pixel, which is not a lot, to be fair.

Try avatar Jul 03 '23 19:07 Try

OK, that's at least already getting fluid. I just thought about testing in 1080p. Those 72/24 FPS values on the RX 6700 XT are in 1440p; in 1080p I get around 80/32 FPS. I guess that, except for the most powerful cards, FSR2 or another kind of upscaling is still needed for RT-powered content at high resolutions.
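To put the resolution effect into numbers, here is a back-of-the-envelope sketch. It assumes 2560x1440 and 1920x1080 for "1440p" and "1080p", and borrows the ~11 rays per pixel figure from the Xcode capture above, which was taken on a different machine, so treat the result as an order of magnitude only. The helper name is made up for illustration:

```cpp
#include <cassert>
#include <cstdint>

// Rays traced per frame, assuming a uniform number of ray queries per pixel.
inline std::uint64_t raysPerFrame(std::uint32_t width,
                                  std::uint32_t height,
                                  std::uint32_t raysPerPixel) {
  assert(width > 0 && height > 0);
  return static_cast<std::uint64_t>(width) * height * raysPerPixel;
}
```

Under those assumptions, `raysPerFrame(2560, 1440, 11)` is about 40.6 million rays and `raysPerFrame(1920, 1080, 11)` about 22.8 million, i.e. 1080p traces 56.25% of the rays. The indoor frame time only drops from about 41.7 ms (24 FPS) to about 31.3 ms (32 FPS), so it looks like part of the cost does not scale with pixel count.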

Nindaleth avatar Jul 03 '23 19:07 Nindaleth

A few ideas:

  1. Remove some light-sources from RT, based on some criteria:
--- a/game/world/worldlight.cpp
+++ b/game/world/worldlight.cpp
@@ -4,7 +4,8 @@

 WorldLight::WorldLight(Vob* parent, World& world, const phoenix::vobs::light& vob, Flags flags)
   : Vob(parent,world,vob,flags) {
-  light = LightGroup::Light(world, vob);
+  if(vob.quality!=phoenix::light_quality::medium)
+    light = LightGroup::Light(world, vob);
   }

 void WorldLight::moveEvent() {

The patch above results in 60+ FPS; unfortunately the quality criterion is semi-random.

  2. Merge multiple BLAS into one big super-BLAS. Cons: ugly; relies on how RT might be implemented on AMD/NVidia; no way to predict gains.
  3. Virtual shadow map. Cons: one of the hardest raster techniques to implement; no macOS.

Try avatar Jul 03 '23 20:07 Try

Not sure what's the specific cause, but on the latest (74497771) I'm getting something like 70/30 FPS (instead of 72/24 previously).

I wanted to report getting even 45 FPS in the hotel indoors instead of 30 FPS, but it turns out the new resolution slider was set to 75% :D

Nindaleth avatar Sep 26 '23 19:09 Nindaleth

Current measurement of the same scene on latest cc35d6df (still the same hardware, still 1440p and 100% render resolution, all extended graphics features enabled): 80/35 FPS. For comparison, I get 105/57 FPS in 1080p. Not sure how it could improve so significantly, but I'm happy about the improvements.

Nindaleth avatar Nov 20 '23 21:11 Nindaleth

Hm, there were some optimization commits recently:

  • 390490c0
  • 8bbf61c1
  • 8064c7c1
  • de8fcb97
  • 1173a3b8

Still, the difference is a bit too large to be explained by these... or maybe it's a magic AMD driver update :D

Try avatar Nov 20 '23 22:11 Try

I wish I was making this up: today I tested an RT optimization that landed a few days ago in Mesa. :smile:

From September's 70/30 to November's 80/35 to January's 93/40 FPS, the RT perf is continuously improving thanks to the work of all of you. That's at 1440p; at 1080p the increase is from November's 105/57 to 127/63 FPS.

Nindaleth avatar Jan 09 '24 17:01 Nindaleth