xemu
nv2a: Depth buffer precision improvements and polygon offset slope factor
- Use barycentric coordinates to interpolate depth values. On Linux with Mesa, the AMD Radeon RX 6600 Vulkan driver currently has quite poor interpolation precision; see the video and tests below. This PR replaces GPU interpolation with a manual one. Also note that the current Xemu w-buffer interpolation uses gl_FragCoord.w, which can't reproduce all w-values: e.g. 1.0f/16777046.0f equals 1.0f/16777047.0f in 32-bit floats, so inverting it can't reproduce both 16777046.0f and 16777047.0f. In addition, depth value differences are used in the interpolation, which has the desired property that a triangle with the same z-value on all vertices results in exactly that z-value when interpolated. At least Shenmue II's sky rendering relies on this (#2049).
- Compute the polygon depth bias slope for both z-buffering and w-buffering. The slope is the maximum of the absolute values of the partial derivatives of either z=z(x,y) or w=w(x,y), where x,y,z,w are screen-space coordinates. This matches Xbox hardware for z-buffering, where the partial derivatives are constant over any fixed triangle. For w-buffering, however, the partial derivatives vary over the triangle, and Xbox appears to compute just a single depth slope at the first visible pixel (where "first" means something like first in top-left order) and use it for the whole triangle. This PR computes the partial derivatives using the chain rule, e.g. dw/dx = -w^2 * d(1/w)/dx. This is useful since 1/w is linear in screen space and therefore d(1/w)/dx is constant over any fixed triangle. However, finding the w-value of the first visible pixel of a triangle is difficult in OpenGL/Vulkan and is not done here. Instead, the depth slope is calculated per pixel. Fixes #2041.
- Since the depth buffer calculation changes above require modifying geometry shader code and outputs, this PR also fixes some flat shading and polygon line mode bugs, since keeping them as is with the new modifications would have been at least as much work.
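The two float32 claims in the first bullet can be checked numerically. The following is a minimal sketch in Python, not the PR's GLSL: `f32` and `interp_depth` are hypothetical helper names, and the model assumes plain float32 arithmetic with every intermediate result rounded and no fused multiply-add.

```python
import struct

def f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def interp_depth(z0, z1, z2, b1, b2):
    """Interpolate as z0 + b1*(z1 - z0) + b2*(z2 - z0).

    Every intermediate result is rounded to float32. b1 and b2 are the
    barycentric weights of the second and third vertex.
    """
    d1 = f32(f32(z1) - f32(z0))
    d2 = f32(f32(z2) - f32(z0))
    t1 = f32(f32(b1) * d1)
    t2 = f32(f32(b2) * d2)
    return f32(f32(f32(z0) + t1) + t2)

# Two distinct 24-bit w-values whose float32 reciprocals coincide,
# so inverting 1/w can recover at most one of them.
r_a = f32(1.0 / 16777046.0)
r_b = f32(1.0 / 16777047.0)
print(r_a == r_b)        # True
print(f32(1.0 / r_a))    # 16777046.0

# Equal depth on all vertices: both differences are exactly 0.0,
# so the result is exactly z0 whatever the weights are.
z = 16777047.0
print(interp_depth(z, z, z, 0.3, 0.6) == z)   # True
```

The difference form is exact for constant depth because `z1 - z0` and `z2 - z0` are exactly zero before any weight is applied, so rounding of the weights never enters the result.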
Relevant additions to nxdk_pgraph_tests are here:
- Depth values: https://github.com/coldhex/nxdk_pgraph_tests/commit/7e6ca18c2b0707939224ac4f9450a91d44423713
- Shade model: https://github.com/coldhex/nxdk_pgraph_tests/commit/5bcdc6729da7b06febd08324a61b1486ca816d9d
I tested using a Radeon RX 6600. The following video shows the depth precision issue (which is different from the depth slope z-fighting issue and is also fixed by this PR) with Xemu 0.8.67 and Chronicles of Riddick (where depth value inaccuracy causes problems with shadow volume calculations):
https://github.com/user-attachments/assets/8e218f02-72d9-40b9-94a1-da8a254f81b6
Radeon OpenGL is a little better than Vulkan. (Btw, this inaccuracy issue doesn't occur with Intel UHD 770.) I reproduced one of the triangles in an nxdk_pgraph_test 24-bit integer W-buffering test. The following is with Vulkan:
OpenGL and Xemu 0.8.67 produce depth values closer to Xbox HW values, but not close enough, as the video above shows. That was Radeon, but I also tested Xemu 0.8.67 on Intel UHD 770 and it matches Xbox HW in this test, so Intel's hardware interpolation seems to be of better quality than AMD's.
Regarding the new depth slope implementation for W-buffering, here are tests for the case of a 24-bit integer W-buffer. These use slope factor 65536.0, which is much larger than games would typically use (e.g. Riddick uses 1.0), but it makes it possible to see how the implementation in this PR doesn't produce exactly the correct values. Note how Xbox HW appears to compute the slope offset value only once, at the first visible pixel. The quad is exactly the same among the ClipF tests and exactly the same among the ClipW tests; only the window clipping changes (at the red line):
(Also note in ClipW tests how the last W-value 0x30CB5 doesn't change. This is because Xbox draws the quad as two triangles and the last W-value belongs to the second triangle which doesn't need clipping and therefore its depth slope is unaffected.)
To further show curious Xbox HW behaviour, the following tests draw the same triangle at 1-pixel offsets. TriH tests draw the triangles from left to right with 1-pixel vertical drops. TriV tests draw from top to bottom with 1-pixel shifts to the right. The cycling with period 4 is due to the 1-pixel shifts only.
From this, I'm guessing NV2A rounds the first pixel to some point in a 4x4 pixel block and samples the depth slope once for the whole triangle. Sampling once in such a manner is rather difficult to implement in the OpenGL/Vulkan graphics pipeline, so this PR doesn't even try and implements slope calculation per pixel instead.
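To make the per-pixel slope concrete, here is a small numeric sketch of the chain-rule computation from the PR description, dw/dx = -w^2 * d(1/w)/dx. It is written in Python rather than GLSL, the function names are hypothetical, and it only models the math: it does not reproduce the PR's shader code or the NV2A's once-per-triangle sampling.

```python
def linear_gradient(p0, p1, p2, f0, f1, f2):
    """Constant gradient (df/dx, df/dy) of the function that is linear over
    the triangle p0, p1, p2 and takes the values f0, f1, f2 at its vertices."""
    e1x, e1y = p1[0] - p0[0], p1[1] - p0[1]
    e2x, e2y = p2[0] - p0[0], p2[1] - p0[1]
    det = e1x * e2y - e2x * e1y  # twice the signed area; nonzero for a real triangle
    df1, df2 = f1 - f0, f2 - f0
    return ((df1 * e2y - df2 * e1y) / det, (df2 * e1x - df1 * e2x) / det)

def w_depth_slope(p0, p1, p2, w0, w1, w2, x, y):
    """max(|dw/dx|, |dw/dy|) at screen position (x, y)."""
    # 1/w is linear in screen space, so its gradient is one constant pair
    # for the whole triangle.
    gx, gy = linear_gradient(p0, p1, p2, 1.0 / w0, 1.0 / w1, 1.0 / w2)
    # Evaluate 1/w at the pixel and invert it to get w there.
    inv_w = 1.0 / w0 + gx * (x - p0[0]) + gy * (y - p0[1])
    w = 1.0 / inv_w
    # Chain rule: dw/dx = -w^2 * d(1/w)/dx, and likewise for y.
    return max(abs(-w * w * gx), abs(-w * w * gy))

tri = ((0.0, 0.0), (10.0, 0.0), (0.0, 10.0))
print(w_depth_slope(*tri, 1.0, 2.0, 4.0, 0.0, 0.0))  # slope at the first vertex
print(w_depth_slope(*tri, 1.0, 2.0, 4.0, 2.0, 2.0))  # a larger slope further in
```

The two printed values differ even though the triangle is the same, which is why a single slope sampled at the first visible pixel (as Xbox HW appears to do) and a per-pixel slope cannot agree everywhere.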
For Z-buffering, depth slope calculation in this PR is about the same as Xemu 0.8.67. However, depth offset improves since, for whatever reason, Mesa adds the offset doubled:
Depth offset for lines is fixed in this PR. Whether it is applied depends on whether V_PGRAPH_SETUPRASTER_POFFSETLINEENABLE is set. Xemu 0.8.67 applies the offset if any POFFSET is enabled, so this PR improves on that, e.g.:
For the flat shading and polygon line mode changes, here are test cases where this PR makes improvements (I include some test cases where Xemu 0.8.67 also matches HW):
Should fix #2204, #2191, #2228 and #2038. It also fixed the Madden series, but I never created an issue for that. I thought this would solve the remaining z-fighting issues (#1020), but they seem to be unrelated.
Also, a couple of games regressed from your previous PR, #2213. Marvel vs. Capcom and Metal Slug 5 were also affected. Unfortunately, in my extensive testing I never came across any issues before that PR was merged, but hopefully you know now. Let me know if you want me to open an issue for it. Lastly, keep up the great work.
@Triticum0 Yes, a screenshot of Metal Slug 5 would be nice since the Beat Down regression doesn't happen with my hardware.
Here you go. My guess is that it's NVIDIA-exclusive. I need a RenderDoc debug build of the latest release; once I have one I'll see what I can do.
Metal slug 5
MvsC 2
Edit: Tested with your suggestion and it seems to fix the issue.
Very cool, thanks! It may take me a few days to get time to review.
@coldhex If you have time, it might be worth looking at the remaining z-fighting issues. I know Matt has Alias, so you could start there, or with OutRun 2, where the z-fighting is easily noticeable and could be related to this PR: https://github.com/xemu-project/xemu/issues/1020
Thanks for the screenshots. Metal Slug 5 doesn't have such artifacts on Radeon (or Intel UHD 770). (On Beat Down issue, I commented that I now noticed there is an artifact present on Radeon, but less than on GeForce.) Looking at MS5 in renderdoc, it too renders the "Mission ..." text using rectangles at half-pixel coordinates and nearest-neighbor texture atlases, like Beat Down. So they have the same cause.
OutRun 2 has z-fighting on distant buildings even on Xbox HW. It seems to be caused by the usual low precision of z-buffering at distances far from the camera. This PR may (or may not) reduce the z-fighting to be closer to HW, but in general this PR (or any Xemu version) does not produce exact HW z-buffer values. There may be small differences. Additionally, the different floating-point rounding modes in NV2A and in modern GPUs already cause differences in vertex program output. In one nxdk pgraph test (one of the texture shadow comparator tests), there is a +3 difference in the vertex program z-coordinate output caused only by the floating-point rounding mode (round-towards-zero vs round-to-nearest). To obtain exactly the same z-buffer values, floating-point arithmetic would have to be exactly the same as in NV2A, and exactly the same interpolation algorithm and order of calculations would need to be followed (whatever those are).
I tested Metal Slug 5 and Beat Down on my AMD RX 6600 GPU and I don't have those lines.
Should fix #2191
Another thing about the PR: since this uses geometry shaders for triangles, triangle strips and fans, it adds to the stop-the-world pauses when shaders are compiled. Since geometry shader code generation depends only on relatively few variables such as primitive type and polygon mode, there aren't all that many geometry shader code variations. However, geometry shaders are currently compiled every time there is a new vertex or fragment shader. It might be worth caching geometry shaders separately to avoid recompiling the same geometry shader code.
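As a rough illustration of that idea (not how xemu implements it), a separate cache could be keyed only on the few values that affect geometry shader generation. In this Python sketch the class, the key layout and the stand-in compile function are all hypothetical.

```python
from typing import Callable, Dict, Tuple

# Hypothetical key: only the state that changes the generated geometry shader,
# here (primitive type, polygon mode, flat shading enabled).
GeomKey = Tuple[str, str, bool]

class GeometryShaderCache:
    """Compile each distinct geometry shader variant once and reuse it."""

    def __init__(self, compile_fn: Callable[[GeomKey], object]):
        self._compile = compile_fn
        self._cache: Dict[GeomKey, object] = {}
        self.compile_count = 0  # number of real compilations, for illustration

    def get(self, key: GeomKey) -> object:
        if key not in self._cache:
            self._cache[key] = self._compile(key)
            self.compile_count += 1
        return self._cache[key]

# Stand-in for the expensive driver compile step.
cache = GeometryShaderCache(lambda key: "compiled:" + "/".join(map(str, key)))
cache.get(("triangles", "fill", False))  # compiled
cache.get(("triangles", "fill", False))  # reused, e.g. with a new fragment shader
cache.get(("triangle_strip", "line", True))  # a second variant, compiled
print(cache.compile_count)  # 2
```

With such a split, a new vertex or fragment shader would only trigger a geometry shader compile when its key has not been seen before.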
Fixes the z-fighting in the Madden series, and also fixes z-fighting in Pariah (look at the gun): https://www.youtube.com/watch?v=9ki3JeHnRKA&list=PLmPVsOuxTsQJpSGHGa1YFSWRqnUohZSrt&index=3
> It might be worth caching geometry shaders separately to avoid recompiling the same geometry shader code.
Agreed. ~~Vertex, geometry, and fragment shaders should probably be cached independently. We'll defer it for now~~ Done in #2324
@coldhex While testing some games I came across a bug where a black triangle covers most of the screen. I stumbled across it in multiple games and think I tracked it down to a regression starting in the PR "Perspective-correct interpolation for w-buffering". It shows up like this:
Bionicle - The Game
Carmen Sandiego: The Secret of the Stolen Drums 2262
ultimate beach soccer
I only found out today that it was a regression, as it wasn't very reproducible: it only showed up randomly when loading into a level and fixed itself when exiting the level. Only Bionicle - The Game was reproducible, so I could only test now that I had found a good test case.
No rush, I'm only mentioning it as this PR doesn't fix the issue. There is also a similar issue in Counter Terrorist Special Forces; I don't know if it is related.
> @coldhex While testing some games I came across a bug where a black triangle covers most of the screen. I stumbled across it in multiple games and think I tracked it down to a regression starting in the PR "Perspective-correct interpolation for w-buffering".
Bionicle and Counter Terrorist Special Forces seem to give garbled input data to a vertex shader. For the latter game, when exporting vertex data bytes to a file from Renderdoc, the data even contained an English word in ascii in it even though it is supposed to be coordinates and RGB values. Bionicle seems to have corrupt data too and possibly always has, but the regression might be caused by a bug in this if-condition: https://github.com/xemu-project/xemu/blob/80f7efaba567a5c25c923e8df30ed5db17198f7a/hw/xbox/nv2a/pgraph/glsl/vsh.c#L214-L221
If the floating-point number t is a positive subnormal number (from the w-coordinate of garbled passthrough input data), then t>0.0 is false, and so is the second condition; therefore t gets clamped using the negative range and its sign is flipped. GPUs flush subnormals to zero, and this can be taken into account by replacing the if-condition with e.g. "floatBitsToInt(t) >= 0". I'll do a PR about this later. The garbled input data is a separate issue. I don't know if that is a bug in Xemu or in the game itself.
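The difference between the two sign tests can be modelled on the CPU. This Python sketch is an assumption-laden illustration, not the code in vsh.c: `float_bits_to_int` stands in for GLSL's floatBitsToInt, `flush_subnormal` models a GPU that treats float32 subnormals as zero, and the two `sign_test_*` helpers are hypothetical names.

```python
import math
import struct

FLT_MIN_NORMAL = 2.0 ** -126  # smallest positive normal float32

def float_bits_to_int(t: float) -> int:
    """Float32 bit pattern of t reinterpreted as a signed 32-bit integer."""
    return struct.unpack("<i", struct.pack("<f", t))[0]

def flush_subnormal(t: float) -> float:
    """Model a GPU that reads float32 subnormals as (signed) zero."""
    return math.copysign(0.0, t) if abs(t) < FLT_MIN_NORMAL else t

def sign_test_by_compare(t: float) -> bool:
    """The comparison form: a flushed positive subnormal fails it."""
    return flush_subnormal(t) > 0.0

def sign_test_by_bits(t: float) -> bool:
    """The suggested form: only looks at the sign bit."""
    return float_bits_to_int(t) >= 0

tiny = 2.0 ** -149  # smallest positive float32 subnormal
print(sign_test_by_compare(tiny))  # False
print(sign_test_by_bits(tiny))     # True
print(sign_test_by_bits(-tiny))    # False
```

For normal positive values both tests agree; they only diverge for inputs that the comparison sees as zero, which is the garbled-data case described above.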
It seems the bug with Ultimate Beach Soccer is fixed with this PR.
Here is the shader winding for 30-series graphics cards:
VK geometry shader winding: 0, 0, 0, 0
GL geometry shader winding: 0, 0, 0, 0
Tested Vega 8 integrated graphics (7730U) as well, and it is the same as above.
Also got this on a 3060 Ti when looking at the log:
Could not determine triangle rotation, got color: R=0, G=129, B=190
Could not determine triangle strip0 rotation, got color: R=0, G=171, B=169
Could not determine triangle strip1 rotation, got color: R=84, G=171, B=86
Unexpected inconsistency in triangle fan winding, got colors: R=0, G=169, B=171 and R=84, G=86, B=171
The test was done on just the Xbox boot logo.
> Also got this on a 3060 Ti when looking at the log:
> Could not determine triangle rotation, got color: R=0, G=129, B=190
> Could not determine triangle strip0 rotation, got color: R=0, G=171, B=169
> Could not determine triangle strip1 rotation, got color: R=84, G=171, B=86
> Unexpected inconsistency in triangle fan winding, got colors: R=0, G=169, B=171 and R=84, G=86, B=171
Thanks for testing! Which one is that, OpenGL or Vulkan? Those color values indicate that something is wrong with the rendering. It's just supposed to render some triangles to an offscreen image, and those RGB values are invalid.
OpenGL
I added error code checking and printing for OpenGL and also a glFinish call (which should be unnecessary), but nothing in particular sticks out. @Triticum0 could you try it again and see what error it gives?
The error is the same:
Could not determine triangle rotation, got color: R=0, G=129, B=190
Could not determine triangle strip0 rotation, got color: R=0, G=171, B=169
Could not determine triangle strip1 rotation, got color: R=84, G=171, B=86
Unexpected inconsistency in triangle fan winding, got colors: R=0, G=169, B=171 and R=84, G=86, B=171
GL geometry shader winding: 0, 0, 0, 0
Ok, thanks. Odd, those RGB values are what would be expected if the test geometry shader just passed through colors per vertex. I'll try to figure it out.
@Triticum0 Could you test with the latest commit? I added redundant computation to the test geometry shader which may fix this if it is a driver bug.
@coldhex It does fix the driver bug. The errors no longer show up in the logs.
Here is the log: xemu.log
@Triticum0 Cool, thanks!
I applied the same fix also to the Vulkan geometry shader tester, just in case. I also updated related code comments. Furthermore, now a fresh OpenGL context is created for the tester so that it always runs with clean state.
Great, hopefully it gets merged soon, as I have a few games I was going to retest once it is merged. I was worried when you put it back to draft after all your hard work. Thanks.
I added another commit to work around a Radeon OpenGL bug in Mesa. Geometry shader now emits separate triangles and line segments so that OpenGL/Vulkan implementation's provoking vertex choice doesn't matter for output primitives.
I found this bug in Mesa and created a gist that reproduces it on my system: https://gist.github.com/coldhex/1744021258fe0af686dbca5b2e5550c4
Expected output is that the right rectangle should have the same colors as the left one, but with Radeon the output is:
Fixes #2394 and #2393
Fixes #1155
Likely will fix #2479