Lens flare effects

Open hrydgard opened this issue 2 years ago • 11 comments

Lens flare issues, categorized:

Depth buffer read using CPU:

  • #13344 Wipeout Pure

Framebuffer->CLUT tricks

  • #11100 Burnout Dominator

Framebuffer alpha accumulation tricks:

  • #16083 Ridge Racers's car spotlight glare effect, sunset glare

Not yet investigated in detail:

  • #15071 Socom US Navy Seal: Tactical Strike
  • #15785 Split/Second
  • #10229 Syphon Filter Dark Mirror (light flares visible through buildings)
  • https://github.com/hrydgard/ppsspp/issues/10229#issuecomment-1232151181 Syphon Filter Logan's Shadow
  • #7810 Colin McRae 2005? (the sun is rendered wrong)
  • https://github.com/hrydgard/ppsspp/issues/11100#issuecomment-1123170383 Motorstorm: Arctic Edge
  • https://github.com/hrydgard/ppsspp/issues/11100#issuecomment-1123377372 NFS Shift (there are more)!

References:

https://github.com/hrydgard/ppsspp/commits/c3bb9437669a4a (old PR for framebuffer CLUTs)

Lens flares are a typically problematic effect on GPUs of the PSP's generation. They are supposed to be drawn only when the sun (or another light source) is visible, but there are no occlusion queries you can use to determine that directly on the GPU. Nor is it practical to copy the texel to an image and then use multitexturing to blend the lens effect texture with the copied texel, since the PSP's GPU doesn't do multitexturing.

So games make use of a variety of dirty tricks.

Let's start with Wipeout Pure, #13344. I started by hacking the interpreter to log out CPU reads from VRAM. For some reason there are a whole bunch that happen every frame, but these stand out:

06:50:095 idle0        W[G3D]: MIPS\MIPSInt.cpp:411 Read from depth buffer? addr=0471bdec
06:50:095 idle0        W[G3D]: MIPS\MIPSInt.cpp:411 Read from depth buffer? addr=0471bdf0
06:50:095 idle0        W[G3D]: MIPS\MIPSInt.cpp:411 Read from depth buffer? addr=0471bdf4
06:50:095 idle0        W[G3D]: MIPS\MIPSInt.cpp:411 Read from depth buffer? addr=0471bdf8

06:50:095 idle0        W[G3D]: MIPS\MIPSInt.cpp:411 Read from depth buffer? addr=0471c5ec
06:50:095 idle0        W[G3D]: MIPS\MIPSInt.cpp:411 Read from depth buffer? addr=0471c5f0
06:50:095 idle0        W[G3D]: MIPS\MIPSInt.cpp:411 Read from depth buffer? addr=0471c5f4
06:50:095 idle0        W[G3D]: MIPS\MIPSInt.cpp:411 Read from depth buffer? addr=0471c5f8

06:50:095 idle0        W[G3D]: MIPS\MIPSInt.cpp:411 Read from depth buffer? addr=0471cdec
06:50:095 idle0        W[G3D]: MIPS\MIPSInt.cpp:411 Read from depth buffer? addr=0471cdf0
06:50:095 idle0        W[G3D]: MIPS\MIPSInt.cpp:411 Read from depth buffer? addr=0471cdf4
06:50:095 idle0        W[G3D]: MIPS\MIPSInt.cpp:411 Read from depth buffer? addr=0471cdf8

06:50:095 idle0        W[G3D]: MIPS\MIPSInt.cpp:411 Read from depth buffer? addr=0471d5ec
06:50:095 idle0        W[G3D]: MIPS\MIPSInt.cpp:411 Read from depth buffer? addr=0471d5f0
06:50:095 idle0        W[G3D]: MIPS\MIPSInt.cpp:411 Read from depth buffer? addr=0471d5f4
06:50:095 idle0        W[G3D]: MIPS\MIPSInt.cpp:411 Read from depth buffer? addr=0471d5f8

!!! Observation: These are cached addresses, so the game must be invalidating the cache at this location, which might be interesting to catch.

In the EUR version of Wipeout, the lhu instruction doing these reads is at 0888c16c (in the function starting at 0888C0A8). Some additional reads are done by 0881e0c0 (in the function starting at 0881E098, no idea what it's doing).

It's using lhu instructions (16-bit loads), and it looks to me like it's sampling a 4x4 grid around the sun's screen position from the depth buffer, skipping every other pixel. The depth buffer is situated at 110000 in VRAM, which starts at 04000000, plus the 600000 deswizzle offset that's needed to linearize the depth buffer in 8888 mode. The game treats a zero value as sky, that is, the sun is not occluded and it will draw the lens flare. As expected, as the sun slides across the image when the camera moves, these addresses, which are read every frame, change accordingly. The game must be synchronizing here, since the depth buffer is not double-buffered.
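To make the address arithmetic concrete, here is a small sketch that maps a logged read address back to a depth buffer pixel. The three base constants come from the text above; the 512-pixel stride, the 272-row height, and the `DecodeDepthRead` helper itself are assumptions for illustration and are not taken from PPSSPP.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Constants quoted in the post.
constexpr uint32_t kVramBase     = 0x04000000;  // start of PSP VRAM
constexpr uint32_t kDepthOffset  = 0x00110000;  // depth buffer location for this game
constexpr uint32_t kLinearMirror = 0x00600000;  // "deswizzle" offset for linear depth access

// Assumptions (not stated in the post): 512-pixel stride, 272 rows, 16-bit depth.
constexpr uint32_t kStridePixels  = 512;
constexpr uint32_t kHeight        = 272;
constexpr uint32_t kBytesPerPixel = 2;

struct DepthPixel {
    uint32_t x;
    uint32_t y;
};

// Hypothetical helper: returns the pixel a 16-bit read touches,
// or nullopt if the address is outside the depth buffer.
std::optional<DepthPixel> DecodeDepthRead(uint32_t addr) {
    const uint32_t start = kVramBase + kLinearMirror + kDepthOffset;
    const uint32_t size = kStridePixels * kHeight * kBytesPerPixel;
    if (addr < start || addr >= start + size)
        return std::nullopt;
    const uint32_t pixel = (addr - start) / kBytesPerPixel;
    return DepthPixel{pixel % kStridePixels, pixel / kStridePixels};
}
```

Under these assumptions, the logged addresses decode to columns 246, 248, 250 and 252 on rows 47, 49, 51 and 53, which matches a 4x4 grid that skips every other pixel.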

For this to work correctly, we have to read back the depth buffer every frame to emulated PSP VRAM, which introduces a massive sync point between the GPU and CPU. This is not really desirable (although we should implement it as an option), so I've been thinking about ways to get around it:

  • Unsynchronized readbacks (Vulkan)
    • Schedule readbacks on the GPU's timeline, but don't wait for them to complete. Instead, have a background thread wait on a fence from the GPU and copy the data to the PSP's VRAM when the readback is done, whenever that is. For the purposes of lens flare visibility, this might be fine. There's no CPU stall, but the readback might be delayed, which could in theory (and given our luck, probably in practice too) corrupt important data during level transitions and similar.
  • Virtual readbacks with hooks
    • As above, but we don't actually read back to the PSP's VRAM but to an alternative buffer. Then, for each affected game, we hook the PSP function that reads from the depth buffer so that it reads the data from the alternative buffer instead. This has the advantage that we don't even need to copy the whole depth buffer from GPU memory to PSP RAM, since we can read directly from the alternative buffer (we need to be careful about memory types for performance), but it requires game-specific work.
  • Automatic hooks
    • Same as the last one, but we protect the memory that the depth buffer belongs to. When we get a memory exception, we mark that code block for recompilation with a special memory check, which then reads those VRAM addresses from our special depth buffer. So the same idea, but with no manual per-game work.
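As a rough illustration of the first two ideas combined, the sketch below keeps a shadow copy of the depth buffer outside emulated VRAM and fills it from a worker once a readback completes. Everything here is hypothetical: `ShadowDepthBuffer` and `ScheduleReadback` are made-up names, and a `std::future` stands in for the GPU fence, so this shows only the threading shape and is not Vulkan code or PPSSPP's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <future>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical shadow copy of the depth buffer, kept outside emulated PSP VRAM.
class ShadowDepthBuffer {
public:
    ShadowDepthBuffer(uint32_t guestBase, size_t pixels)
        : guestBase_(guestBase), data_(pixels, 0) {}

    // Called by the worker once a readback has finished.
    void Publish(const std::vector<uint16_t> &fresh) {
        std::lock_guard<std::mutex> guard(lock_);
        if (fresh.size() == data_.size())
            data_ = fresh;
    }

    // Called by a hooked guest function in place of a real VRAM load.
    // Out-of-range reads return 0 (which the game above would treat as sky).
    uint16_t Read16(uint32_t guestAddr) const {
        std::lock_guard<std::mutex> guard(lock_);
        if (guestAddr < guestBase_)
            return 0;
        const size_t index = (guestAddr - guestBase_) / sizeof(uint16_t);
        return index < data_.size() ? data_[index] : 0;
    }

private:
    uint32_t guestBase_;
    mutable std::mutex lock_;
    std::vector<uint16_t> data_;
};

// The future plays the role of the GPU fence: the worker blocks on it,
// while the emulation thread keeps running and sees the old data until then.
std::future<void> ScheduleReadback(ShadowDepthBuffer &shadow,
                                   std::future<std::vector<uint16_t>> gpuResult) {
    return std::async(std::launch::async,
        [&shadow, result = std::move(gpuResult)]() mutable {
            shadow.Publish(result.get());
        });
}
```

The hooked read only ever sees the most recently published data, so a late readback shows up as stale visibility and never writes into emulated VRAM.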

Anyway, I think the first step will be to create the correct-ish but slow solution of doing hard-synced readbacks to PSP VRAM. The question is when exactly in the frame we should do these. "When finished rendering the main depth buffer" is presumably the best option, but there's no clear way to detect that. Maybe just do it when the main framebuffer is displayed, or something.

.... To be continued

hrydgard avatar Aug 29 '22 16:08 hrydgard

Arctic Edge: https://github.com/hrydgard/ppsspp/issues/11100#issuecomment-1123170383
NFS Shift: https://github.com/hrydgard/ppsspp/issues/11100#issuecomment-1123377372

catthecreator avatar Aug 30 '22 13:08 catthecreator

Thanks, added to the list.

hrydgard avatar Aug 30 '22 13:08 hrydgard

Syphon Filter Logan's Shadow https://github.com/hrydgard/ppsspp/issues/10229#issuecomment-1232151181

QrTa avatar Aug 30 '22 21:08 QrTa

I don't know how the AetherSX2 PS2 emulator does it, but it gets accurate, full-speed readback emulation using OpenGL on Android. I can play Need for Speed: Hot Pursuit 2 without underclocking the emulator, with accurate readbacks turned on, and maintain full-speed emulation with no slowdowns, and the sun hides when it's supposed to. And I think the PS2 has double the resolution of the PSP. I do cheat a little though... I keep all my core frequencies maxed out and the GPU set at 3/4 speed on my rooted phone (SD 855+). The phone doesn't get too hot... about 147° F on average. So I think PPSSPP wouldn't be too demanding with accurate readbacks. I think what helps them is a CPU affinity option that keeps the heaviest threads on the phone's biggest cores.

71knight avatar Sep 02 '22 05:09 71knight

AetherSX2 is the PCSX2 mobile port, btw.

ghost avatar Sep 02 '22 06:09 ghost

An alternative to readbacks for the games that peek the Z-buffer using the CPU, as commented elsewhere by @unknownbrackets, would be to run both the software and hardware renderers side by side. That way we'll always have accurate depth in CPU-accessible memory, at the right time.

This is expensive though, and to make it less so, it would be possible to have the software renderer only render depth buffers, and just ignore color - depth is a lot less complex so I think this would be way faster than running the full software renderer. This wouldn't work for cases where games reinterpret color and Z like Kuroyou, but I don't think that applies to any of these cases.
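To give a feel for how little work a depth-only pass needs, here is a minimal sketch of a triangle rasterizer that writes only 16-bit depth and never touches color. It is not PPSSPP's software renderer: the `Vertex` and `DepthTarget` types and `DrawDepthOnly` are hypothetical, there are no fill rules, clipping, or texturing, and the "greater or equal passes" depth test is an assumption chosen so that a cleared value of 0 behaves like sky.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

struct Vertex {
    int x;
    int y;
    uint16_t z;
};

struct DepthTarget {
    int width;
    int height;
    std::vector<uint16_t> z;
    DepthTarget(int w, int h, uint16_t clear)
        : width(w), height(h), z(static_cast<size_t>(w) * h, clear) {}
};

// Signed area term: positive when (px, py) lies to one side of edge a->b.
static int64_t Edge(const Vertex &a, const Vertex &b, int px, int py) {
    return int64_t(b.x - a.x) * (py - a.y) - int64_t(b.y - a.y) * (px - a.x);
}

// Rasterizes one triangle into the depth target only; color is ignored.
void DrawDepthOnly(DepthTarget &rt, Vertex v0, Vertex v1, Vertex v2) {
    int64_t area = Edge(v0, v1, v2.x, v2.y);
    if (area == 0)
        return;  // degenerate triangle
    if (area < 0) {  // normalize winding so all edge terms are non-negative inside
        std::swap(v1, v2);
        area = -area;
    }
    const int minX = std::max(0, std::min({v0.x, v1.x, v2.x}));
    const int maxX = std::min(rt.width - 1, std::max({v0.x, v1.x, v2.x}));
    const int minY = std::max(0, std::min({v0.y, v1.y, v2.y}));
    const int maxY = std::min(rt.height - 1, std::max({v0.y, v1.y, v2.y}));
    for (int y = minY; y <= maxY; ++y) {
        for (int x = minX; x <= maxX; ++x) {
            const int64_t w0 = Edge(v1, v2, x, y);
            const int64_t w1 = Edge(v2, v0, x, y);
            const int64_t w2 = Edge(v0, v1, x, y);
            if (w0 < 0 || w1 < 0 || w2 < 0)
                continue;  // outside the triangle
            // Barycentric interpolation of Z; the weights sum to area.
            const uint16_t z = static_cast<uint16_t>(
                (w0 * v0.z + w1 * v1.z + w2 * v2.z) / area);
            uint16_t &dst = rt.z[static_cast<size_t>(y) * rt.width + x];
            if (z >= dst)  // assumed depth function: larger Z is closer
                dst = z;
        }
    }
}
```

The inner loop has no texture fetch, blend, or color write, which is the part of the cost that a Z-only mode would hope to drop.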

Also gonna have to look into what PCSX2 does. Maybe AetherSX2 does something special on top, hard to say given it's closed source.

hrydgard avatar Sep 09 '22 22:09 hrydgard

I will say, the loop to interpolate triangle data is the slowest part of the software renderer now, I think. Texture sampling is still fairly slow as well.

We'd still need to texture (because of alpha tests/color tests), but we could skip alpha blend and logicops. Skipping blending would save time, but I don't think it'd make a huge difference overall.

Maybe we could have a "fast and loose" mode where it ignores color and alpha tests, though, or at least skip sampling/etc. when they're not enabled (which would be safe.) That would also allow us to skip lighting which is quite expensive.

-[Unknown]

unknownbrackets avatar Sep 10 '22 03:09 unknownbrackets

Yeah I think we can go very fast and loose for Z-only. Texturing only needs to be done when we know there's alpha. And we could skip filtering and mipmapping for example..

hrydgard avatar Sep 10 '22 07:09 hrydgard

Right. My biggest concern would be "depth boxes" from alpha testing. For example, some faraway trees or clouds might be drawn to cover the sun, but without alpha testing they would cover the entire thing. If we can safely skip alpha testing, it probably helps the potential speed a lot, because it cuts out many, many things.

We might end up in a place where we're using heuristics to skip alpha testing, though. For example, it's probably mainly an issue with flat Z - models probably don't need alpha testing for depth to be correct.
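One possible shape for such a heuristic, purely as a sketch: keep alpha testing only for draws that have it enabled and whose vertices all share the same Z. The `DepthDraw` struct and `NeedsAlphaTestForDepth` are hypothetical names, and whether flat Z is actually a good enough signal is exactly the open question above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-triangle summary used by a depth-only pass.
struct DepthDraw {
    bool alphaTestEnabled;
    uint16_t z0;
    uint16_t z1;
    uint16_t z2;
};

// Heuristic sketch: only flat-Z draws (billboards, sprites) keep the alpha test;
// 3D models with varying Z are assumed to give correct depth without it.
bool NeedsAlphaTestForDepth(const DepthDraw &draw) {
    if (!draw.alphaTestEnabled)
        return false;  // nothing to skip, so skipping sampling is always safe
    return draw.z0 == draw.z1 && draw.z1 == draw.z2;
}
```

A draw that returns false here could skip texture sampling entirely in the depth-only pass.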

-[Unknown]

unknownbrackets avatar Sep 10 '22 14:09 unknownbrackets

Socom US Navy Seal: Tactical Strike is also affected. https://github.com/hrydgard/ppsspp/issues/15071 Screenshot_2022-10-13-02-23-08-77

UCUS98649.ppdmp.zip

ghost avatar Oct 12 '22 18:10 ghost

Thanks, added to list.

hrydgard avatar Oct 12 '22 18:10 hrydgard

Burnout Dominator's sun flares are glitchy in a recent build of PPSSPP. Screenshot_20230204_194908_2f85358b2198d26f8aca533d68bee793 ULUS10236.zip

ghost avatar Feb 04 '23 11:02 ghost

Yeah, I'll have to take a look at those again.

hrydgard avatar Feb 04 '23 12:02 hrydgard

Resistance Retribution

Software Screenshot_20230318_065453_2f85358b2198d26f8aca533d68bee793

Vulkan/OpenGL Screenshot_20230318_065718_2f85358b2198d26f8aca533d68bee793

GE Dump UCES01184.ppdmp.zip

Edit: fixed by the [ReadbackDepth] compatibility setting, but it makes the game slower and makes my opponent invisible :(

ghost avatar Mar 17 '23 23:03 ghost