Unoptimized gamma correction shader math in crt-pi

Open battaglia01 opened this issue 7 years ago • 4 comments

There are a few unoptimized lines of code in the gamma correction part of crt-pi.glsl, linked here for reference: https://github.com/libretro/glsl-shaders/blob/master/crt/shaders/crt-pi.glsl

Gamma correction has been noted to be a potential source of slowdown in the code, and also in this thread here. However, all of the math here is really unoptimized, which is likely what is causing the slowdown.

Gamma correction is done on lines 190-208, reproduced here for reference:

#if defined(SCANLINES)
#if defined(GAMMA)
#if defined(FAKE_GAMMA)
		colour = colour * colour;
#else
		colour = pow(colour, vec3(INPUT_GAMMA));
#endif
#endif
		scanLineWeight *= BLOOM_FACTOR;
		colour *= scanLineWeight;

#if defined(GAMMA)
#if defined(FAKE_GAMMA)
		colour = sqrt(colour);
#else
		colour = pow(colour, vec3(1.0/OUTPUT_GAMMA));
#endif
#endif
#endif

If we assume SCANLINES, GAMMA and FAKE_GAMMA are all defined, the above reduces to the following:

		colour = colour * colour;
		scanLineWeight *= BLOOM_FACTOR;
		colour *= scanLineWeight;
		colour = sqrt(colour);

Is there a reason it's being done like this? All of that is equivalent to

		colour *= sqrt(scanLineWeight * BLOOM_FACTOR);

This saves one multiplication and three assignments per loop! We avoid the unnecessary squaring and subsequent square rooting of colour, and we also don't need to update scanLineWeight, as it's never used again in this scope. I don't know how much the assignments matter or if they're optimized out anyway, but fighting with the emulator over memory accesses has been noted as one of the major causes of slowdown, so it's worth bringing up...
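
For anyone checking the algebra, the equivalence can be written out per colour component. Here c stands for one component of colour, w for scanLineWeight and B for BLOOM_FACTOR (shorthand introduced only for this note). It relies on the assumption that both c and the product wB are non-negative, which should hold for sampled texture colours and a sensible weight, but is worth confirming against the shader:

```latex
\[
\sqrt{c^{2}\,(w B)} \;=\; |c|\,\sqrt{w B} \;=\; c\,\sqrt{w B}
\qquad \text{for } c \ge 0,\; w B \ge 0 .
\]
```

The square root also moves from the three-component colour to a single scalar, so the per-component work that remains is one multiply.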

There's a similar (but slightly trickier) thing you can do with the true gamma correction, not just FAKE_GAMMA, but I'll start here for now to see if I'm on the right wavelength...
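
As a rough sketch of how the true-gamma case might collapse (not necessarily what was meant above), the identity pow(pow(c, a) * s, b) == pow(c, a * b) * pow(s, b) holds for non-negative c and s. The helper name scanlineWithGamma and the fallback #define values are hypothetical and exist only to keep the snippet self-contained. It also assumes INPUT_GAMMA, OUTPUT_GAMMA and BLOOM_FACTOR are compile-time constants, so the exponents can be folded by the compiler:

```glsl
// Sketch only. Assumes colour >= 0.0 and scanLineWeight * BLOOM_FACTOR >= 0.0,
// since pow() is undefined for negative bases in GLSL.
// The fallback values below are illustrative assumptions, not taken from this thread.
#ifndef INPUT_GAMMA
#define INPUT_GAMMA 2.4
#endif
#ifndef OUTPUT_GAMMA
#define OUTPUT_GAMMA 2.2
#endif
#ifndef BLOOM_FACTOR
#define BLOOM_FACTOR 1.5
#endif

// Hypothetical helper; crt-pi.glsl does this inline rather than in a function.
vec3 scanlineWithGamma(vec3 colour, float scanLineWeight)
{
    // Original form: pow(pow(colour, IG) * s, 1.0 / OG)
    // Rewritten as:  pow(colour, IG / OG) * pow(s, 1.0 / OG)
    float s = scanLineWeight * BLOOM_FACTOR;
    return pow(colour, vec3(INPUT_GAMMA / OUTPUT_GAMMA)) * pow(s, 1.0 / OUTPUT_GAMMA);
}
```

This trades two vec3 pow calls for one vec3 pow and one scalar pow. If INPUT_GAMMA and OUTPUT_GAMMA happened to be equal, the vec3 pow would drop out entirely, leaving colour * pow(s, 1.0 / OUTPUT_GAMMA). Whether any of this is a measurable win would need profiling on the target GPU.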

battaglia01 avatar Oct 08 '17 03:10 battaglia01

Yeah, probably just done that way for code clarity. It'd be worth looking at the assembly to see how much of a difference it makes.

hizzlekizzle avatar Oct 09 '17 16:10 hizzlekizzle

I'd be really surprised if any compiler knew to optimize a squaring and subsequent square root into one operation. The assignments, probably.

How can I compile this to assembly and check the output? Does OpenGL have an app for that, or do I just do something with GCC? Not used to GL shaders.

battaglia01 avatar Oct 10 '17 16:10 battaglia01

That's a good question. I've used fxc.exe for HLSL shaders, but there doesn't seem to be anything as universally easy to use for GLSL, which probably shouldn't surprise me...

However, it seems this Radeon GPU Analyzer from AMD may be able to do it: https://github.com/GPUOpen-Tools/RGA/releases

hizzlekizzle avatar Oct 10 '17 19:10 hizzlekizzle

It gains about 15-20 fps this way in my test: 650 fps before, 668 after. That's a 2-3% difference.

metallic77 avatar May 18 '23 06:05 metallic77