more speedups for fjxl decoding
Same as https://github.com/libjxl/libjxl/pull/1149, but now without the change to ClampedGradient. That change wasn't helping much anyway, and it apparently wasn't safe when the values use the full int32_t range, as in the lossless pfm conformance test case.
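To illustrate the kind of hazard with full-range int32_t inputs, here is a minimal sketch. It assumes a clamped-gradient predictor of the common form N + W - NW clamped to [min(N, W), max(N, W)]. The function names are hypothetical and this is not the code from libjxl or from the previous PR. The intermediate sum can fall outside the int32_t range, so the sketch evaluates it in int64_t.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper (not the libjxl implementation): clamped gradient
// predictor with the intermediate sum computed in 64 bits, so that
// n + w - nw cannot overflow even for full-range int32_t inputs.
inline int32_t ClampedGradientWide(int32_t n, int32_t w, int32_t nw) {
  const int64_t lo = std::min(n, w);
  const int64_t hi = std::max(n, w);
  const int64_t grad = static_cast<int64_t>(n) + w - nw;
  // The clamped result always lies in [lo, hi], so it fits in int32_t.
  return static_cast<int32_t>(std::min(std::max(grad, lo), hi));
}

// Hypothetical check: true if n + w - nw is representable in int32_t,
// i.e. a pure 32-bit evaluation of the sum would not overflow.
inline bool GradientFitsInt32(int32_t n, int32_t w, int32_t nw) {
  const int64_t grad = static_cast<int64_t>(n) + w - nw;
  return grad >= INT32_MIN && grad <= INT32_MAX;
}
```

For typical 8-bit or 16-bit samples the sum stays far inside the 32-bit range, which is presumably why a narrower computation can look fine until an input such as the full-range pfm case comes along.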
Can you add speed numbers with the new version?
Hm, strange: compared to the current git main branch, this no longer seems to give much of a speedup at all:
Before:
3072 x 2048, geomean: 39.96 MP/s [29.81, 41.69], 30 reps, 0 threads.
3072 x 2048, geomean: 68.28 MP/s [45.65, 72.18], 30 reps, 2 threads.
3072 x 2048, geomean: 102.01 MP/s [58.48, 109.59], 30 reps, 4 threads.
After:
3072 x 2048, geomean: 40.17 MP/s [27.99, 41.99], 30 reps, 0 threads.
3072 x 2048, geomean: 69.07 MP/s [48.04, 73.16], 30 reps, 2 threads.
3072 x 2048, geomean: 102.10 MP/s [64.97, 109.45], 30 reps, 4 threads.
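For reference, the relative change implied by the geomeans above can be worked out as follows. This is only a minimal sketch; `SpeedupPercent` is a hypothetical helper, not part of the benchmark tool.

```cpp
// Hypothetical helper: relative change in percent between two
// throughput figures (e.g. geomean MP/s before and after a change).
inline double SpeedupPercent(double before, double after) {
  return (after / before - 1.0) * 100.0;
}
```

With the numbers above, `SpeedupPercent(39.96, 40.17)`, `SpeedupPercent(68.28, 69.07)` and `SpeedupPercent(102.01, 102.10)` come out at roughly 0.5%, 1.2% and 0.1% for 0, 2 and 4 threads, which is small next to the bracketed ranges reported for each run.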
I'd consider not doing it then :)
Agreed, doesn't make much sense to merge this if there is no real speedup.
I do wonder why I was seeing more substantial speed improvements before, though (see the numbers in the previous PR). Maybe leave this PR open for a while to remind me to investigate what happened there.