tweaks to modular effort tradeoffs
Also related to https://github.com/libjxl/libjxl/pull/4232 and https://github.com/libjxl/libjxl/pull/4154
Chipping away at the Pareto front, these tweaks aim to (slightly) improve the effort/density trade-offs.
Changes:
- After the bugfix at https://github.com/libjxl/libjxl/pull/4154 that caused `max_property_values` to actually get respected, we can bump up the number of property-value quantization buckets at all effort settings (which improves density at the cost of some speed, though overall the speed impact is small).
- The `nb_repeats` parameter (which can be configured via the API but by default is just 0.5 at all efforts) is now modulated by effort too, i.e. lower efforts also use fewer samples for MA tree learning. This speeds up lower efforts and slows down higher efforts, remaining neutral at the default effort. (See the sketch after this list.)
- Simplified/improved the tree learning heuristics a little, since the logic was a bit wonky: `adds_wp` could be false even though the candidate split does use the WP (when the parent node already used the WP), which could lead to selecting a suboptimal split, because the `fast_decode_multiplier` prefers a slightly worse split with `adds_wp == false` over a better split with `adds_wp == true` (which doesn't make sense if the WP is used in both options anyway). The simpler logic is slightly faster and denser (though the difference is small).
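A minimal sketch of the kind of effort-dependent parameter selection described above. The function name, thresholds, and bucket counts here are illustrative assumptions of mine, not the actual values used in this PR:

```cpp
#include <algorithm>

// Hypothetical helper, not libjxl code: pick MA-tree learning parameters
// based on the cjxl effort level (1 = fastest, 10 = densest).
struct ModularTreeParams {
  int max_property_values;  // number of property-value quantization buckets
  float nb_repeats;         // fraction of pixels sampled for MA tree learning
};

ModularTreeParams TreeParamsForEffort(int effort, float user_nb_repeats) {
  ModularTreeParams p;
  // More quantization buckets at higher effort: denser trees, slower search.
  p.max_property_values = effort < 7 ? 16 : 64;  // illustrative numbers only
  // Scale the sampling fraction with effort instead of using a flat default,
  // staying neutral at the default effort (7), then clamp to [0, 1].
  const float scale = effort / 7.0f;
  p.nb_repeats = std::min(1.0f, std::max(0.0f, user_nb_repeats * scale));
  return p;
}
```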
Before: (jyrki31 corpus)
31 images
Encoding kPixels Bytes BPP E MP/s D MP/s Max norm SSIMULACRA2 PSNR pnorm BPP*pnorm QABPP Bugs
----------------------------------------------------------------------------------------------------------------------------------------
jxl:d0:4 13270 17162582 10.3463459 5.225 49.279 nan 100.00000000 99.99 0.00000000 0.000000000000 10.346 0
jxl:d0:5 13270 16971925 10.2314097 2.893 39.969 nan 100.00000000 99.99 0.00000000 0.000000000000 10.231 0
jxl:d0:6 13270 16860935 10.1645001 1.849 35.470 nan 100.00000000 99.99 0.00000000 0.000000000000 10.165 0
jxl:d0:7 13270 16638016 10.0301149 1.188 31.430 nan 100.00000000 99.99 0.00000000 0.000000000000 10.030 0
jxl:d0:8 13270 16534367 9.9676308 0.319 31.807 nan 100.00000000 99.99 0.00000000 0.000000000000 9.968 0
jxl:d0:9 13270 16458308 9.9217791 0.235 29.942 nan 100.00000000 99.99 0.00000000 0.000000000000 9.922 0
Aggregate: 13270 16769172 10.1091812 1.164 35.760 0.00000000 100.00000000 99.99 0.00000000 0.000000000000 10.109 0
After:
31 images
Encoding kPixels Bytes BPP E MP/s D MP/s Max norm SSIMULACRA2 PSNR pnorm BPP*pnorm QABPP Bugs
----------------------------------------------------------------------------------------------------------------------------------------
jxl:d0:4 13270 17117248 10.3190166 6.899 44.328 nan 100.00000000 99.99 0.00000000 0.000000000000 10.319 0
jxl:d0:5 13270 16934902 10.2090906 3.290 39.130 nan 100.00000000 99.99 0.00000000 0.000000000000 10.209 0
jxl:d0:6 13270 16856572 10.1618699 2.078 35.075 nan 100.00000000 99.99 0.00000000 0.000000000000 10.162 0
jxl:d0:7 13270 16635589 10.0286518 1.167 32.788 nan 100.00000000 99.99 0.00000000 0.000000000000 10.029 0
jxl:d0:8 13270 16532724 9.9666403 0.305 32.073 nan 100.00000000 99.99 0.00000000 0.000000000000 9.967 0
jxl:d0:9 13270 16444861 9.9136727 0.215 30.934 nan 100.00000000 99.99 0.00000000 0.000000000000 9.914 0
Aggregate: 13270 16751992 10.0988244 1.238 35.433 0.00000000 100.00000000 99.99 0.00000000 0.000000000000 10.099 0
TL;DR: e4-e6 become faster and slightly denser (so just better), e7 stays about the same (a tiny bit denser and slower, maybe), e8+ become slightly denser and slower.
This reminds me: I was going to try re-enabling P15 at effort 9. It was previously disabled because e9 was slower than e10, but that only applies to images under 2048 x 2048, where Local MA trees (and effectively multithreading) are disabled.
Instead of that, though, we might explore a wider predictor overhaul: adding new options like P14 that try a subset of the most commonly used predictors, and possibly even replacing P14, given how slow Weighted is for en/decoding with only marginal improvement over Gradient in most cases. That will need to be tested and discussed, though.
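For illustration, a sketch of what such a "subset" option could look like. The particular predictors listed are an assumption on my part, not a tested proposal, and the enum/header are assumed to be libjxl's `Predictor` in `lib/jxl/modular/options.h`:

```cpp
#include <vector>

#include "lib/jxl/modular/options.h"  // assumed location of jxl::Predictor

// Hypothetical candidate set for a cheaper "try a subset" mode: the common
// cheap predictors plus Gradient, deliberately skipping Weighted because of
// its en/decode cost.
std::vector<jxl::Predictor> CommonPredictorSubset() {
  return {jxl::Predictor::Zero, jxl::Predictor::Left, jxl::Predictor::Top,
          jxl::Predictor::Select, jxl::Predictor::Gradient};
}
```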
I did some testing recently, and I think `-E 1` could be enabled at effort 9, with faster decoding level 2 defaulting it back to 0. It has a small en/decode speed penalty, but the density improvement can be better than that of `-P 15`, which is enabled at effort 10.
It should pair well with the higher MA sampling percentage in this PR, and it uses another feature that is disabled by default.
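A rough sketch of how that decision could look, assuming `-E` corresponds to the number of previous channels used as extra MA properties; the function and parameter names here are mine, not libjxl's:

```cpp
// Hypothetical decision logic, not actual libjxl code: enable one extra
// previous-channel property (-E 1) at effort 9 and above, unless the user
// asked for faster decoding (level 2 or higher), in which case fall back
// to the default of -E 0.
int NbPrevChannelsFor(int effort, int faster_decoding_level) {
  if (effort >= 9 && faster_decoding_level < 2) return 1;  // -E 1
  return 0;                                                // -E 0 (default)
}
```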
> I did some testing recently, and I think `-E 1` could be enabled at effort 9, with faster decoding level 2 defaulting it back to 0. It has a small en/decode speed penalty, but the density improvement can be better than that of `-P 15`, which is enabled at effort 10. It should pair well with the higher MA sampling percentage in this PR, and it uses another feature that is disabled by default.
That could make sense, yes. Let's do it in another PR though.
Rebased this.
Now the performance impact is as follows:
Before:
31 images
Encoding kPixels Bytes BPP E MP/s D MP/s Max norm SSIMULACRA2 PSNR pnorm BPP*pnorm QABPP Bugs
----------------------------------------------------------------------------------------------------------------------------------------
jxl:d0:4 13270 17162620 10.3463688 5.390 58.479 nan 100.00000000 99.99 0.00000000 0.000000000000 10.346 0
jxl:d0:5 13270 16908996 10.1934733 3.187 46.673 nan 100.00000000 99.99 0.00000000 0.000000000000 10.193 0
jxl:d0:6 13270 16797889 10.1264932 1.897 40.528 nan 100.00000000 99.99 0.00000000 0.000000000000 10.126 0
jxl:d0:7 13270 16625029 10.0222858 1.181 34.947 nan 100.00000000 99.99 0.00000000 0.000000000000 10.022 0
jxl:d0:8 13270 16478362 9.9338686 0.380 35.581 nan 100.00000000 99.99 0.00000000 0.000000000000 9.934 0
jxl:d0:9 13270 16385839 9.8780917 0.263 33.514 nan 100.00000000 99.99 0.00000000 0.000000000000 9.878 0
Aggregate: 13270 16724387 10.0821830 1.251 40.795 0.00000000 100.00000000 99.99 0.00000000 0.000000000000 10.082 0
After:
31 images
Encoding kPixels Bytes BPP E MP/s D MP/s Max norm SSIMULACRA2 PSNR pnorm BPP*pnorm QABPP Bugs
----------------------------------------------------------------------------------------------------------------------------------------
jxl:d0:4 13270 17117175 10.3189726 7.764 54.181 nan 100.00000000 99.99 0.00000000 0.000000000000 10.319 0
jxl:d0:5 13270 16872864 10.1716914 3.956 46.188 nan 100.00000000 99.99 0.00000000 0.000000000000 10.172 0
jxl:d0:6 13270 16793526 10.1238630 2.337 40.800 nan 100.00000000 99.99 0.00000000 0.000000000000 10.124 0
jxl:d0:7 13270 16622723 10.0208956 1.249 36.068 nan 100.00000000 99.99 0.00000000 0.000000000000 10.021 0
jxl:d0:8 13270 16474269 9.9314011 0.380 35.807 nan 100.00000000 99.99 0.00000000 0.000000000000 9.931 0
jxl:d0:9 13270 16371339 9.8693505 0.249 34.870 nan 100.00000000 99.99 0.00000000 0.000000000000 9.869 0
Aggregate: 13270 16706772 10.0715641 1.429 40.778 0.00000000 100.00000000 99.99 0.00000000 0.000000000000 10.072 0
The 'before' here is now better than the previous 'after' (other improvements have landed in the meantime), but this still looks like a Pareto improvement: at every effort setting, compression improves slightly, and encode speed either improves or stays about the same.
Is there any intent/reason behind `kSquirrel` not having a value defined for `nb_repeats`, or was this just an oversight?
`nb_repeats` is capped at 1 in this PR, so I'm not sure why it is also set to 1.1 for Kitten and 1.3 for Glacier.
I know higher values should increase the quantization percentage, but in that case `cparams_.options.nb_repeats = std::min(1.0f, cparams_.options.nb_repeats);` should cap at 10, not 1.
https://github.com/libjxl/libjxl/blob/7cac2ac860e41f7f4199b73508490016a8af204c/lib/jxl/modular/encoding/enc_ma.cc#L979
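To illustrate the concern, a minimal standalone snippet (not libjxl code) showing what that clamp does to the preset values in question:

```cpp
#include <algorithm>
#include <cstdio>
#include <initializer_list>

// With std::min(1.0f, nb_repeats), any preset above 1 (e.g. the 1.1 for
// Kitten or 1.3 for Glacier mentioned above) collapses to an effective 1.0.
int main() {
  for (float preset : {0.5f, 1.0f, 1.1f, 1.3f}) {
    std::printf("preset %.1f -> effective %.1f\n", preset,
                std::min(1.0f, preset));
  }
  return 0;
}
```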