j40
Performance on .fjxl
Test image: https://stsci-opo.org/STScI-01GA76Q01D09HFEV174SVMQDMV.png
$ time ./dj40 w.fjxl
14560x8418 frame read and discarded.
real 0m16,854s
user 0m16,325s
sys 0m0,549s
That is about 7 MPx/s (14560 x 8418 pixels in 16.85 s); PNG decoding on the same CPU runs at 50 MPx/s.
That is too slow for just prefix_codes + simple_avg_predictor + color_conversion.
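For reference, the throughput arithmetic can be sketched as below. This is only an illustration of how the MPx/s figure follows from the frame size and the wall-clock time; `megapixels_per_second` is a hypothetical helper, not part of J40 or libjxl.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: decoded pixels divided by elapsed seconds,
   expressed in megapixels (10^6 pixels) per second. */
double megapixels_per_second(uint32_t width, uint32_t height, double seconds) {
    assert(seconds > 0.0);
    return (double)width * (double)height / seconds / 1e6;
}
```

Calling `megapixels_per_second(14560, 8418, 16.854)` with the `real` time from the dj40 run gives roughly 7.3 MPx/s. Note that this uses the whole process time, whereas the figure djxl prints covers only its own decode loop, so the two are not measured the same way.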
Also, the reported width is incorrect (it should be 14557, as djxl shows):
$ time ./djxl w.fjxl --num_threads=1
JPEG XL decoder v0.7.0 3a4676f [AVX2,SSE4,SSSE3,Emu128]
Read 145997782 compressed bytes.
No output file specified.
Decoding will be performed, but the result will be discarded.
Decoded to pixels.
14557 x 8418, 16.68 MP/s [16.68, 16.68], 1 reps, 1 threads.
Allocations: 489484 (max bytes in use: 4.114270E+09)
real 0m7,588s
user 0m6,990s
sys 0m0,600s
The performance issue is already well known, and there is a lot of room for improvement. That said, yeah, specifically improving fjxl performance might be a good way to start that effort. (I think libjxl specializes MA tree decoding for fjxl, so that may have made a huge difference.)
The incorrect size is due to a peculiarity of fjxl encoding: it always rounds the width up to the next multiple of 8 or 16 (I can't recall which) and relies on the crop rectangle to hide the excess columns. J40 currently doesn't implement crop rectangles, which is also documented.
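To make the rounding concrete, here is a minimal sketch in C of padding a width up to a multiple and then cropping the padded rows back to the visible width. The names `round_up_to_multiple` and `crop_rows` are hypothetical, and this is not how J40 or libjxl actually handles crop rectangles; it only illustrates the idea under the assumption of a tightly packed, row-major pixel buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Rounds value up to the next multiple of `multiple` (hypothetical helper). */
uint32_t round_up_to_multiple(uint32_t value, uint32_t multiple) {
    assert(multiple > 0);
    return (value + multiple - 1) / multiple * multiple;
}

/* Copies the visible left part of each padded row into a tightly packed
   output buffer, dropping the excess columns on the right.
   Assumes both buffers are row-major with no extra row padding. */
void crop_rows(const uint8_t *padded, uint32_t padded_width,
               uint8_t *out, uint32_t visible_width,
               uint32_t height, uint32_t bytes_per_pixel) {
    assert(visible_width <= padded_width);
    size_t src_stride = (size_t)padded_width * bytes_per_pixel;
    size_t dst_stride = (size_t)visible_width * bytes_per_pixel;
    for (uint32_t y = 0; y < height; ++y) {
        memcpy(out + y * dst_stride, padded + y * src_stride, dst_stride);
    }
}
```

For this image, `round_up_to_multiple(14557, 8)` and `round_up_to_multiple(14557, 16)` both return 14560, which matches the width dj40 reports and is also why this file alone cannot tell the two multiples apart.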