binjs-ref
Test different sortings for `[STRINGS]`
We currently sort `[STRINGS]` from most used to least used. This could interfere with compression. Let's try and see if we get better results by storing them:
- by lexicographical order;
- by lexicographical order right-to-left.
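For concreteness, here is a minimal sketch of the three orderings being compared. It is not taken from the binjs-ref code base: the function names and the sample strings are hypothetical, and it assumes "right-to-left" means comparing strings from their last character backwards, so that shared suffixes end up adjacent.

```python
from collections import Counter


def by_frequency(occurrences):
    """Current behaviour as described above: most used first.

    Ties are broken lexicographically (an assumption) so the order is deterministic.
    """
    counts = Counter(occurrences)
    return sorted(counts, key=lambda s: (-counts[s], s))


def lexicographic(occurrences):
    """Plain lexicographical order; shared prefixes become neighbours."""
    return sorted(set(occurrences))


def lexicographic_rtl(occurrences):
    """Lexicographical order on the reversed strings; shared suffixes become neighbours."""
    return sorted(set(occurrences), key=lambda s: s[::-1])


# Hypothetical sample: every occurrence of a string in some source file.
occurrences = ["getItem", "setItem", "length", "getItem", "item", "getItem", "length"]

print(by_frequency(occurrences))       # ['getItem', 'length', 'item', 'setItem']
print(lexicographic(occurrences))      # ['getItem', 'item', 'length', 'setItem']
print(lexicographic_rtl(occurrences))  # ['length', 'getItem', 'setItem', 'item']
```

Note how the right-to-left order places `getItem` and `setItem` next to each other, while plain lexicographical order separates them.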
Brotli generally does well when it can make a copy from a short distance away. Sorting is a decent heuristic because it puts prefixes together. You might wring a bit more benefit out of sorting by bucketing strings into "short" and "long" and then sorting those.
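A rough sketch of the bucketing idea, with a way to compare candidate orderings by compressed size. Several things here are assumptions for illustration only: the length threshold of 8, the NUL-separated serialization, the helper names, and the use of `zlib` from the Python standard library as a stand-in for Brotli. Absolute sizes will differ from Brotli's, so this only gives a quick relative comparison.

```python
import zlib


def bucketed(strings, threshold=8):
    """Split unique strings into "short" and "long" buckets, then sort each bucket.

    The threshold is an arbitrary illustrative value, not a tuned one.
    """
    unique = set(strings)
    short = sorted(s for s in unique if len(s) < threshold)
    long_ = sorted(s for s in unique if len(s) >= threshold)
    return short + long_


def compressed_size(table):
    """Size in bytes of the string table after compression.

    Strings are joined with NUL separators (an assumed serialization), and
    zlib is used in place of Brotli.
    """
    payload = "\0".join(table).encode("utf-8")
    return len(zlib.compress(payload, 9))


# Hypothetical string table.
strings = ["addEventListener", "a", "removeEventListener", "id", "b"]

for name, table in [("lexicographic", sorted(set(strings))),
                    ("bucketed", bucketed(strings))]:
    print(name, compressed_size(table))
```

On a table this small the numbers are dominated by compressor overhead; the comparison only becomes meaningful on string tables extracted from real samples.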
Experiments indicate that changing the sort order can gain ~2% on samples, but an order that helps some samples hurts others. To be continued...