vite-plugin-optimize-css-modules

A different dictionary can trivially improve the gzip compression ratio

Open qeleb opened this issue 11 months ago • 6 comments

Currently, the default dictionary is: _-abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789

Based on a character frequency analysis of the code in a few of my projects it should be re-ordered to this: etionraldfps0gx-1chbum4v6w25k9y873zjHCONADLYqBEFGIJKMPQRSTUVWXZ_

For me, this measurably improved the gzip compression ratio at no cost. Perhaps a deeper analysis could find an optimal default dictionary, but this is at least a step in the right direction.
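For reference, the reordering itself can be sketched in a few lines of TypeScript. This is only a rough sketch, not the plugin's code: `reorderByFrequency` is a hypothetical helper name, and it assumes the dictionary is simply a string whose earlier characters are used first. It only reorders characters; whether a generated name is a valid CSS identifier is left to whatever consumes the dictionary.

```typescript
// The default dictionary quoted above.
const DEFAULT_DICTIONARY =
  '_-abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789';

// Hypothetical helper (not part of the plugin): reorder `dictionary` so that
// characters occurring most often in `sample` come first.
function reorderByFrequency(
  sample: string,
  dictionary: string = DEFAULT_DICTIONARY,
): string {
  // Start every dictionary character at zero so unseen characters are kept.
  const counts = new Map<string, number>();
  for (const ch of dictionary) counts.set(ch, 0);

  // Count only characters that belong to the dictionary.
  for (const ch of sample) {
    const n = counts.get(ch);
    if (n !== undefined) counts.set(ch, n + 1);
  }

  // Array.prototype.sort is stable, so ties keep their original order.
  return [...dictionary]
    .sort((a, b) => (counts.get(b) ?? 0) - (counts.get(a) ?? 0))
    .join('');
}

// Tiny example with a four-character dictionary.
console.log(reorderByFrequency('eee tt', 'abet')); // prints "etab"
```

The result always contains exactly the same characters as the input dictionary, so it can be swapped in without changing how many names can be generated.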

qeleb avatar Dec 26 '24 22:12 qeleb

Wow, this is super interesting! I definitely haven't thought about that but will for sure have a look at it!

simonwep avatar Dec 30 '24 15:12 simonwep

I added benchmarks and ran them against your dictionary. Of course, this is highly skewed because I'm only testing against two libraries at the moment. I could only notice a small improvement in size but a major slowdown in bundle time:

Current dictionary

| Input | Build Time | Gzip Size | Brotli Size |
| --- | --- | --- | --- |
| bootstrap-5.0.2.module.css | 525ms (-94.06% / -8311ms) | 21.3 kB (-26.53% / -7.69 kB) | 21.3 kB (-27.54% / -6 kB) |
| materialize-1.0.0.module.css | 572ms (-92.59% / -7156ms) | 20.1 kB (-19.70% / -4.93 kB) | 20.1 kB (-21.33% / -4.3 kB) |

Your dictionary

| Input | Build Time | Gzip Size | Brotli Size |
| --- | --- | --- | --- |
| bootstrap-5.0.2.module.css | 1106ms (-88.00% / -8114ms) | 21.3 kB (-26.66% / -7.73 kB) | 21.3 kB (-28.22% / -6.15 kB) |
| materialize-1.0.0.module.css | 1112ms (-87.46% / -7751ms) | 20.1 kB (-19.80% / -4.95 kB) | 20.1 kB (-20.64% / -4.16 kB) |

Do you have any publicly accessible libraries or CSS code that I can test this against, or references on why the order of your dictionary should have a large impact? I imagine you ordered it by how frequently each character is used, but I'd like to include some references before I make that change :D
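For anyone who wants to check the size effect on their own CSS, here is a minimal sketch of a size comparison using Node's built-in `zlib` module. `compressedSizes` is a hypothetical helper name, it uses zlib's default settings, and it is not how the benchmark above is implemented.

```typescript
import { brotliCompressSync, gzipSync } from 'node:zlib';

// Hypothetical helper: byte sizes of a CSS string, raw and compressed
// with zlib's default gzip and Brotli settings.
function compressedSizes(css: string): { raw: number; gzip: number; brotli: number } {
  const input = Buffer.from(css, 'utf8');
  return {
    raw: input.length,
    gzip: gzipSync(input).length,
    brotli: brotliCompressSync(input).length,
  };
}

// Usage: build the same stylesheet once per dictionary, then compare.
// The two strings below are placeholders, not real build output.
const withDictionaryA = '._a{color:red}._b{margin:0}'.repeat(50);
const withDictionaryB = '.e{color:red}.t{margin:0}'.repeat(50);
console.log(compressedSizes(withDictionaryA), compressedSizes(withDictionaryB));
```

Comparing the compressed sizes of the same source built with each dictionary isolates the effect of the dictionary order from everything else in the build.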

simonwep avatar Dec 31 '24 14:12 simonwep

I ran a character frequency analysis on my particular project's CSS. I suspect that if you do the same for these specific projects, you'll observe some real improvement. When I get a chance, I can look into a better general dictionary.

I'm also not really sure why build time would be affected. That's strange to me.

qeleb avatar Dec 31 '24 15:12 qeleb

I just ran it a few more times. It seems my Mac was busy with something else; you're right, the build time is the same.

| Input | Build Time | Gzip Size | Brotli Size |
| --- | --- | --- | --- |
| bootstrap-5.0.2.module.css | 550ms (-93.93% / -8515ms) | 21.3 kB (-26.66% / -7.73 kB) | 21.3 kB (-28.22% / -6.15 kB) |
| materialize-1.0.0.module.css | 569ms (-93.29% / -7917ms) | 20.1 kB (-19.80% / -4.95 kB) | 20.1 kB (-20.64% / -4.16 kB) |

I also thought about character frequency, but that would need to be based on the CSS being compiled, which should be possible within the scope of the plugin. I'm not sure if it's worth it, though, or whether a general, improved dictionary based on the most frequently used characters in CSS attributes is more useful.

simonwep avatar Jan 02 '25 07:01 simonwep

This is the same question I'm asking. If it's done in the plugin, we'd have to iterate over the CSS twice (or perhaps cache some information, such as character frequency, from run to run). This would definitely bring the best results, but I think a better general-purpose dictionary may yield results nearly as good with much less effort. I think at minimum we should take that approach first.
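To make the caching idea concrete, here is a rough TypeScript sketch. Everything in it is hypothetical: the `FrequencyCache` shape and the helper names are made up for illustration, and it leaves out where the cache would actually be stored between builds.

```typescript
// Hypothetical cache shape: character -> occurrences seen so far.
// A plain object so it could be written to disk as JSON between runs.
type FrequencyCache = Record<string, number>;

// Count how often each dictionary character appears in one CSS source.
function countCharacters(css: string, dictionary: string): FrequencyCache {
  const counts: FrequencyCache = {};
  for (const ch of dictionary) counts[ch] = 0;
  for (const ch of css) {
    const current = counts[ch];
    if (current !== undefined) counts[ch] = current + 1;
  }
  return counts;
}

// Add the counts from the current run onto the counts cached from earlier runs.
function mergeCounts(cached: FrequencyCache, current: FrequencyCache): FrequencyCache {
  const merged: FrequencyCache = { ...cached };
  for (const [ch, n] of Object.entries(current)) {
    merged[ch] = (merged[ch] ?? 0) + n;
  }
  return merged;
}

// Derive a dictionary order from accumulated counts (most frequent first).
// The sort is stable, so characters with equal counts keep their order.
function dictionaryFromCounts(counts: FrequencyCache, dictionary: string): string {
  return [...dictionary]
    .sort((a, b) => (counts[b] ?? 0) - (counts[a] ?? 0))
    .join('');
}
```

With something like this, the first build would use the default dictionary and only later builds would benefit from the cached counts, which avoids the second pass over the CSS at the cost of output that depends on previous runs.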

qeleb avatar Jan 02 '25 16:01 qeleb

Yeah, I'd also favor a more general dictionary. If you want, you can have a go at this since you've already investigated it for your own project, and open a PR in case you find anything :)

simonwep avatar Jan 03 '25 13:01 simonwep