
Embedding all languages in the lib makes it too fat

Open blackgear opened this issue 7 years ago • 3 comments

We can already edit build.rs manually, changing

let langs = vec![
    "af",
    "hy",
...
    "hsb",
    "cy"
];

to

let langs = vec!["en"];

to make the final lib much smaller (8.65 MB -> 132 KB). I think it would be nice to have an option in [dependencies.hyphenation] to set which languages are embedded.

Maybe something like this:

[dependencies.hyphenation]
version = "0.6.0"
features = ["nfd"]
language = ["en-us"]

blackgear, Jan 17 '18 05:01

Dictionary embedding was already going to be under a feature flag starting with the next release, and adding individual language flags is certainly an idea worth considering. (It would have to be flags, because the Cargo manifest format and Rust cfg system are not flexible enough to allow as nice a syntax as language = ["en_us"] for library features.) It will probably happen soon, but not immediately.

tapeinosyne, Feb 08 '18 16:02
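
For illustration, here is a rough sketch of how per-language feature flags could drive the language list in a build script. It relies on Cargo exposing each enabled feature to build scripts as a CARGO_FEATURE_<NAME> environment variable (name uppercased, with - mapped to _). The function name selected_langs and the embed_ prefix are assumptions made for this example, not the crate's actual build logic.

```rust
// Hypothetical helper for a build script: picks the languages to embed
// from the CARGO_FEATURE_* variables that Cargo sets for enabled features.
// The `embed_` feature prefix and this function are assumptions.
fn selected_langs<I>(env: I, all_langs: &[&str]) -> Vec<String>
where
    I: IntoIterator<Item = (String, String)>,
{
    // Collect the suffixes of all enabled `embed_*` features.
    let enabled: Vec<String> = env
        .into_iter()
        .filter_map(|(key, _)| {
            key.strip_prefix("CARGO_FEATURE_EMBED_").map(str::to_owned)
        })
        .collect();

    all_langs
        .iter()
        .filter(|lang| {
            // Cargo uppercases feature names and maps '-' to '_'.
            let key = lang.to_uppercase().replace('-', "_");
            enabled.iter().any(|e| *e == key)
        })
        .map(|lang| lang.to_string())
        .collect()
}

fn main() {
    let all = ["af", "hy", "en-us", "cy"];
    // In a real build script, pass std::env::vars() instead of this list.
    let env = vec![("CARGO_FEATURE_EMBED_EN_US".to_string(), "1".to_string())];
    let langs = selected_langs(env, &all);
    println!("{:?}", langs); // prints ["en-us"]
}
```

With this shape, a downstream crate would opt in through its features list (for example a feature named embed_en-us) rather than through a custom language key in the manifest.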

Maybe https://crates.io/crates/inflate and https://crates.io/crates/deflate would also help.

use deflate::deflate_bytes;

let data = b"Some data";
let compressed = deflate_bytes(data);

Compressing the en-US dictionary this way shrinks it from 132 KB to 20 KB.

blackgear, Mar 03 '18 12:03

Starting with v0.8, embedding all dictionaries should take no more than 2.8 MB. Moreover, the feature embed_en-us has been introduced for the common case of embedding American English in, e.g., a small utility.

I would still like to find a better solution; ideally, one which allows end-users to select languages individually without a feature explosion.

tapeinosyne, May 19 '20 22:05
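
To make the "feature explosion" concern concrete, here is a minimal sketch of what per-language flags imply on the library side: one cfg check (and one manifest entry) per dictionary. Only embed_en-us is named in the thread. The other feature names and the function embedded_languages are hypothetical.

```rust
// Hypothetical sketch: one Cargo feature per language means one cfg check
// per language. Apart from `embed_en-us`, the feature names are made up.
fn embedded_languages() -> Vec<&'static str> {
    let mut langs = Vec::new();
    if cfg!(feature = "embed_en-us") {
        langs.push("en-us");
    }
    if cfg!(feature = "embed_af") {
        langs.push("af");
    }
    if cfg!(feature = "embed_cy") {
        langs.push("cy");
    }
    // ...and so on, one branch per dictionary.
    langs
}

fn main() {
    // Built without any of these features enabled, the list is empty.
    println!("{:?}", embedded_languages());
}
```

Scaling this pattern to every supported language is what makes a flag-per-language design unwieldy.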