lingua Reduce resources to load language models

Open pemistahl opened this issue 1 year ago • 0 comments

Currently, the language models are parsed from JSON files and loaded into plain maps at runtime. Map lookups are fast, but the maps consume a significant amount of memory. The goal is to investigate whether more compact data structures exist that need less memory, similar to what NumPy provides for Python. It might even be possible to store these structures on disk in a binary format that loads faster than the current JSON files.

Promising candidates could be:

EJML
Colt
la4j
Apache Commons Math
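
To make the idea concrete without committing to any of the candidates above, here is a minimal sketch that uses only the JDK. It assumes a model can be viewed as a mapping from n-gram strings to float probabilities. The class `CompactModel` and all of its methods are hypothetical and are not part of Lingua. The sketch replaces the map with two parallel arrays (sorted keys, primitive floats) and round-trips them through a simple binary layout.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: these names do not come from the Lingua code base.
final class CompactModel {
    private final String[] ngrams;        // sorted in natural String order
    private final float[] probabilities;  // probabilities[i] belongs to ngrams[i]

    private CompactModel(String[] ngrams, float[] probabilities) {
        this.ngrams = ngrams;
        this.probabilities = probabilities;
    }

    // Copies a map into sorted parallel arrays; avoids one boxed Float per entry.
    static CompactModel fromMap(Map<String, Float> map) {
        TreeMap<String, Float> sorted = new TreeMap<>(map);
        String[] keys = new String[sorted.size()];
        float[] values = new float[sorted.size()];
        int i = 0;
        for (Map.Entry<String, Float> entry : sorted.entrySet()) {
            keys[i] = entry.getKey();
            values[i] = entry.getValue();
            i++;
        }
        return new CompactModel(keys, values);
    }

    // Binary search over the sorted keys; unknown n-grams yield 0.
    float probability(String ngram) {
        int index = Arrays.binarySearch(ngrams, ngram);
        return index >= 0 ? probabilities[index] : 0f;
    }

    // Binary layout (an assumption of this sketch): entry count, then (UTF string, float) pairs.
    byte[] toBytes() {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(ngrams.length);
            for (int i = 0; i < ngrams.length; i++) {
                out.writeUTF(ngrams[i]);
                out.writeFloat(probabilities[i]);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Reads the layout written by toBytes(); entries are already sorted, so no re-sort is needed.
    static CompactModel fromBytes(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            int count = in.readInt();
            String[] keys = new String[count];
            float[] values = new float[count];
            for (int i = 0; i < count; i++) {
                keys[i] = in.readUTF();
                values[i] = in.readFloat();
            }
            return new CompactModel(keys, values);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Lookups become O(log n) instead of the map's expected O(1), so whether the memory saving is worth it would have to be measured. The same byte layout could be written to and read from a file instead of a byte array.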

Nov 05 '22 09:11 pemistahl
