lingua-go
Compile-time language inclusion
If the generated data is split into per-language packages, it becomes possible to strip the set of bundled languages down considerably.
Tag handling
Each file can contain a //go:build directive that controls which languages are included.
By default, all generated files can include //go:build !lingua_ignore, which means "include this file unless built with -tags lingua_ignore". That is the same behaviour as now.
Then, a file with the build constraint //go:build (!lingua_ignore && !lingua_no<language>) || lingua_<language> will be built when either -tags lingua_<language> is specified, or neither lingua_ignore nor lingua_no<language> is specified.
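To make the effect of this constraint concrete, here is a minimal sketch that evaluates it for a few tag combinations using the standard go/build/constraint package. The language code "en" and the helper name included are assumptions made for illustration; they are not part of lingua-go.

```go
package main

import (
	"fmt"
	"go/build/constraint"
)

// englishConstraint is the proposed constraint with <language> replaced by
// "en". The tag names follow the proposal and are not an existing API.
const englishConstraint = "//go:build (!lingua_ignore && !lingua_noen) || lingua_en"

// included reports whether a file carrying the given //go:build line would
// be compiled when exactly the listed tags are set. (Hypothetical helper.)
func included(line string, tags ...string) (bool, error) {
	expr, err := constraint.Parse(line)
	if err != nil {
		return false, err
	}
	set := make(map[string]bool, len(tags))
	for _, t := range tags {
		set[t] = true
	}
	return expr.Eval(func(tag string) bool { return set[tag] }), nil
}

func main() {
	cases := [][]string{
		nil,                            // default build: everything included
		{"lingua_ignore"},              // ignore all languages
		{"lingua_ignore", "lingua_en"}, // ignore all, then opt English back in
		{"lingua_noen"},                // exclude only English
	}
	for _, tags := range cases {
		ok, err := included(englishConstraint, tags...)
		if err != nil {
			fmt.Println("parse error:", err)
			return
		}
		fmt.Printf("tags=%v included=%t\n", tags, ok)
	}
}
```

In this sketch the file is kept for the default build and for lingua_ignore,lingua_en, and dropped for lingua_ignore alone and for lingua_noen alone.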
Thus, if you want all languages to be included, you simply do nothing; when you want to reduce the language set to the minimum, you use build tags like -tags lingua_ignore,lingua_en,lingua_es and so on.
If you want to exclude only a few languages, you add, for example, -tags lingua_noge without adding lingua_ignore.
Model loading
For now, models are loaded from a single point in detector.go through embed.FS.
Instead, each language-model/<language> directory could contain a .go file that carries the aforementioned build constraints.
This file can also load all *.zip files into a separate embed.FS value, which can then be passed to the "main" filesystem in the language-model package.
The language-model package can then implement the fs.SubFS interface.
It could be as simple as a generated file with a switch/case over all available languages that pulls in all language-model/* packages.
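The switch/case idea could look roughly like the sketch below. It is only an illustration under stated assumptions: the type modelFS, the helper readModel, the language names, and the file name model.zip are hypothetical, and fstest.MapFS stands in for the embed.FS values that the per-language packages would export.

```go
package main

import (
	"fmt"
	"io/fs"
	"strings"
	"testing/fstest"
)

// Stand-ins for the embed.FS values of language-model/<language> packages.
// File names and contents are placeholders, not real lingua-go data.
var (
	englishFS fs.FS = fstest.MapFS{"model.zip": {Data: []byte("english model bytes")}}
	spanishFS fs.FS = fstest.MapFS{"model.zip": {Data: []byte("spanish model bytes")}}
)

// modelFS is a hypothetical "main" filesystem of the language-model package.
type modelFS struct{}

// Compile-time check that modelFS satisfies fs.SubFS.
var _ fs.SubFS = modelFS{}

// Sub returns the filesystem of one language. A generator would emit one
// case per language that is available.
func (modelFS) Sub(dir string) (fs.FS, error) {
	switch dir {
	case "english":
		return englishFS, nil
	case "spanish":
		return spanishFS, nil
	default:
		return nil, &fs.PathError{Op: "sub", Path: dir, Err: fs.ErrNotExist}
	}
}

// Open resolves "<language>/<file>" by delegating to the language's filesystem.
func (m modelFS) Open(name string) (fs.File, error) {
	dir, file, found := strings.Cut(name, "/")
	if !found {
		return nil, &fs.PathError{Op: "open", Path: name, Err: fs.ErrNotExist}
	}
	sub, err := m.Sub(dir)
	if err != nil {
		return nil, &fs.PathError{Op: "open", Path: name, Err: fs.ErrNotExist}
	}
	return sub.Open(file)
}

// readModel reads one file of one language through the combined filesystem.
func readModel(language, file string) ([]byte, error) {
	return fs.ReadFile(modelFS{}, language+"/"+file)
}

func main() {
	data, err := readModel("english", "model.zip")
	fmt.Println(string(data), err)
}
```

A hand-written or generated switch keeps the lookup explicit, at the cost of regenerating the file whenever a language is added.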
Or, if you don't want to use generation, it should be simple enough to add a Register function that the init function of each language-model/<language>/ package can then call. It won't be called if the language package is ignored.
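A minimal sketch of the registration approach, assuming a hypothetical Register/Lookup pair and using fstest.MapFS in place of a real embed.FS. In the proposed layout the init function would live in language-model/<language>/ behind the build constraint, so it would simply not exist in builds that exclude the language.

```go
package main

import (
	"fmt"
	"io/fs"
	"sort"
	"testing/fstest"
)

// registry maps a language name to its model filesystem. init functions run
// sequentially before main, so no locking is needed for registration itself.
var registry = map[string]fs.FS{}

// Register makes the models of one language available. (Hypothetical API.)
func Register(language string, models fs.FS) {
	registry[language] = models
}

// Lookup returns the model filesystem of a language, if it was compiled in.
func Lookup(language string) (fs.FS, bool) {
	models, ok := registry[language]
	return models, ok
}

// Registered lists the languages that were compiled in, sorted by name.
func Registered() []string {
	names := make([]string, 0, len(registry))
	for name := range registry {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

// In the proposal, this init would sit in the per-language package and pass
// that package's embed.FS; the file name and contents here are placeholders.
func init() {
	Register("english", fstest.MapFS{"model.zip": {Data: []byte("placeholder")}})
}

func main() {
	fmt.Println("compiled-in languages:", Registered())
	if _, ok := Lookup("spanish"); !ok {
		fmt.Println("spanish was not compiled in")
	}
}
```

Something still has to import each language package (a blank import is enough) for its init function to run.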