lingua-go icon indicating copy to clipboard operation
lingua-go copied to clipboard

Compile-time language inclusion

Open gudvinr opened this issue 2 months ago • 0 comments

If generated data for languages will be split between per-language, it's possible to strip down bundled language immensely.

Tag handling

Each file can contain //go:build directive that controls inclusion of languages

By default, all generated files can include //go:build !lingua_ignore which means "unless built with -tags lingua_ignore, include this file". That is the same behaviour as it is now.

Then, build constraint //go:build (!lingua_ignore && !lingua_no<language>) || lingua_<language> will be built when either tags -lingua_<language> is specified or -tags lingua_no<language> is NOT specified.

Thus, if you want all languages to be included, you simply do nothing and when you want to reduce language set to the minimum, you use build tags like -tags lingua_ignore,lingua_en,lingua_es,etc.

If you want to exclude only several languages, you add -tags lingua_noge without adding lingua_ignore.

Model loading

For now, models are loaded from a single point in detector.go through embed.FS.
Instead of that, each language-model/<language> could contain .go file that has aforementioned build constraints.

This file can also load all *.zip files into separate embed.FS entity which can be then passed to the "main" filesystem in language-model package.

language-model package then can implement interface for fs.SubFS.
It could be as simple as generated file that has switch/case for all available languages that includes all language-model/* packages.
Or, if you don't want to use generation, it should be simple enough to add Register method that init function of language-model/<language>/ package can then call. It won't be called if language package is ignored.

gudvinr avatar Apr 07 '24 10:04 gudvinr