getlang
Czech language support, introduced dynamic weight for Unicode blocks
Hello, I've added support for the Czech language, as well as a dynamic weight for scripts (scriptCountFactor).
The reason is that the Czech alphabet contains a few characters, such as ř, š, ů, which you won't find in any other language.
I hope you'll like my take on that.
Thank you for your contribution here. But I think it would fit in better with the pre-existing design to add a Czech line in the 'profiles.go' file. At least then we wouldn't have to hard-code unicode values like that.
Hard-coding Unicode values is optional; predefined values from the unicode package are supported. I don't think your approach with profiles works for all languages. Czech belongs to the Slavic language family, whose members are very similar to each other when comparing the syllables of words. I wasn't able to make the tool work for Czech while Serbian (another Slavic language) is already present. I'd be happy to see that it's possible, but it seems unlikely.
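To illustrate the difference between predefined and hard-coded values, here is a minimal sketch using only Go's standard unicode package. It is not the code from this pull request: the names latinExtendedA, countIn, countCyrillic and countLatinExtendedA are made up for the example, and the choice of the Latin Extended-A block (U+0100 to U+017F) is an assumption.

```go
package main

import "unicode"

// latinExtendedA is a hand-written (hard-coded) table covering U+0100..U+017F.
// Hypothetical example table; not taken from the pull request.
var latinExtendedA = &unicode.RangeTable{
	R16: []unicode.Range16{{Lo: 0x0100, Hi: 0x017F, Stride: 1}},
}

// countIn returns how many runes of text fall inside the given range table.
func countIn(text string, table *unicode.RangeTable) int {
	n := 0
	for _, r := range text {
		if unicode.Is(table, r) {
			n++
		}
	}
	return n
}

// countCyrillic relies on the predefined unicode.Cyrillic table,
// so no code points have to be written out by hand.
func countCyrillic(text string) int {
	return countIn(text, unicode.Cyrillic)
}

// countLatinExtendedA relies on the hard-coded table defined above.
func countLatinExtendedA(text string) int {
	return countIn(text, latinExtendedA)
}
```

Both helpers go through the same countIn function, so a predefined table and a hard-coded one are interchangeable from the caller's point of view.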
I came up with a solution based on the fact that characters like ř, š, ů are unique to the Czech language.
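A rough sketch of that idea, for readers who want to see it in code. This is not the implementation from the pull request: czechMarkers and markerScore are hypothetical names, the marker set contains only the three letters mentioned in this thread, and the way the weight is applied (a plain multiplier on the share of marker letters) is an assumption about what a factor like scriptCountFactor could do.

```go
package main

import "unicode"

// czechMarkers is a hypothetical set of letters treated as hints for Czech.
// It holds only the letters named in the discussion, not a complete list.
var czechMarkers = map[rune]bool{'ř': true, 'š': true, 'ů': true}

// markerScore returns the share of letters in text that are marker letters,
// multiplied by factor. Non-letter runes (spaces, digits, punctuation) are
// ignored. The factor plays the role of a tunable weight (assumed semantics).
func markerScore(text string, factor float64) float64 {
	letters, hits := 0, 0
	for _, r := range text {
		if !unicode.IsLetter(r) {
			continue
		}
		letters++
		// Lower-case first so that Ř, Š, Ů count as well.
		if czechMarkers[unicode.ToLower(r)] {
			hits++
		}
	}
	if letters == 0 {
		return 0
	}
	return factor * float64(hits) / float64(letters)
}
```

A caller could add this score to whatever the existing profile comparison produces for Czech; how the two would be combined is left open here.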