uucp
Unicode character properties for OCaml
[UTS 39](https://www.unicode.org/reports/tr39/) specifies a list of "[confusables](https://www.unicode.org/Public/security/revision-05/confusables.txt)" as well as "[intentional confusables](https://www.unicode.org/Public/security/revision-05/intentional.txt)". These are characters, such as certain Greek and Cyrillic letters, which look identical but are not normalized to each...
From the standard (5.18):

> Most characters in the standard have identical values for their `Titlecase_mapping` and `Uppercase_mapping`; however the two values are distinguished for these few digraph compatibility characters.
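
To make the distinction concrete, here is a minimal sketch using a hand-copied table of the four Latin digraph triples (dž, lj, nj, dz). The table and the helpers `to_title` and `to_upper` are illustrative assumptions and do not use uucp's API; they only show that the titlecase and uppercase mappings differ for these code points.

```ocaml
(* Hand-copied (lower, title, upper) code points for the four Latin
   digraph triples. Illustrative data only, not read from uucp. *)
let digraphs = [
  (0x01C6, 0x01C5, 0x01C4); (* dž, Dž, DŽ *)
  (0x01C9, 0x01C8, 0x01C7); (* lj, Lj, LJ *)
  (0x01CC, 0x01CB, 0x01CA); (* nj, Nj, NJ *)
  (0x01F3, 0x01F2, 0x01F1); (* dz, Dz, DZ *)
]

(* Hypothetical helpers: return the mapped code point, or the input
   unchanged when it is not one of the lowercase digraphs above. *)
let to_title cp =
  match List.find_opt (fun (l, _, _) -> l = cp) digraphs with
  | Some (_, t, _) -> t
  | None -> cp

let to_upper cp =
  match List.find_opt (fun (l, _, _) -> l = cp) digraphs with
  | Some (_, _, u) -> u
  | None -> cp

(* Print both mappings side by side for each lowercase digraph. *)
let () =
  List.iter
    (fun (l, _, _) ->
      Printf.printf "U+%04X: title U+%04X, upper U+%04X\n"
        l (to_title l) (to_upper l))
    digraphs
```

For each of these characters the two mappings give different code points, which is why a single "uppercase" table cannot stand in for titlecasing.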
We could try encoding some of the binary trees as linear arrays. This could lower both lookup time and memory consumption.
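
As a rough sketch of the idea, assume a binary search tree of non-overlapping code point ranges. The `tree` type and the functions below are hypothetical, not uucp's actual data structures. An in-order traversal gives a sorted array, and binary search over that array replaces pointer chasing:

```ocaml
(* A range map as a binary search tree: each node covers lo..hi. *)
type 'a tree =
  | Leaf
  | Node of 'a tree * (int * int * 'a) * 'a tree

(* Tree lookup: follows pointers from node to node. *)
let rec tree_find default cp = function
  | Leaf -> default
  | Node (l, (lo, hi, v), r) ->
      if cp < lo then tree_find default cp l
      else if cp > hi then tree_find default cp r
      else v

(* In-order flattening yields the ranges sorted by lower bound. *)
let to_array t =
  let rec go acc = function
    | Leaf -> acc
    | Node (l, x, r) -> go (x :: go acc r) l
  in
  Array.of_list (go [] t)

(* Array lookup: binary search over the sorted, non-overlapping ranges. *)
let array_find default cp a =
  let rec go lo hi =
    if lo > hi then default
    else
      let mid = lo + (hi - lo) / 2 in
      let (rlo, rhi, v) = a.(mid) in
      if cp < rlo then go lo (mid - 1)
      else if cp > rhi then go (mid + 1) hi
      else v
  in
  go 0 (Array.length a - 1)

(* Small example tree with three ranges. *)
let example =
  Node (Node (Leaf, (0x41, 0x5A, true), Leaf),
        (0x61, 0x7A, true),
        Node (Leaf, (0xC0, 0xD6, true), Leaf))

let flat = to_array example
```

Both lookups return the same answers. Whether the array form is actually faster or smaller would need to be measured on the real tables.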
I think you may be missing parentheses in the [definition of tmapbool](https://github.com/dbuenzli/uucp/blob/master/src/uucp_tmapbool.ml#L20). This may cause your innermost tables to be [larger than necessary and filled with unnecessary default values](https://github.com/dbuenzli/uucp/blob/master/src/uucp_alpha_data.ml).
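
As a general illustration of how missing parentheses can change a bit-index computation in OCaml (the values and names below are hypothetical and are not taken from `uucp_tmapbool.ml`): the shift operators `lsl`, `lsr` and `asr` bind tighter than `land`, `lor` and `lxor`, so a mask-then-shift expression written without parentheses shifts the mask instead of the masked value.

```ocaml
(* Hypothetical values for illustration; not taken from uucp. *)
let mask = 0xFF0
let shift = 4
let x = 0xABCD

(* Without parentheses, lsr binds tighter than land, so this parses as
   x land (mask lsr shift). *)
let unparenthesized = x land mask lsr shift

(* With parentheses, the value is masked first and then shifted. *)
let parenthesized = (x land mask) lsr shift

let () =
  Printf.printf "0x%X vs 0x%X\n" unparenthesized parenthesized
```

The two expressions select different bits of `x`, so an index computed this way can land in a different slot than intended.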