Source emoji from unicode.org HTML files; add emoji categories
This PR makes the code that generates emoji.ml parse the HTML files from unicode.org: http://www.unicode.org/emoji/charts/full-emoji-list.html and https://www.unicode.org/emoji/charts/full-emoji-modifiers.html. The generated module now contains all emojis counted in https://www.unicode.org/emoji/charts/emoji-counts.html. The PR also fixes diacritics handling, adds (sub)categories, adds tests, updates readme.md, and makes various other changes.
It conflicts with the last two commits, because these changes are based on the code as it was before @favonia's change to gencode.ml. The only real conflict is how diacritics are handled. In this PR we only use '_' to replace diacritics, to stay consistent and as close as possible to the official names.
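To illustrate the '_' replacement described above, here is a minimal sketch. It is not the code from gencode.ml: the function name `identifier_of_name` is hypothetical, and it assumes OCaml 4.14 or later for the standard library's UTF-8 decoding. Every code point that is not an ASCII letter or digit, diacritics included, becomes a single '_'.

```ocaml
(* Hypothetical helper, not taken from gencode.ml. Requires OCaml >= 4.14. *)
let identifier_of_name (name : string) : string =
  let buf = Buffer.create (String.length name) in
  let rec loop i =
    if i < String.length name then begin
      (* Decode one UTF-8 code point starting at byte offset i. *)
      let d = String.get_utf_8_uchar name i in
      let code = Uchar.to_int (Uchar.utf_decode_uchar d) in
      let c =
        if code < 128 then
          (* ASCII: keep letters (lowercased) and digits. *)
          match Char.lowercase_ascii (Char.chr code) with
          | ('a' .. 'z' | '0' .. '9') as c -> c
          | _ -> '_'
        else
          (* Non-ASCII, e.g. a letter with a diacritic: one '_' per code point. *)
          '_'
      in
      Buffer.add_char buf c;
      loop (i + Uchar.utf_decode_length d)
    end
  in
  loop 0;
  Buffer.contents buf

let () = print_endline (identifier_of_name "Côte d'Ivoire")
```

This sketch does not handle names that start with a digit, which still need separate treatment to be valid OCaml identifiers.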
For diacritics handling, you may want to use sanette/ubase. It may not fix everything (I see you had to replace 1st to get a valid OCaml identifier), but it should help in some cases.
This is now updated to Unicode v15.1 🐦🔥