Arg to define glyphset without checking an existing font

Open RosaWagner opened this issue 3 years ago • 5 comments

It would be nice to create/define glyph sets based on number of speakers or orthography status. For example, listing the glyphs necessary to support languages with more than 20,000 speakers.

RosaWagner, Jun 24 '22 16:06

For the time being I'd prefer to keep the CLI for working on font files so as to not muddy its purpose. For charset building you could use the library in a script, for example:

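from hyperglot.languages import Languages
from hyperglot.language import Language, Orthography

glyphs = []
for iso, info in Languages().items():
    lang = Language(info, iso)

    # Skip languages with no speaker count or fewer than 20,000 speakers
    if "speakers" not in lang or lang["speakers"] < 20000:
        continue

    # Skip languages that have no orthography defined
    orth = lang.get_orthography()
    if not orth:
        continue

    # Wrap the raw orthography data to get the parsed character lists
    orth = Orthography(orth)
    glyphs.extend(orth.base_chars)
    glyphs.extend(orth.required_base_marks)

# Deduplicate, sort and output the collected characters
print(sorted(set(glyphs)))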

You may want to look at the constructor options for Languages (particularly the validity level) and at the Orthography attributes you are interested in (e.g. to ignore particular scripts entirely). I know there are a couple of quirks with these objects, but much of this exists to augment, validate and decompose the YAML data into actual codepoints. For example, a missing speaker count in the data is information in itself, as is a missing orthography.
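To make the filtering logic concrete without depending on the library objects, here is a minimal sketch over plain dictionaries. The records are invented stand-ins, not real Hyperglot data, and the field names (speakers, orthographies, status, script, base) are assumptions about how the YAML is laid out; charset() is a hypothetical helper, not part of the hyperglot API.

```python
# Stand-in records, NOT real Hyperglot data: codes, counts and characters
# are invented, and the field names are assumed for illustration only.
SAMPLE = {
    "aaa": {"speakers": 50000, "orthographies": [
        {"status": "primary", "script": "Latin", "base": "a b c"},
        {"status": "historical", "script": "Latin", "base": "a b q"},
    ]},
    "bbb": {"speakers": 1200, "orthographies": [
        {"status": "primary", "script": "Latin", "base": "a d"},
    ]},
    "ccc": {"orthographies": [  # no speaker count recorded
        {"status": "primary", "script": "Cyrillic", "base": "x y"},
    ]},
    "ddd": {"speakers": 90000},  # no orthography recorded
}


def charset(data, min_speakers=20000, statuses=("primary",), skip_scripts=()):
    """Hypothetical helper: collect base characters from matching orthographies."""
    chars = set()
    for iso, lang in data.items():
        speakers = lang.get("speakers")
        # A missing count is not the same as zero speakers; this sketch
        # chooses to skip such languages rather than guess.
        if speakers is None or speakers < min_speakers:
            continue
        # A language without orthographies simply contributes nothing.
        for orth in lang.get("orthographies", []):
            if orth.get("status") not in statuses:
                continue
            if orth.get("script") in skip_scripts:
                continue
            chars.update(orth.get("base", "").split())
    return sorted(chars)


print(charset(SAMPLE))  # → ['a', 'b', 'c']
```

Passing statuses=("primary", "historical") or a lower min_speakers widens the set. Whether languages with unknown speaker counts should be skipped or included is a policy decision for your charset, so it is worth making that branch explicit as above.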

For use of the library we should add more documentation in the form of concrete "How do I..." examples like this 👍

kontur, Jun 27 '22 08:06

Similar: given a language (/ code), print out characters in language. Assumes terminal is using font with appropriate glyphs.

Else can just browse the hyperglot.yaml file :-)

Thanks, Ian

iandoug, Apr 19 '23 11:04

> Similar: given a language (/ code), print out characters in language. Assumes terminal is using font with appropriate glyphs.

One of the last updates introduced a feature like this. It's split from the main hyperglot command: use, for example, hyperglot-data eng or hyperglot-data Suomi. It will show info by ISO code or attempt to find the language by name.

kontur, Apr 19 '23 18:04