chhapkaam
Generation of all possible font glyphs
@kartikm, this script includes all the artificial conjuncts (i.e. ones not occurring in the Gujarati corpus or not seen so far) as well, right?
@samyakbhuta Mostly. We should modify the script to write the acceptable glyphs to a separate output file. I'll do that in the coming days, once the dust has settled.
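A minimal sketch of the split described above, assuming the script enumerates two-consonant conjuncts as consonant + virama (U+0ACD) + consonant, and that an accepted list already exists from some other source. All function names here (`all_two_consonant_conjuncts`, `split_by_acceptance`, `write_glyph_list`) are hypothetical and not taken from the actual chhapkaam script.

```python
import unicodedata

VIRAMA = "\u0ACD"  # GUJARATI SIGN VIRAMA


def gujarati_consonants():
    """Assigned consonants in U+0A95..U+0AB9 (unassigned code points are skipped)."""
    return [
        chr(cp)
        for cp in range(0x0A95, 0x0ABA)
        if unicodedata.name(chr(cp), "").startswith("GUJARATI LETTER")
    ]


def all_two_consonant_conjuncts():
    """Every consonant + virama + consonant sequence, attested in real text or not."""
    cons = gujarati_consonants()
    return [a + VIRAMA + b for a in cons for b in cons]


def split_by_acceptance(candidates, accepted):
    """Split candidates into (accepted, artificial), keeping the original order."""
    ok = [g for g in candidates if g in accepted]
    artificial = [g for g in candidates if g not in accepted]
    return ok, artificial


def write_glyph_list(path, glyphs):
    """Write one glyph per line as UTF-8, so each group can live in its own output file."""
    with open(path, "w", encoding="utf-8") as f:
        for g in glyphs:
            f.write(g + "\n")
```

Usage would be along the lines of `ok, artificial = split_by_acceptance(all_two_consonant_conjuncts(), accepted)` followed by one `write_glyph_list` call per group, where `accepted` and the output file names are placeholders. This only covers two-consonant conjuncts; longer clusters such as સ્ત્ર would need another level of enumeration.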
A few questions (these affect OCR in a big way as well):
- Do you know which glyphs are acceptable? Do you have any linguistic input for this data? Or have you arrived at such a list empirically (e.g. by scanning a corpus)?
- I also find it a little challenging to see how we are going to keep this list up to date. E.g. since ટ્વીટર is now an accepted Gujarati word, the conjunct it uses (ટ્વ) has to be considered accepted.
- Where and how do we identify such new glyphs? I guess for now, volunteer reporting plus case-by-case judgement and inclusion would be a good start. In any case, we need to maintain the list.
- Once the glyphs are listed, a few basic stats are also needed. I am opening a separate issue for that: #19.
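The maintenance and stats points above could be sketched roughly as follows, assuming the accepted list is kept as a set of strings and that the case-by-case judgement is represented by a callback. The names `add_reviewed_glyphs`, `glyph_stats` and `approve` are hypothetical, and the stats shown are only an example of what #19 might track.

```python
from collections import Counter

VIRAMA = "\u0ACD"  # GUJARATI SIGN VIRAMA


def add_reviewed_glyphs(accepted, reported, approve):
    """Merge volunteer-reported glyphs that pass a case-by-case review.

    `approve` stands in for the human judgement step; it returns True to include a glyph.
    Returns the updated accepted set and the set of newly approved glyphs.
    """
    approved = {g for g in reported if g not in accepted and approve(g)}
    return accepted | approved, approved


def glyph_stats(glyphs):
    """Basic stats: total glyph count and how many glyphs join 2, 3, ... consonants."""
    # Assumes each glyph is a full conjunct, so consonants = viramas + 1.
    by_size = Counter(g.count(VIRAMA) + 1 for g in glyphs)
    return {"total": len(glyphs), "by_consonant_count": dict(sorted(by_size.items()))}
```

For example, starting from {ક્ષ, સ્ત્ર} and approving a reported ટ્વ gives a total of 3, with two 2-consonant glyphs and one 3-consonant glyph.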
- There is no standard list. Some treat Sarth's dictionary as the standard, while some old texts contain conjuncts that would be very odd to include in modern fonts. As far as I know, Kalapi is close to standard.
- Yes, it is challenging. Twitter's 'Tw' is already there in Lohit. See here:
- Needed!
In that case, we need to scan the corpus and arrive at all the conjuncts that actually occur in Gujarati. Good.
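The corpus scan suggested above might look roughly like this minimal sketch, assuming the corpus is plain UTF-8 text and that a conjunct is a consonant (optionally with nukta U+0ABC) followed by one or more virama + consonant pairs. It ignores ZWJ/ZWNJ-controlled forms, and `scan_conjuncts` is a hypothetical name.

```python
import re
from collections import Counter

# One consonant in U+0A95..U+0AB9, optionally followed by a nukta.
_CONSONANT = "[\u0A95-\u0AB9]\u0ABC?"
# A consonant followed by one or more (virama + consonant) pairs.
CONJUNCT_RE = re.compile(_CONSONANT + "(?:\u0ACD" + _CONSONANT + ")+")


def scan_conjuncts(lines):
    """Count every conjunct cluster found in an iterable of text lines."""
    counts = Counter()
    for line in lines:
        counts.update(CONJUNCT_RE.findall(line))
    return counts
```

Passing an open file (e.g. `open(path, encoding="utf-8")`) as `lines` scans a whole corpus file; the resulting counts list the conjuncts that are actually attested and could also feed the basic stats mentioned in #19.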