
Generation of all possible font glyphs

Open samyakbhuta opened this issue 11 years ago • 5 comments

samyakbhuta avatar Apr 16 '13 18:04 samyakbhuta

@kartikm, this script includes all the artificial conjuncts (i.e. ones not occurring in the Gujarati corpus or not seen so far) as well, right?

samyakbhuta avatar Apr 17 '13 06:04 samyakbhuta

@samyakbhuta Mostly. We should modify the script to write the acceptable glyphs to a separate output file. I'll do that in the coming days, once the dust has settled.

kartikm avatar Apr 18 '13 02:04 kartikm
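A minimal sketch of the split suggested above, assuming the script enumerates two-consonant conjuncts as consonant + virama + consonant. The names `all_two_consonant_conjuncts`, `split_by_acceptance` and `write_lines` are hypothetical and not taken from the project's script, and the acceptable set is a placeholder that someone still has to supply.

```python
from itertools import product

VIRAMA = "\u0ACD"  # GUJARATI SIGN VIRAMA

# Gujarati consonants KA..HA, skipping the unassigned code points in that range.
CONSONANTS = [
    chr(cp)
    for cp in range(0x0A95, 0x0ABA)
    if cp not in (0x0AA9, 0x0AB1, 0x0AB4)
]


def all_two_consonant_conjuncts():
    """Return every consonant + virama + consonant sequence, artificial ones included."""
    return [a + VIRAMA + b for a, b in product(CONSONANTS, repeat=2)]


def split_by_acceptance(conjuncts, acceptable):
    """Split conjuncts into (accepted, others) using a caller-supplied set."""
    accepted = [c for c in conjuncts if c in acceptable]
    others = [c for c in conjuncts if c not in acceptable]
    return accepted, others


def write_lines(path, items):
    """Write one item per line as UTF-8 (hypothetical output format)."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(items) + "\n")
```

Calling `write_lines` once for each of the two lists returned by `split_by_acceptance` would give the separate output files. Longer conjuncts (three or more consonants) are deliberately left out of this sketch.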

A few questions (these affect OCR in a big way as well):

  • Do you know which glyphs are acceptable? Do you have any linguistic input for this? Is there an empirical way you arrived at such a list (e.g. by scanning a corpus)?
  • I also find it a little challenging to see how we are going to keep this list up to date. E.g. since ટ્વીટર is now an accepted Gujarati word, the glyph twi has to be considered accepted.
    • From where and how do we identify such new glyphs? I guess that, for now, volunteer reporting with case-by-case judgement and inclusion would be a good start. In any case, we need to maintain the list.
  • Once the glyphs are listed, a few basic stats are also needed. I am opening a separate issue for that: #19

samyakbhuta avatar Apr 19 '13 04:04 samyakbhuta

  1. There is no standard list. Some refer to Sarth's dictionary as the standard, while some old texts use conjuncts that would be very odd to have in modern fonts. As far as I know, Kalapi is close to standard.
  2. Yes, it is challenging. Twitter's 'Tw' is already there in Lohit. See here: tw
  3. Needed!

kartikm avatar Apr 20 '13 06:04 kartikm

In that case we need to scan the corpus and arrive at all the possible conjuncts that actually occur in Gujarati. Sounds good.

samyakbhuta avatar Apr 20 '13 07:04 samyakbhuta
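The corpus scan proposed above could be sketched roughly as follows. This is only an illustration under stated assumptions: a conjunct is taken to be a run of consonants joined by the virama (U+0ACD), the nukta and ZWJ/ZWNJ are ignored, and `count_conjuncts` is a hypothetical helper, not something from the existing script.

```python
import re
from collections import Counter

# Gujarati consonants KA..HA, skipping the unassigned code points in that range.
CONSONANT = "[\u0A95-\u0AA8\u0AAA-\u0AB0\u0AB2\u0AB3\u0AB5-\u0AB9]"
VIRAMA = "\u0ACD"  # GUJARATI SIGN VIRAMA

# One consonant followed by one or more (virama + consonant) pairs.
CONJUNCT_RE = re.compile(f"{CONSONANT}(?:{VIRAMA}{CONSONANT})+")


def count_conjuncts(text):
    """Count every consonant cluster that occurs in the given text."""
    return Counter(CONJUNCT_RE.findall(text))
```

Running this over a corpus gives both the list of conjuncts seen so far and their frequencies, which could also feed the basic stats mentioned in #19. A word such as ટ્વીટર would contribute the cluster ટ્વ.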