
Ensure "FST output style" is consistent, even if the FST has no output

Open eddieantonio opened this issue 3 years ago • 4 comments

There are a few things here:

  • the database importer seems to want to create a valid analysis string for every entry
  • entries to be imported may not be analyzable by the FST (e.g., pê-)
  • the importer synthesizes an analysis from its declared wordclass
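As a rough illustration of the flow in the list above, here is a minimal sketch. Everything in it is hypothetical: `analysis_for_entry`, the `analyze` callable, and the `lemma+WORDCLASS` fallback format are stand-ins, not the importer's actual code or the FST's real output.

```python
from typing import Callable, List


def analysis_for_entry(
    lemma: str,
    declared_word_class: str,
    analyze: Callable[[str], List[str]],
) -> str:
    """Return an FST analysis if one exists, otherwise a synthesized one.

    `analyze` is a hypothetical stand-in for the FST lookup; it returns
    an empty list when the FST cannot analyze the lemma.
    """
    analyses = analyze(lemma)
    if analyses:
        return analyses[0]
    # No FST output: fall back to the entry's declared word class.
    # The string format here is an assumption for illustration only.
    return f"{lemma}+{declared_word_class}"


# A stub analyzer with made-up data; it knows nothing about "pê-".
FAKE_FST = {"atim": ["atim+N+A+Sg"]}


def fake_analyze(lemma: str) -> List[str]:
    return FAKE_FST.get(lemma, [])


analyzed = analysis_for_entry("atim", "NA", fake_analyze)
synthesized = analysis_for_entry("pê-", "IPV", fake_analyze)
```

The point of the sketch is that the two branches produce strings from different sources, so nothing guarantees that the synthesized branch follows the same tag conventions as the FST branch.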

In the default case, the "FST output style" Title Cases all tags that are not noun or verb word classes:

https://github.com/UAlbertaALTLab/cree-intelligent-dictionary/blob/5b7ffa5f9ac1c649d2658e6b05b93714862d7a77/src/CreeDictionary/utils/enums.py#L102-L103
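To make the default case concrete, here is a small sketch of the behaviour described above. It is not the code at the linked lines: `fst_output_style` is a hypothetical free function, and the way noun and verb classes are split into two tags is an assumption. Only the Title Casing of the remaining classes comes from the description in this issue.

```python
def fst_output_style(word_class: str) -> str:
    """Sketch of turning a word class name into FST-style tags."""
    if word_class[0] in ("N", "V"):
        # Assumption: noun/verb classes split into two tags, e.g. VTA -> +V+TA.
        return f"+{word_class[0]}+{word_class[1:]}"
    # Default case from the issue: every other class is Title Cased.
    return "+" + word_class.title()


# The word class as written in the dictionary source is all caps...
declared = "IPV"
# ...but the synthesized tag is Title Cased, so the two no longer match.
synthesized_tag = fst_output_style(declared)
matches_declared = synthesized_tag == "+" + declared
```

With this logic `fst_output_style("IPV")` yields `+Ipv`, so `matches_declared` is `False`. That is the kind of mismatch with the all-caps "IPV" that a test comparing against the dictionary source would trip over.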

Is this... a good assumption? Should we change this assumption? Should we be synthesizing an analysis at all? Will the WordClass enum be scrapped in @andrewdotn's language generalization port?

Related: #814 — +Ipv was being generated here, although crkeng.xml has it as "IPV". This resulted in a failing test case.

eddieantonio, May 25 '21 20:05