morphodict
Update English phrase FOMA FSTs
I've recompiled the English phrase analysis and generation FOMA FSTs, cf.
-rw-r--r-- 1 arppe staff 682821 20 Mar 18:32 src/transcriptions/transcriptor-cw-eng-noun-entry2inflected-phrase-w-flags.fomabin
-rw-r--r-- 1 arppe staff 599759 20 Mar 19:14 src/transcriptions/transcriptor-cw-eng-verb-entry2inflected-phrase-w-flags-and-templates.fomabin
-rw-r--r-- 1 arppe staff 613779 20 Mar 18:20 src/transcriptions/transcriptor-eng-phrase2crk-features.fomabin
... and am placing these in the designated subdirectory in our repo, in: ./morphodict/src/CreeDictionary/res/fst/
If pushing these to the repo doesn't work, these FOMA FSTs can be compiled with foma -l using the associated *.xfscript files in ./lang-crk/src/transcriptions/.
I also uploaded these to our subrepo intended for large FSTs: https://github.com/UAlbertaALTLab/fst-exchange
@nienna73 We might want to upload these, as using them should fix certain glitches in the English phrase translations that occur with the original versions of the FOMABINs. Presumably this would follow the instructions here: https://github.com/UAlbertaALTLab/morphodict/tree/main/src/CreeDictionary/phrase_translate
This connects to work needed for #1166
@aarppe Clarification needed: the following are the FOMA FSTs actually in use in the code (link to the line where they appear):
- transcriptor-cw-eng-noun-entry2inflected-phrase-w-flags.fomabin
- transcriptor-cw-eng-verb-entry2inflected-phrase-w-flags.fomabin
- transcriptor-eng-phrase2crk-features.fomabin
If the intention is to run the phrase-w-flags-and-templates FSTs instead, let me know.
@fbanados The generator for English verb phrases uses a new approach (templates) and was renamed accordingly, so the code reference should also be updated. The approaches for English noun phrase generation and general English phrase analysis didn't change, nor did the file names, but the FOMABIN files themselves may have been modified. I thought I had updated and uploaded all three files described above. In any case, the mismatch between the code and the file names might well explain why the English phrase FSTs are not fully working as intended.
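To make this kind of mismatch easier to spot, here is a minimal sketch of a check that compares the FST names referenced in the code against the .fomabin files present in a directory. The function name `compare_fsts` and the constant `REFERENCED_IN_CODE` are hypothetical and not part of morphodict; the file names are copied from this thread, and the directory to check is whatever the caller passes in.

```python
from pathlib import Path

# File names as listed in this thread as "actually in use in the code".
# This list is an assumption for illustration, not read from morphodict itself.
REFERENCED_IN_CODE = [
    "transcriptor-cw-eng-noun-entry2inflected-phrase-w-flags.fomabin",
    "transcriptor-cw-eng-verb-entry2inflected-phrase-w-flags.fomabin",
    "transcriptor-eng-phrase2crk-features.fomabin",
]


def compare_fsts(fst_dir, referenced=REFERENCED_IN_CODE):
    """Return (missing, unreferenced) .fomabin file names for fst_dir.

    missing:      names the code refers to that are not in the directory
    unreferenced: .fomabin files in the directory that the code never names
    """
    present = {p.name for p in Path(fst_dir).glob("*.fomabin")}
    wanted = set(referenced)
    return sorted(wanted - present), sorted(present - wanted)
```

Run against a directory holding only the three recompiled files, this would report the old verb w-flags name as missing and the new w-flags-and-templates file as unreferenced, which is the mismatch discussed above.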