morphodict
remove code to extract parentheticals from definition processing
Email from @aarppe:
While I was just looking at the phrase translation code here:
https://github.com/UAlbertaALTLab/morphodict/tree/main/src/CreeDictionary/phrase_translate
I recalled this one script here:
https://github.com/UAlbertaALTLab/morphodict/blob/main/src/CreeDictionary/phrase_translate/definition_processing.py
... which was implemented by Andrew as a temporary fix to keep parenthesized segments from being passed on to phrase translation. With the current partitioning of the definitions, in particular with the resulting coreDefinition being the element that is passed on to phrase translation, this code snippet should no longer be needed (though it might be needed for some other language), so I'm explicitly noting it here so that it doesn't get accidentally overlooked.
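For readers unfamiliar with what the temporary fix does, the following is a minimal sketch of stripping parenthesized segments from a definition before it is handed to phrase translation. It is not the code in definition_processing.py; the function name `strip_parentheticals` and its whitespace handling are illustrative assumptions. It also collapses the leftover double space, which is the "extra space" issue mentioned later in the thread.

```python
import re

# Matches one innermost parenthesized segment, e.g. "(i.e. a bee)".
_PARENTHETICAL = re.compile(r"\([^()]*\)")


def strip_parentheticals(definition: str) -> str:
    """Hypothetical helper: drop parenthesized segments, then tidy whitespace."""
    # Repeat until nothing changes so nested parentheses are removed too.
    previous = None
    while previous != definition:
        previous, definition = definition, _PARENTHETICAL.sub("", definition)
    # Collapse runs of whitespace left behind by the removal.
    definition = re.sub(r"\s+", " ", definition)
    # Remove a space that now sits before punctuation, e.g. "eats , quickly".
    definition = re.sub(r"\s+([,.;:])", r"\1", definition)
    return definition.strip()
```

For an illustrative definition such as "s/he flies (i.e. a bee)", this returns "s/he flies" with no trailing space.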
https://github.com/UAlbertaALTLab/morphodict/blob/main/src/CreeDictionary/phrase_translate/definition_processing.py
I can remove this code, but what's the desired outcome? How do I know the task is accomplished?
Basically, nothing should happen since the relevant changes are already implemented on the DB side.
Anyhow, you could find some entry in CW that has such a parenthesized expression, and for that entry the phrase inflection should exclude the parenthesized phrase, both currently and after you remove the code.
Is this the expected behaviour?

This is what happens in production at the moment, and also locally when I remove the parenthetical-extraction code.
This concerns the case where the English definition has something in parentheses that we do not want to pass on to the phrase inflection, cf.

In the above it works (in that the parenthetical '(i.e. a bee)' is not part of the espt output, though we ought to do something about the extra space), but the intention was that this would be achieved by specifying, in the crk dictionary DB, a special variant of the definition to be passed on to phrase translation, so that the code need not do this.
We'd have to check whether this is indeed done in the DB, e.g. for the above case, and whether the code makes use of the specific field (whatever its name is) that is intended for phrase translation.
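One way to run that check is a small audit over the definitions, sketched below under stated assumptions. It assumes each definition is available as a mapping with a full-text key `"text"` and an optional `"coreDefinition"` key (the field name comes from the email above, but the record shape and both function names are hypothetical). The selection rule (prefer the core definition, otherwise fall back to the full text) mirrors what the code would do once the parenthetical extraction is removed. Any entry this flags would still send a parenthetical to phrase translation.

```python
from typing import Iterable, List, Mapping, Optional

Definition = Mapping[str, Optional[str]]


def phrase_translation_input(definition: Definition) -> str:
    """Hypothetical: prefer the DB's core definition, else use the full text."""
    core = definition.get("coreDefinition")
    return core if core else definition["text"]


def entries_needing_core_definition(definitions: Iterable[Definition]) -> List[str]:
    """Return full texts whose phrase-translation input still contains '('."""
    return [
        d["text"]
        for d in definitions
        if "(" in phrase_translation_input(d)
    ]
```

An empty result would suggest the DB side already covers every parenthetical, so the code-side extraction can be removed without changing the output.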
And for the 'mîcisow' case, the current implementation does seem to work, but we need to check whether this is achieved on the code side or the DB side, cf.
