remove code to extract parentheticals from definition processing

Open dwhieb opened this issue 4 years ago • 5 comments

Email from @aarppe:

While I was just looking at the phrase translation code here:

https://github.com/UAlbertaALTLab/morphodict/tree/main/src/CreeDictionary/phrase_translate

I recalled this one script here:

https://github.com/UAlbertaALTLab/morphodict/blob/main/src/CreeDictionary/phrase_translate/definition_processing.py

... which was implemented by Andrew as a temporary fix to keep parenthesized segments from being passed on to phrase translation. With the current partitioning of the definitions, and in particular with the resulting coreDefinition being the element that is passed on to phrase translation, this code snippet should no longer be needed (though it might be needed for some other language). I'm noting this explicitly so that it doesn't get accidentally overlooked.
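For readers who have not opened the script, the kind of processing under discussion can be sketched roughly as follows. This is a minimal illustration only: `strip_parentheticals` is a hypothetical helper, not the actual contents of `definition_processing.py`, and it assumes parentheses are not nested.

```python
import re

# Hypothetical helper for illustration; NOT the code in definition_processing.py.
# Matches a non-nested parenthesized segment plus any whitespace before it.
_PARENTHETICAL = re.compile(r"\s*\([^()]*\)")


def strip_parentheticals(definition: str) -> str:
    """Drop parenthesized segments from a definition and tidy the whitespace."""
    without_parens = _PARENTHETICAL.sub("", definition)
    # Collapse any leftover runs of whitespace so no stray double space remains.
    return re.sub(r"\s+", " ", without_parens).strip()
```

With an invented definition string such as `"it stings (i.e. a bee)"`, this returns `"it stings"`. If coreDefinition already excludes the parenthetical in the DB, a step like this has nothing left to do.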

dwhieb avatar Nov 26 '21 22:11 dwhieb

I can remove this code, but what's the desired outcome? How do I know the task is accomplished?

nienna73 avatar Apr 12 '22 20:04 nienna73

Basically, nothing should happen since the relevant changes are already implemented on the DB side.

Anyhow, you could find some entry in CW that has such a parenthesized expression; for that entry, the phrase inflection should exclude the parenthesized phrase, both currently and after you remove the code.

aarppe avatar Apr 16 '22 15:04 aarppe

Is this the expected behaviour?

Screen Shot 2022-04-16 at 5 36 51 PM

This is what happens in production at the moment and when I remove parentheticals locally.

nienna73 avatar Apr 19 '22 18:04 nienna73

This concerns the case where the English definition has something in parentheses that we do not want to pass on to the phrase inflection, cf.

image

In the above it works, in that the parenthetical '(i.e. a bee)' is not part of the espt output, though we ought to do something about the extra space. However, the intention was that this would be achieved by specifying in the crk dictionary DB the special variant of the definition that is passed on to phrase translation, so that the code need not do this.

We'd have to check whether this is indeed done in the DB, e.g. for the above case, and whether the code actually makes use of the specific field (whatever its name is) that is intended for phrase translation.
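One way to run that check could look like the sketch below. It assumes entries are available as plain dicts with `head`, `definition`, and an optional `coreDefinition` key; those key names and the fallback rule are assumptions for illustration, since the actual field name in the crk DB still has to be confirmed.

```python
from typing import Iterable, List


def entries_still_needing_code_side_stripping(entries: Iterable[dict]) -> List[str]:
    """Return heads whose phrase-translation text still contains parentheses.

    Assumed layout (hypothetical): each entry has "head" and "definition",
    plus an optional "coreDefinition" meant for phrase translation.
    """
    flagged = []
    for entry in entries:
        # Assumed fallback: use the full definition when no core variant is stored.
        text = entry.get("coreDefinition") or entry["definition"]
        if "(" in text or ")" in text:
            flagged.append(entry["head"])
    return flagged
```

An empty result would suggest the DB side already covers every entry, so removing the script should change nothing. Any flagged heads are the entries to compare before and after the removal.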

aarppe avatar Apr 19 '22 18:04 aarppe

And for the 'mîcisow' case, the current implementation does seem to work, but we need to check whether it's achieved on the coding or DB side, cf.

image

aarppe avatar Apr 19 '22 18:04 aarppe