apertium
Make a postprocess to handle capitalisation
Capitalisation should not be done in transfer; it should be done in a postprocessing step, much like "recasing" in SMT.
At what stage exactly, and on the basis of which information? I'm thinking about how to deal with the difference in French nouns like "allemand" (the language) and "Allemand" (a person). Currently, I do this in transfer.
@ftyers we can use secondary tags to propagate the case through to the post-generator and then apply it there if needed.
This is related: #75
@hectoralos I would do it in posttransfer using the LU and perhaps a 1-2 word context window.
@ftyers basically only using dictionary case and "is this a sentence end"-context and ignoring input case? We'd lose the ability to keep UPPER CASE and Titles with Titlecase but maybe that's worth the code simplification …
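To make the trade-off concrete, here is a minimal sketch of that "dictionary case plus sentence-end context" idea. It is not Apertium code: `recase` and `SENTENCE_END` are hypothetical names, and the input is assumed to be a list of already-generated target tokens carrying whatever case the dictionary gives them.

```python
# Hypothetical sketch: recase using only dictionary case and sentence position.
# Input case is ignored entirely, so UPPER CASE and Title Case runs are lost.

SENTENCE_END = {".", "!", "?"}  # assumed sentence-final tokens


def recase(tokens):
    """Capitalise the first word of each sentence; keep dictionary case elsewhere."""
    out = []
    at_start = True
    for tok in tokens:
        if tok in SENTENCE_END:
            # The next word-like token starts a new sentence.
            at_start = True
        elif tok[:1].isalnum():
            if at_start:
                tok = tok[:1].upper() + tok[1:]
            at_start = False
        # Other tokens (quotes, brackets, ...) leave the state unchanged.
        out.append(tok)
    return out


print(recase(["il", "parle", "allemand", ".", "un", "Allemand", "arrive", "."]))
```

With this approach "allemand" and "Allemand" keep whatever case the dictionary assigned, and only the sentence-initial words change.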
lt-proc could record the original capitalization and put it in word-bound blanks, which could then be used to determine the output capitalization.
@mr-martian lt-proc outputs the original word form anyway, so a separate step can do the job. I actually have a branch of nno-nob that just adds tags aa/Aa/AA that way to all words (capstag.rlx runs after morph ana/dis), removed again in transfer. I'm considering switching to this system so we can get dictionary-based correction but keep input caps (for start of sentence or where there are several upper-cased words in a row), but have to make sure it doesn't lead to regressions first.
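As a rough illustration of the aa/Aa/AA scheme, the following sketch classifies a surface form into one of the three tags and later reapplies the tag to a target word. It is only an assumption about how such a step could look; the actual capstag.rlx is a rule file and is not reproduced here, and `caps_tag` and `apply_caps` are hypothetical helper names.

```python
# Hypothetical sketch of the aa/Aa/AA idea; not the contents of capstag.rlx.

def caps_tag(surface: str) -> str:
    """Classify the input capitalisation of a surface form as aa, Aa or AA."""
    letters = [c for c in surface if c.isalpha()]
    if not letters:
        return "aa"
    # Treat a word as all-caps only if it has more than one letter,
    # so a single capital such as "I" counts as Aa.
    if len(letters) > 1 and all(c.isupper() for c in letters):
        return "AA"
    if letters[0].isupper():
        return "Aa"
    return "aa"


def apply_caps(tag: str, word: str) -> str:
    """Apply a caps tag to a target word; aa keeps the dictionary case as-is."""
    if tag == "AA":
        return word.upper()
    if tag == "Aa":
        return word[:1].upper() + word[1:]
    return word


print(caps_tag("HUS"), caps_tag("Hus"), caps_tag("hus"))
print(apply_caps("Aa", "hus"), apply_caps("aa", "Oslo"))
```

Leaving the word untouched for aa is one way to get dictionary-based case (a proper noun stays capitalised) while still carrying over sentence-initial and all-caps input.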
Processor added in 7e7004d