
Make a postprocess to handle capitalisation

Open ftyers opened this issue 4 years ago • 7 comments

Capitalisation should not be done in transfer, it should be done in a postprocess, much like "recasing" in SMT.

ftyers avatar Mar 09 '20 01:03 ftyers

At what stage exactly, and on the basis of which information? I'm thinking about how to deal with the difference between French nouns like "allemand" (the language) and "Allemand" (a person). Currently, I do this in transfer.

hectoralos avatar Mar 09 '20 07:03 hectoralos

@ftyers we can use secondary tags to propagate the case till the post generator and then apply it there if needed.

khannatanmai avatar May 12 '20 21:05 khannatanmai

This is related: #75

ftyers avatar Jul 03 '20 16:07 ftyers

@hectoralos I would do it in posttransfer using the LU and perhaps a 1-2 word context window.

ftyers avatar Jul 03 '20 16:07 ftyers

@ftyers basically only using dictionary case and "is this a sentence end"-context and ignoring input case? We'd lose the ability to keep UPPER CASE and Titles with Titlecase but maybe that's worth the code simplification …
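
To make the trade-off concrete, here is a minimal sketch of recasing from dictionary case plus sentence-boundary context only, ignoring input case. It is an illustration and not Apertium code: `recase`, `SENTENCE_END` and the flat token list are hypothetical names and assumptions, and the tokens are assumed to arrive already in their dictionary case.

```python
# Hypothetical sketch: recase using only dictionary case and sentence position.
# Assumption: each token is already in its dictionary case (e.g. "Allemand").
SENTENCE_END = {".", "!", "?"}


def recase(tokens: list[str]) -> list[str]:
    """Capitalise the first word of each sentence; leave everything else as is."""
    out = []
    at_start = True  # the first word of the input starts a sentence
    for tok in tokens:
        if tok in SENTENCE_END:
            at_start = True
        elif tok[:1].isalpha():
            if at_start:
                tok = tok[:1].upper() + tok[1:]
            at_start = False
        out.append(tok)
    return out


print(recase(["je", "parle", "allemand", ".", "un", "Allemand", "arrive", "."]))
```

Because input case is never consulted, an all-caps headline or a Titlecase title would come out in plain dictionary case, which is the loss described above.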

unhammer avatar Apr 25 '21 17:04 unhammer

lt-proc could record the original capitalization and put it in word-bound blanks, which a later stage could then use to decide the output case.

mr-martian avatar Apr 25 '21 18:04 mr-martian

@mr-martian lt-proc outputs the original word form anyway, so a separate step can do the job. I actually have a branch of nno-nob that just adds tags aa/Aa/AA that way to all words (capstag.rlx runs after morph ana/dis), removed again in transfer. I'm considering switching to this system so we can get dictionary-based correction but keep input caps (for start of sentence or where there are several upper-cased words in a row), but have to make sure it doesn't lead to regressions first.
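
The aa/Aa/AA idea can be sketched roughly as follows. This is not the contents of capstag.rlx (which is a constraint grammar file) and not the processor that was eventually added; `caps_tag` and `apply_caps` are hypothetical helpers that only illustrate deriving a case pattern from the original word form and re-applying it on top of dictionary case.

```python
# Hypothetical sketch of the aa/Aa/AA scheme; not taken from any Apertium repo.

def caps_tag(surface: str) -> str:
    """Classify a surface form as 'AA' (all upper), 'Aa' (initial upper) or 'aa'."""
    letters = [c for c in surface if c.isalpha()]
    # Require more than one letter for 'AA' so a lone "A" counts as 'Aa'.
    if len(letters) > 1 and all(c.isupper() for c in letters):
        return "AA"
    if letters and letters[0].isupper():
        return "Aa"
    return "aa"


def apply_caps(tag: str, word: str) -> str:
    """Apply a recorded case pattern to a target word given in dictionary case."""
    if tag == "AA":
        return word.upper()
    if tag == "Aa":
        return word[:1].upper() + word[1:]
    # 'aa': keep whatever case the dictionary gave the word.
    return word


for src, tgt in [("HUS", "hus"), ("Hus", "hus"), ("allemand", "Allemand")]:
    print(src, caps_tag(src), apply_caps(caps_tag(src), tgt))
```

Leaving the word untouched for `aa` is one possible design choice: it lets a dictionary-capitalised target word stay capitalised even when the source word was lower case.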

unhammer avatar Apr 25 '21 18:04 unhammer

Processor added in 7e7004d

mr-martian avatar Dec 22 '22 21:12 mr-martian