feat(windows): engine needs to preprocess U+000D U+000A to U+000A before passing into Core
I am anticipating that this will become a problem soon (encountered while working on the keyboard debugger).
I think this belongs under LDML keyboardprocessor. On Windows at least, \r\n should always be deleted as a block for K_BKSP.
We may need to special-case this, testing for the presence of this pair at the end of the context and requesting 2 back-deletions rather than 1.
(Note: I think this will become visible with the move from action queue to action struct in #10441, and become more obvious after #10415 is implemented.)
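A minimal sketch of the special case described above, in Python for brevity (the engine itself is not written in Python). The name `backspaces_for_bksp` is hypothetical and not part of any Keyman API. It assumes the context is available as a plain string, and it ignores surrogate pairs and other multi-unit sequences.

```python
def backspaces_for_bksp(context: str) -> int:
    """Return how many back-deletions to request for one K_BKSP.

    Hypothetical helper: a trailing CR LF pair is deleted as a block,
    anything else is treated as a single unit.
    """
    if context.endswith("\r\n"):
        return 2
    return 1
```

For example, `backspaces_for_bksp("A\r\n")` asks for 2 deletions, while a context ending in a lone `\n` or an ordinary letter asks for 1.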
EDIT:
Principle -- Engine MUST preprocess context from compliant apps to convert \r\n to \n before supplying to Core, and then when emitting into compliant apps, do the inverse, \n to \r\n. Note the Keyman Developer debugger also needs to consider doing this.
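To make the principle concrete, here is a rough sketch of the two conversions, written in Python purely for illustration. The function names `context_to_core` and `output_to_app` are made up for this example and are not existing Engine or Core functions. A lone \r is left untouched here, which is an assumption rather than a decision from this thread.

```python
def context_to_core(app_context: str) -> str:
    """Normalize context read from a compliant app before handing it to Core.

    Every CR LF pair becomes a single LF; a lone CR is left as-is.
    """
    return app_context.replace("\r\n", "\n")


def output_to_app(core_output: str) -> str:
    """Inverse conversion when emitting into a compliant app.

    Any CR LF already present is collapsed first, so that an existing
    pair is not expanded into CR CR LF.
    """
    return core_output.replace("\r\n", "\n").replace("\n", "\r\n")
```

With these, a Windows context of `A\r\nB` reaches Core as `A\nB`, and emitting `A\nB` back into the app yields `A\r\nB` again.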
@rc-swag assigning to you but happy to discuss on who owns.
Q: how does kmx handle this?
Is this going to be an issue for authors? that is, will keyboards see A\nB on some platforms and A\r\nB on others? dare I say, almost a normalization issue
I don't follow: when would there be a need to backspace over a \r\n? Internally in the core the context will be invalidated on a "carriage return". On the platform side for Context-aware/Compliant apps the set_if_needed will only have the context up to the start of line.
Note: see edit in OP for change in perspective post discussion with Ross. Keeping the same issue for now.
Principle -- preprocess context from compliant apps to convert \r\n to \n, and then when emitting into compliant apps, do the inverse, \n to \r\n.
Q: how does kmx handle this?
Is this going to be an issue for authors? that is, will keyboards see A\nB on some platforms and A\r\nB on others? dare I say, almost a normalization issue
Yes, we had the same discussion. The resolution is for Engine to normalize before passing into Core, given that it applies mainly to Windows. Then keyboard authors will only ever see \n.
Internally in the core the context will be invalidated on a "carriage return". On the platform side for Context-aware/Compliant apps the set_if_needed will only have the context up to the start of line.
The context may include a \n -- some apps start a new context on a new para, others treat the entire text buffer as a single unit.
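If the context can contain a \n that the app stores as \r\n, then a deletion count expressed in normalized characters presumably has to be translated back into app units before it is applied. One possible mapping, sketched in Python for illustration only: `app_delete_count` is a hypothetical name, the raw app context is assumed to be a plain string, and surrogate pairs are not handled.

```python
def app_delete_count(app_context: str, core_delete_count: int) -> int:
    """Translate a deletion count in normalized (LF) characters into
    the number of units to remove from the raw (CR LF) app context.

    Walks backwards from the end of the context, treating each
    CR LF pair as one normalized character.
    """
    remaining = core_delete_count
    i = len(app_context)
    while remaining > 0 and i > 0:
        if i >= 2 and app_context[i - 2:i] == "\r\n":
            i -= 2  # one normalized LF covers two app units
        else:
            i -= 1
        remaining -= 1
    return len(app_context) - i
```

For instance, deleting 2 normalized characters from an app context of `A\r\nB` removes 3 units (`B`, then the \r\n pair). If the count exceeds the available context, the sketch simply stops at the start of the context.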
Note: we could add a hint to the keyboard compiler in the future to note that 0x0D will never be seen in context (noting that more work needs to be done on KMW for this to be the case).