keyman feat(windows): engine needs to preprocess U+000D U+000A to U+000A before passing into Core

I am anticipating that this will become a problem soon (encountered while working on the keyboard debugger).

I think this belongs under LDML keyboardprocessor. On Windows at least, \r\n should always be deleted as a block for K_BKSP.

We may need to special-case this, testing for the presence of this pair at the end of the context and requesting 2 back-deletions rather than 1.
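The special case described in the previous paragraph (testing for CR LF at the end of the context and deleting two characters instead of one) might look roughly like the sketch below. `backspace_delete_count` and `apply_backspace` are hypothetical names, not Keyman Core functions. The sketch works on Python strings and ignores surrogate pairs and other multi-character clusters.

```python
def backspace_delete_count(context: str) -> int:
    """How many characters one K_BKSP should remove from the end of `context`.

    Hypothetical helper: a trailing CR LF pair is treated as a single unit,
    so both characters are deleted together. Surrogate pairs and other
    multi-character clusters are deliberately ignored in this sketch.
    """
    if not context:
        return 0  # nothing to delete
    if context.endswith("\r\n"):
        return 2  # delete CR LF as a block
    return 1


def apply_backspace(context: str) -> str:
    # Remove the computed number of characters from the end of the context.
    return context[: len(context) - backspace_delete_count(context)]
```

With this rule, a single backspace on `A\r\n` leaves `A` rather than a stray `A\r`.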

(Note: I think this will become visible with the move from action queue to action struct in #10441, and become more obvious after #10415 is implemented.)

EDIT:

Principle -- Engine MUST preprocess context from compliant apps to convert \r\n to \n before supplying to Core, and then when emitting into compliant apps, do the inverse, \n to \r\n. Note that the Keyman Developer debugger also needs to consider doing this.
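A minimal sketch of this principle, assuming context and output are plain strings. The function names are hypothetical and do not correspond to the actual Keyman Engine or Core API. `app_delete_count` extrapolates "do the inverse" to back-deletions: Core sees a line break as one `\n`, but the app may hold `\r\n`.

```python
def context_to_core(app_context: str) -> str:
    """Engine -> Core: collapse every CR LF pair to a bare LF."""
    return app_context.replace("\r\n", "\n")


def output_to_app(core_output: str) -> str:
    """Core -> compliant app: expand LF to CR LF.

    Normalizing first means an existing CR LF is not doubled to CR CR LF.
    """
    return core_output.replace("\r\n", "\n").replace("\n", "\r\n")


def app_delete_count(app_context: str, core_deletes: int) -> int:
    """Translate a Core back-deletion count into app characters.

    Core sees each line break as one LF, but the app may hold CR LF (two
    characters), so each Core deletion that lands on a CR LF removes two.
    """
    pos = len(app_context)
    for _ in range(core_deletes):
        if pos == 0:
            break  # nothing left to delete
        if pos >= 2 and app_context[pos - 2 : pos] == "\r\n":
            pos -= 2
        else:
            pos -= 1
    return len(app_context) - pos
```

For example, if the app holds `A\r\n`, Core is given `A\n`. When Core asks to delete one character (the `\n`), Engine would send two back-deletions to the app.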

Jan 23 '24 04:01 mcdurdin

@rc-swag assigning to you but happy to discuss on who owns.

Jan 25 '24 23:01 mcdurdin

Q: how does kmx handle this?

Is this going to be an issue for authors? That is, will keyboards see A\nB on some platforms and A\r\nB on others? Dare I say, it's almost a normalization issue.

Jan 26 '24 21:01 srl295

I don't follow: when will there be a need to backspace over a \r\n? Internally in the core the context will be invalidated on a "carriage return". On the platform side for Context-aware/Compliant apps the set_if_needed will only have the context up to the start of line.

Jan 28 '24 22:01 rc-swag

Note: see edit in OP for change in perspective post discussion with Ross. Keeping the same issue for now.

Principle -- preprocess context from compliant apps to convert \r\n to \n, and then when emitting into compliant apps, do the inverse, \n to \r\n.

Jan 29 '24 02:01 mcdurdin

> Q: how does kmx handle this?
>
> Is this going to be an issue for authors? That is, will keyboards see A\nB on some platforms and A\r\nB on others? Dare I say, it's almost a normalization issue.

Yes, we had the same discussion. The resolution is for Engine to normalize before passing into Core, given that this applies mainly to Windows. Then keyboard authors will only ever see \n.

Jan 29 '24 03:01 mcdurdin

> Internally in the core the context will be invalidated on a "carriage return". On the platform side for Context-aware/Compliant apps the set_if_needed will only have the context up to the start of line.

The context may include a \n -- some apps start a new context on a new para, others treat the entire text buffer as a single unit.

Jan 29 '24 03:01 mcdurdin

Note: we could add a hint to the keyboard compiler in the future to note that 0x0D will never be seen in context (noting that more work needs to be done on KMW for this to be the case).
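As a rough illustration only, such a compiler hint might scan each rule's context for U+000D and emit a warning. The `Rule` type and `cr_context_warnings` function are hypothetical simplifications, not part of the Keyman compiler. As noted above, KMW would still need more work before 0x0D is guaranteed never to appear in context.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    # Hypothetical, simplified rule: the text matched before the cursor,
    # plus the source line it came from.
    context: str
    line: int


def cr_context_warnings(rules: list[Rule]) -> list[str]:
    """Hypothetical hint: flag rules whose context contains U+000D.

    Once Engine normalizes CR LF to LF, such rules are unlikely to match.
    """
    warnings = []
    for rule in rules:
        if "\r" in rule.context:
            warnings.append(
                f"line {rule.line}: context contains U+000D, "
                "which is not expected to be seen in context"
            )
    return warnings
```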

Jan 29 '24 03:01 mcdurdin