wezterm win32 encoding: fix more inconsistencies

While looking into #2196 I was able to find some more inconsistencies in the win32 encoding. I discovered the following differences:

CTRL + ALT + key combinations should be sent with the same character code that would be generated by AltGr + key, i.e. either \x00 or a character depending on the current layout. I fixed the first case for now, which should fix #2196 in most cases.
AltGr + key should be sent with character code \x00 like above if that combination has no associated character. Right now wezterm falls back on the character code of key.
SHIFT + ascii_letter sends the correct character, but misses the SHIFT flag. This is caused by the normalize_shift function. I don't know if this is intentional behaviour for non-windows systems, so I didn't want to change that for now.

Raw test data here

While we could try to implement all these different cases, I did some testing and realized the ToUnicode function would actually return the correct character in all these cases if the control keys weren't manually removed from the key set before, as it currently happens here: https://github.com/wez/wezterm/blob/d5060ec3c100c01157252381830c4f1d043cd84d/window/src/os/windows/window.rs#L2143-L2153

However, this would change the KeyEvents wezterm generates, possibly breaking existing configurations and keyboard encoding for allow_win32_input_mode = false.

One possible idea would be to add a new optional field to the KeyEvent structure which is only used when using the win32 encoding. What's your opinion on all this?

Jul 10 '22 03:07 kreudom

I'm not opposed to adding something like a win32_uni_char field alongside the scan_code field: https://github.com/wez/wezterm/blob/d7fa541745c48ae374072715c4439494d6d2d734/wezterm-input-types/src/lib.rs#L1076-L1078

to hold what we get from ToUnicode in this case.

However! ToUnicode has internal global state that makes it tricky to call multiple times with varying parameters: it can destroy or be surprisingly affected by dead key states, so we'd need to be careful to verify that things are working correctly! I think robust results might be to:

Save a pristine copy of the keys we get from the GetKeyboardState call
Later in that function, when we understand the dead key state, call ToUnicode with the pristine state to get the data we need. That may require following up with additional call(s) to replay/correct the dead key state afterwards. You can see some of the gymnastics that happens to probe the layout in the KeyboardLayoutInfo code.
We might be able to avoid the tricky parts: if they are only tricky while we are in the middle of dead key processing, then those key presses probably wouldn't make it to the terminal, so perhaps it is valid to only make this extra ToUnicode call in the non-dead-key code path?

Jul 10 '22 14:07 wez

The last commits are my shot at implementing this. Some notes:

I only call ToUnicode twice in the same case when it would have been called once anyway. This should ensure ToUnicode stays in the correct state.
I don't call ToUnicode when handling dead keys, I assume the probing process already identified the correct characters, since ToUnicode is called with all modifiers intact when probing.
There is some kind of race condition happening when two KeyEvents are emitted by an invalid dead key combination. Both keys sometimes arrived in the terminal swapped around.

I didn't look far into it, but I assume this happened because the first of the two keys was encoded with the default encoding and sent with pane.key_down while the second key was encoded with the win32 encoding and sent using pane.writer.write_all.

My changes now let both keys be encoded with win32 encoding, this seems to have fixed the problem. But I thought I'd let you know in case it happens again.

All in all, these changes make the win32 encoding a lot more accurate. There's still a few differences, mainly regarding dead key handling, but I'm not sure fixing those is worth the effort.

Jul 10 '22 23:07 kreudom

Thanks for your patience with this; I manually merged this, except for one dead key case that I recall as being important for the non-win32 input mode. We can follow up on that in a separate PR if that still causes problems!

Apr 16 '23 21:04 wez