Pasting strings into V86 with support for non-US keyboards
This is a draft proposal for a function that pastes strings into the guest by using the V86 keyboard with support for non-US keyboard layouts. This function is needed in different contexts:
- to paste a plaintext string from the system clipboard into V86
- to handle keyboard input in mobile browser environments
- in internal tests and other script-based V86 use cases
This function is currently supported through V86.keyboard_send_text(string, delay), but limited to the US keyboard layout.
Desktop browser keyboard input
Let's start with a deep look at how V86 handles keyboard input in a desktop browser environment and with some needed terms and concepts. A simplified view of the keyboard pipeline when V86 is used with a desktop browser can be depicted as:
DesktopKeyboard --> Scancode --> IRQ-1 --> ... --> GuestKeyboard
DesktopKeyboard represents the system keyboard state (the set of pressed keys), and GuestKeyboard the keyboard state as seen by the guest (GuestKeyboard is only a conceptual element which we don't implement). Our main task is to ensure that the guest's keyboard state is always in sync with the system's keyboard state (except for key combinations that are intercepted by the OS or browser, like Ctrl+Alt+Del).
A scancode is an 8- or 16-bit integer that represents the physical location of a key on a keyboard, a value that isn't altered by the visual keyboard layout or the state of the modifier keys. The 8th bit (0x80) of a scancode is reserved: when cleared it signals a "keydown" event, otherwise a "keyup" event (these values are also called "make" and "break" codes, respectively).
Whenever the user presses or releases a key on the system keyboard, its scancode and state are sent immediately to the V86 PS/2 controller and passed on to the guest's keyboard driver via interrupt. 16-bit scancodes are sent to the PS/2 controller as two separate bytes.
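As a rough illustration of the make/break convention and the byte split described above, the following sketch turns a scancode into the bytes that would be sent to the PS/2 controller. The helper name scancodeBytes is hypothetical and not part of V86; the scancode values are the ones that appear in the examples of this proposal.

```javascript
// Hypothetical helper, not part of the V86 API.
const RELEASE_BIT = 0x80;

// Returns the byte sequence for pressing (release = false) or
// releasing (release = true) the key with the given scancode.
function scancodeBytes(scancode, release) {
    const code = release ? scancode | RELEASE_BIT : scancode;
    if (code > 0xFF) {
        // 16-bit scancode: high byte (the 0xE0 prefix) first, then the low byte.
        return [code >> 8, code & 0xFF];
    }
    return [code];
}

console.log(scancodeBytes(0x1E, false));   // KeyA make
console.log(scancodeBytes(0xE038, true));  // AltGr break, two bytes
```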
The desktop browser's KeyboardEvent.code property represents the same physical key location as a scancode but uses strings instead of integers to identify physical keys. "keydown" and "keyup" events are delivered in separate KeyboardEvent instances.
A 1:1 mapping between KeyboardEvent.code and scancodes exists, which gives a well-defined and universal way to translate browser key events into scancodes, independent of the visual keyboard layout.
This is how V86 handles this case, though there is also a fallback to KeyboardEvent.keyCode, which only works for US keyboards and was deprecated in the standards about 4 years ago. Unless there are reasons to keep it, I believe this fallback can be removed.
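The KeyboardEvent.code lookup described above might look roughly like the sketch below. The table name CODE_TO_SCANCODE and the function scancodeFromEvent are hypothetical, only a handful of keys are listed, and plain objects stand in for real KeyboardEvent instances.

```javascript
// Hypothetical excerpt of a KeyboardEvent.code -> scancode table (scancode set 1).
const CODE_TO_SCANCODE = {
    "Escape": 0x01,
    "Minus": 0x0C,
    "Backspace": 0x0E,
    "Enter": 0x1C,
    "KeyA": 0x1E,
    "ShiftLeft": 0x2A,
    "AltRight": 0xE038,   // AltGr, a 16-bit scancode
};

// Translates a keydown/keyup event into a scancode with the release bit applied.
// Returns undefined for keys missing from the table.
function scancodeFromEvent(event) {
    const scancode = CODE_TO_SCANCODE[event.code];
    if (scancode === undefined) {
        return undefined;
    }
    return event.type === "keyup" ? scancode | 0x80 : scancode;
}

// Plain objects stand in for KeyboardEvent instances here.
console.log(scancodeFromEvent({ type: "keydown", code: "KeyA" }).toString(16));
console.log(scancodeFromEvent({ type: "keyup", code: "AltRight" }).toString(16));
```

Because the lookup only depends on the physical key, the result is the same no matter which visual layout the host system uses.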
String-based keyboard input
Even though the keyboard pipeline can be used to "paste" plaintext strings into the guest, this method works very differently than the one described above. The pipeline looks like this while a string paste operation is in progress:
DesktopKeyboard --X
PasteKeyboard ----> Scancode --> IRQ-1 --> ... --> GuestKeyboard
DesktopKeyboard is muted here, meaning it still receives system keyboard events and updates its internal state, but no scancodes are fed into the keyboard pipeline. The pipeline is instead fed by PasteKeyboard for the duration of the paste operation.
The paste operation should work approximately like this (the core concept in this proposal):
- At the start of a paste operation the states of DesktopKeyboard and GuestKeyboard are in sync, so the initial state of PasteKeyboard is a copy of the state of DesktopKeyboard.
- PasteKeyboard and GuestKeyboard are then transitioned together into the "idle" state by generating appropriate scancodes that release all pressed keys and unlock CapsLock in case it is locked. Having GuestKeyboard in idle state simplifies the scancode generation that follows.
- The stream of scancodes is generated from the plaintext string and passed to the guest. Before a key scancode is sent it may be necessary to send additional scancodes that toggle the state of the Shift and/or AltGr modifier keys. Since we track the keyboard state in PasteKeyboard we can reduce modifier key changes to a minimum (modifier keys are toggled lazily in PasteKeyboard).
- When a key on the system keyboard is pressed or released while a paste operation is in progress, only the state of DesktopKeyboard gets updated; the event is otherwise ignored. An exception could be made for the Escape key to abort a long-running paste operation, or any key press could abort an ongoing paste operation.
- At the end of a paste operation it may be necessary to generate scancodes to bring GuestKeyboard into sync with DesktopKeyboard (our main task). The states of DesktopKeyboard and GuestKeyboard may diverge in any way during a paste operation but must be in sync after it has finished.
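The first and last steps of the list above can be sketched as two small functions. This is only an illustration under simplifying assumptions: keyboard state is modelled as a Set of pressed scancodes, CapsLock handling is left out, and the names releaseAll and resync are hypothetical rather than taken from the branch.

```javascript
// Hypothetical sketch: keyboard state is a Set of currently pressed scancodes.

// Scancodes (with the release bit set) that release every pressed key,
// bringing the guest keyboard into the idle state.
function releaseAll(pressed) {
    return [...pressed].map(scancode => scancode | 0x80);
}

// Scancodes that move the guest from state `guest` to state `desktop`:
// release keys only the guest holds, then press keys only the desktop holds.
function resync(guest, desktop) {
    const out = [];
    for (const scancode of guest) {
        if (!desktop.has(scancode)) out.push(scancode | 0x80);
    }
    for (const scancode of desktop) {
        if (!guest.has(scancode)) out.push(scancode);
    }
    return out;
}

// Start of paste: the user is holding Shift (0x2A), so it gets released.
console.log(releaseAll(new Set([0x2A])));
// End of paste: the guest is idle, the user now holds left Ctrl (0x1D).
console.log(resync(new Set(), new Set([0x1D])));
```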
Several problems and limitations arise when pasting strings into the guest:
- Generating scancodes from unicode characters depends on the visual keyboard layout configured in the guest's keyboard driver. The set of plaintext characters that can be pasted into the guest is limited to that of the selected keyboard layout.
- For each symbol used in the guest keyboard's visual layout we need a mapping from the symbol's unicode character to a sequence of pairs of (scancode, modifiers), where modifiers is a bitset of the Shift and the AltGr modifier keys (I believe Ctrl and Alt are reserved for OS and application use, but I'm not 100% sure). Normally only a single pair is needed, but dead keys cause sequences of multiple pairs. There are hundreds of different keyboard layouts worldwide, so a reasonable subset for V86 needs to be selected and extended on demand. I have implemented these mappings in the repository keyboard-tables, and they are thoroughly tested (also see the interactive keyboard mapping explorer).
- The number of generated scancode bytes for typing a single symbol can vary widely, from 2 up to 10 in the examples below (maybe even more).
- Generated scancode bytes cannot be sent to the guest at full speed. Their transmission must be throttled in order to avoid losing scancodes due to a possible bottleneck in the guest's IRQ handler. I suggest sending whole scancodes in bursts of maybe 8, 10 or 15 bytes and pausing for somewhere between 50 and 100 ms between bursts to give the guest time to digest the input. By "whole scancodes" I mean not splitting 16-bit scancodes across adjacent bursts. Burst size (in bytes) and delay (in milliseconds) should be configurable.
- A paste operation has a duration that depends on length and content of the string that is pasted, which may last for many seconds or even minutes (there's no real limit due to the unlimited string length). Only one paste operation may be active at a time.
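The throttling idea from the list above could be sketched as follows. This is a minimal illustration and not the implementation in the branch: the names toBursts and sendThrottled, the default values, and the sendBytes callback (a placeholder for whatever feeds the PS/2 controller) are all assumptions.

```javascript
// Hypothetical sketch, not the V86 implementation.
// Groups scancodes into bursts of at most `burstSize` bytes, keeping
// the two bytes of a 16-bit scancode within the same burst.
function toBursts(scancodes, burstSize) {
    const bursts = [];
    let current = [];
    for (const scancode of scancodes) {
        const bytes = scancode > 0xFF ? [scancode >> 8, scancode & 0xFF] : [scancode];
        if (current.length > 0 && current.length + bytes.length > burstSize) {
            bursts.push(current);
            current = [];
        }
        current.push(...bytes);
    }
    if (current.length > 0) bursts.push(current);
    return bursts;
}

// Sends the bursts one by one, pausing `delayMs` between them.
// `sendBytes` is a placeholder callback that receives one burst of bytes.
async function sendThrottled(scancodes, sendBytes, burstSize = 10, delayMs = 50) {
    const bursts = toBursts(scancodes, burstSize);
    for (let i = 0; i < bursts.length; i++) {
        sendBytes(bursts[i]);
        if (i < bursts.length - 1) {
            await new Promise(resolve => setTimeout(resolve, delayMs));
        }
    }
}

// With a burst size of 3, the 16-bit scancodes stay in one piece.
console.log(toBursts([0x2A, 0xE038, 0x1E, 0x9E, 0xAA, 0xE0B8], 3));
```

Note that a burst size below 2 cannot hold a 16-bit scancode; in this sketch such a scancode would still be sent as a single two-byte burst.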
Example scancode sequences for different keys and keyboard layouts:
"a" on a US keyboard generates 2 scancode bytes:
1E 9E
|  |
|  ~KeyA
KeyA
"Á" on a UK keyboard (Shift+AltGr+"A") generates 8 bytes:
2A E038 1E 9E AA E0B8
|  |    |  |  |  |
|  |    |  |  |  ~AltGr
|  |    |  |  ~Shift
|  |    |  ~KeyA
|  |    KeyA
|  AltGr
Shift
"Á" on a Swiss keyboard (dead key AltGr+"-" followed by Shift+"A") generates 10 bytes:
E038 0C 8C 2A E0B8 1E 9E AA
|    |  |  |  |    |  |  |
|    |  |  |  |    |  |  ~Shift
|    |  |  |  |    |  ~KeyA
|    |  |  |  |    KeyA
|    |  |  |  ~AltGr
|    |  |  Shift
|    |  ~Minus
|    Minus
AltGr
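The example sequences above can be reproduced with a small generator that toggles modifiers lazily. This is only a sketch under stated assumptions: the LAYOUTS tables contain nothing but the three examples, and the names textToScancodes, LAYOUTS and MODIFIER_SCANCODES are hypothetical and not taken from the keyboard-tables repository.

```javascript
// Hypothetical sketch; the layout tables only contain the examples above.
const SHIFT = 1, ALTGR = 2;
const MODIFIER_SCANCODES = [[SHIFT, 0x2A], [ALTGR, 0xE038]];

const LAYOUTS = {
    us:    { "a": [{ scancode: 0x1E, modifiers: 0 }] },
    // "\u00C1" is the character Á.
    uk:    { "\u00C1": [{ scancode: 0x1E, modifiers: SHIFT | ALTGR }] },
    // Dead key AltGr+Minus followed by Shift+KeyA.
    swiss: { "\u00C1": [{ scancode: 0x0C, modifiers: ALTGR },
                        { scancode: 0x1E, modifiers: SHIFT }] },
};

// Generates scancodes for `text`, starting and ending with no modifier held.
function textToScancodes(text, layout) {
    const out = [];
    let held = 0;
    // Toggle only the modifiers that differ from the wanted state.
    const setModifiers = wanted => {
        for (const [bit, scancode] of MODIFIER_SCANCODES) {
            if ((held ^ wanted) & bit) {
                out.push(wanted & bit ? scancode : scancode | 0x80);
            }
        }
        held = wanted;
    };
    for (const char of text) {
        const sequence = layout[char];
        if (!sequence) continue;  // character not available in this layout
        for (const { scancode, modifiers } of sequence) {
            setModifiers(modifiers);
            out.push(scancode, scancode | 0x80);
        }
    }
    setModifiers(0);
    return out;
}

const hex = codes => codes.map(c => c.toString(16).toUpperCase().padStart(2, "0")).join(" ");
console.log(hex(textToScancodes("\u00C1", LAYOUTS.uk)));     // 2A E038 1E 9E AA E0B8
console.log(hex(textToScancodes("\u00C1", LAYOUTS.swiss)));  // E038 0C 8C 2A E0B8 1E 9E AA
```

Because the held modifiers are tracked, pasting "ÁÁ" with the UK table presses Shift and AltGr only once and releases them only after the second character.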
Summary
This proposal is meant to define a robust and consistent paste function for V86 (I came up with this all by myself). Let me know if it's too much :)
There is actually very little overlap between DesktopKeyboard and PasteKeyboard, they are backed by very different data structures and algorithms and only share how they keep track of their current keyboard state (the set of currently pressed keys).
Thanks for reading, any feedback is (as always) highly appreciated.
See branch chschnell:keyboard-i18n for the modified KeyboardAdapter implementation in src/browser/keyboard.js.
This is not fully finished and will need more testing, but the important things work: the desktop browser keyboard works (as it did before, just refactored), and the new paste-text function works too, now with support for non-US keyboards, muting of the desktop keyboard while pasting, etc.
Anybody know an easy way to test this for mobile browsers?
I used ngrok for testing on my devices. Android Emulator with an on-screen keyboard should probably work too, but I haven't used it.
Thank you.
Turns out I can simply point my Android mobile (with current Firefox Mobile) to my local Apache server even though I only have a cheap self-signed certificate installed there.
But the keyboard doesn't work properly. The current master and my local installation show the same problem: it works fine for digits 1-9 and special characters like !"$%&/()=? but fails when I type a-z or A-Z; it doesn't respond to those keys.
This can be tested at copy.sh/v86.
Any idea?
Hmm, what keyboard (and layout) are you using? For me, letters, digits and special chars work properly on Firefox 142 and Gboard (on copy.sh and your branch).
P.S. There is a bug in your branch (https://github.com/chschnell/v86/commit/6271ac5b2c7e2ac885da49fc7b2e75f6493e1d4d): Backspace and Enter do not work.
I've now tested with Chrome on my mobile, too, and it fails the same way as Firefox.
The mobile device I use here:
- Galaxy A33 5G
- Android Version 15 (One-UI Version 7.0)
Keyboard settings are all at their defaults (the entire device is), specifically:
- Layout: Standard
- Keyboard: Samsung Keyboard Version 5.9.12.10
Though I'm not sure if this is what you mean by Layout and Keyboard. EDIT: Everything is configured to the German locale, but an "a" is an "a", after all.
Regarding the Backspace and Enter bug in my branch (chschnell@6271ac5): thanks, will look into it, but I'm not there yet :)
I'd recommend testing your keyboard at https://w3c.github.io/uievents/tools/key-event-viewer.html (in Chrome and Firefox), for example: https://github.com/copy/v86/issues/262#issuecomment-2587161994.
I also found a bug on desktop Chrome and Firefox (Windows), the keys don't work:
v86_all.js?98e7110c2:149 Missing char in map: keyCode=54 code=KeyT
v86_all.js?98e7110c2:149 Missing char in map: keyCode=54 code=KeyT
v86_all.js?98e7110c2:149 Missing char in map: keyCode=45 code=KeyE
v86_all.js?98e7110c2:149 Missing char in map: keyCode=45 code=KeyE
v86_all.js?98e7110c2:149 Missing char in map: keyCode=53 code=KeyS
v86_all.js?98e7110c2:149 Missing char in map: keyCode=53 code=KeyS
v86_all.js?98e7110c2:149 Missing char in map: keyCode=54 code=KeyT
v86_all.js?98e7110c2:149 Missing char in map: keyCode=54 code=KeyT
Regarding the German locale: is your keyboard not setting some kind of diaeresis symbol (or something similar)?
Here is what I got when pressing the A key:
Not sure what to make of it. Is the Unidentified value the problem?
Regarding the bug on desktop Chrome and Firefox (Windows): sorry, I can't follow. Desktop?
As for the diaeresis question: yes, Ä, Ö and Ü are on separate keys, see here for a picture. I would however expect the Latin characters to work like the digits.
When v86 detects Unidentified, it skips the keyup handler and the event triggers the input handler:
https://github.com/copy/v86/blob/0669f7a4774f9d575392ef36aacddd69f3345425/src/browser/keyboard.js#L431-L435
Out of interest, can you leave just the input handler (i.e. remove these lines) and test it?
After running make all all-debug run, I opened localhost:8000 and started the emulator. When I press any key (with English system's keyboard layout), I get Missing char in map: <...>:
In debug mode, the keyboard works properly:
Ui, that was a big bug, thanks for pointing me right at it!
Fixed in latest commit.
I'll debug the separate Android keyboard issue in the next couple of days.
I believe I have also fixed the Backspace and Enter bug (chschnell@6271ac5) with the latest commit. If you find a minute, can you confirm?
Thanks, the bug in the release build has been fixed. On mobile browsers, Backspace and Enter still don't work (on Chrome and Firefox), with the same pattern: they work in the debug build but not in the release build.
It seems that send_raw_scancodes received undefined keycodes (yes, Closure Compiler, I know), so these keys don't work properly:
diff --git a/src/browser/keyboard.js b/src/browser/keyboard.js
index c1a334f1..85eb5035 100644
--- a/src/browser/keyboard.js
+++ b/src/browser/keyboard.js
@@ -849,10 +849,10 @@ export function KeyboardAdapter(bus, options)
data_keyboard.send_plaintext(e.data);
break;
case "insertLineBreak":
- data_keyboard.send_raw_scancodes([SCANCODE.Enter, SCANCODE.Enter | SCANCODE_RELEASE]);
+ data_keyboard.send_raw_scancodes([SCANCODE["Enter"], SCANCODE["Enter"] | SCANCODE_RELEASE]);
break;
case "deleteContentBackward":
- data_keyboard.send_raw_scancodes([SCANCODE.Backspace, SCANCODE.Backspace | SCANCODE_RELEASE]);
+ data_keyboard.send_raw_scancodes([SCANCODE["Backspace"], SCANCODE["Backspace"] | SCANCODE_RELEASE]);
break;
}
}
Are modifier keys used for the standard English layout? Maybe there is the same "silly" problem as here?
Yes! In the latest commit I replaced all SCANCODE.X with SCANCODE["X"], there were many such uses.
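For readers unfamiliar with this pitfall, here is a hand-written simulation of why the dotted form can break. It assumes that the release build runs Closure Compiler with property renaming enabled and that the SCANCODE table is defined with quoted keys; the renamed property name "a" is made up for illustration, and the table below is a two-entry stand-in for the real one.

```javascript
// Stand-in table defined with quoted keys: a renaming pass leaves these names alone.
const SCANCODE = { "Enter": 0x1C, "Backspace": 0x0E };

// The source says SCANCODE.Enter, but a renaming pass may rewrite the
// dotted access to a short name (here "a", made up for illustration),
// which no longer matches the quoted key in the table.
const renamedAccess = SCANCODE.a;

// Quoted access is left untouched by renaming and still finds the entry.
const quotedAccess = SCANCODE["Enter"];

console.log(renamedAccess);  // undefined
console.log(quotedAccess);   // 28
```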
The problem with my mobile device is solved. After playing around with it a bit more I noticed that I had to disable the auto-suggestions to make it work (a bit of a blunder on my side, I must admit).
It works completely now, including Backspace and Enter.