terminal
terminal copied to clipboard
Pasting specific texts into vim (WSL) breaks some keys
Windows Terminal version
1.20.10303.0 and 1.19.10302.0
Windows build number
10.0.22621.0
Other Software
vim 9.1.31 in WSL nvim v0.9.5 in WSL
Steps to reproduce
- Open vim in WSL using Terminal 1.20.10303.0 or 1.19.10302.0. Version 1.18.10301.0 is unaffected.
- Enter insert mode by pressing i.
- Paste in the following text with right click or shift+insert: https://gist.githubusercontent.com/csdivad/8c1069a35b52fb806ee8e4484b38f1f3/raw/37616af9daee626df6e7065059de5d5d87d8c9ec/test.txt
This input is a base64 encoded chunk of Lorem Ipsum.
The issue only happens with specific inputs. I encountered it first when pasting another base64 encoded text, but that had sensitive data in it.
Then I played around with dd if=lorem.txt bs=1 count=... | base64 -w 180 until I found a configuration that also triggered the issue.
Expected Behavior
Escape, backspace and Function keys works properly after the paste. Only the pasted text appears in the editor.
Actual Behavior
You are now stuck in editing mode. Esc, backspace and other keys are now sending weird escape sequences.
After the last character a 1~ appears. Pressing esc inserts ^[, pressing backspace inserts ^?.
Git bisect suggests that this broke in revision efbfb12145da959f74c368cdfc92f7892ce724d5. I'm not sure why though.
@csdivad I think you should be able to escape from this state in vim by pressing Ctrl+C.
I have a little more information now. It looks like the bracketed paste sequences are getting corrupted somehow. The paste starts with the opening sequence (\e[200~), but only ends with part of the closing sequence (1~).
As a result, vim thinks you're still pasting content, so it won't process any command keys unless you manually terminate the paste. Ctrl+C is one way of doing that, but you could also enter the closing bracket sequence manually: Esc [ 2 0 1 ~.
The length of the paste seems to be a factor in the bug, because for every additional character I add to the pasted content, one more character is included in the closing bracket sequence, until it's sending the full sequence. At that point, additional content doesn't seem make any difference.
I think I know why it's failing now. There's assumedly a 4096 byte buffer somewhere, and the pasted content along with the bracketed paste sequences adds up to just over 4096 bytes. As a result, the closing bracket sequence is cut off, and what we end up receiving in the initial packet is something like this:
\e[200~ ... PASTED CONTENT ... \e[20
When the parser gets to that last escape, we have a heuristic that looks for a single escape, or a two byte Alt+key sequence, but this fits neither of those patterns, so it seems we just drop those characters!
Then when the next packet arrives, we get the remaining characters for the closing bracket (1~), but since the start of the sequence has been lost, it's now just treated as plain text.
The more I think about, the more convinced I am that what we're doing here is completely wrong, but I don't know what the correct solution would be.
https://github.com/microsoft/terminal/blob/ef96e225da6b0df496390eed9fe31dc7e434a939/src/terminal/parser/stateMachine.cpp#L2152-L2155
I wonder if it's sufficient to always switch into a normal VT state machine mode once we've encountered a Win32 Input Mode sequence, e.g.:
if (_isEngineForInput && !_engine->EncounteredWin32InputModeSequence())
{
Could we just amend the heuristic to also account for bracketed paste mode? Like, if we've started a bracketed paste, don't ever flush the state machine at the end of a write? Not sure how bad that would be.
Sorry, I think I've been misleading everyone. I've just been looking at this again, and I now realise that my initial understanding was wrong.
When the first packet ends with half an escape sequence (the \e[20), I thought it just dropped those characters, but that's not actually what's happening. They've already been parsed and are now "preserved" in the state machine, which is why it's not in the ground state. When the next packet arrives, it does actually continue parsing the sequence from where it left off, but that's the point when something goes wrong which I'm not sure I understand.
It gets to a point where it decides it should flush the content to the input queue. However, the only thing it can flush is what it has in the current buffer, which is the second half of the sequence. So I think that's how the first half ends up being dropped, and vim sees the second half of the sequence as additional pasted text.
I think it's best I leave this to someone else to figure out, because I seem to be out of my depth here.
Any update on this? The issue is now present in the Store release (1.19.10573.0).
I've just had this issue come up again in a different context (a VT query response that exceeded 4K), and I think I have a better understanding of what's going wrong now.
As I mentioned above:
It gets to a point where it decides it should flush the content to the input queue. However, the only thing it can flush is what it has in the current buffer, which is the second half of the sequence.
But the state machine already has a process for dealing with this. If you exit the state machine and you're not in the ground state, it can save the current run in the _cachedSequence field. Then if it needs to flush at a later point in time, it can combine the _cachedSequence with the latest run so you won't lose anything.
The problem is that this process is only applied to the output state machine. See here: https://github.com/microsoft/terminal/blob/0199ca33ddb4a02ca3549627d772ce6b330ed1ce/src/terminal/parser/stateMachine.cpp#L2073-L2084
So I think all we need to do is copy that chunk of code up a couple of lines into the _isEngineForInput branch and all will be good.
I've done a basic test with this patch, and it seems to fix my problem as well as the vim pasting bug described in this issue. I don't know if there are any edge cases I'm missing though.