vte icon indicating copy to clipboard operation
vte copied to clipboard

Add support for custom parsing of APC, SOS and PM sequences.

Open boazy opened this issue 1 year ago • 29 comments
trafficstars

Fixes #109.

This attempts the same goal at John-Toohey's PR #110, but since this PR was was not accepted, I wanted to try another approach and see whether it is acceptable.

Rationale for support

APC sequences are rarely supported by general-purpose terminals (if we put aside Kermit clients, tmux and screen), but there is one exception: The Kitty Terminal Graphics Protocol. While the Kitty Image Protocol is not as ubiquitous as Sixel, it is more robust, accurate and powerful and it's implemented by several prominent terminal emulators:

  • Kitty (obviously)
  • Konsole
  • Wezterm
  • Wayst
  • Ghostty

It's also supported by a large number of programs and libraries, some of them are mentioned here.

Design

Design Goals

  • Support SOS, PM and APC sequences (just like the original PR)
  • Support streaming parsing of SOS, PM and APC sequence payloads.
  • Avoid adding new conditional branches in existing code paths. This should minimize performance impact on existing code.
  • Avoid re-purposing OSC state variables (in order to reduce complexity)
  • Avoid adding any new stored state variable
  • No embedded parsers - use existing state change tables for parsing

Design Choices

Reorder actions into packable and non-packable actions

Currently, resulting state and actions are packed into an 8-bit value in the state change table (with a 4-bit nibble allocated for each). All values from 0 to 15 for both state and actions are used, so I could not add new states and action that would be packable. Unfortunately, I needed to support a state change with the OpaquePut action. The best way I've found is to separate the integer values of actions that do not need to be packed into the state change table (such as Hook, UnhookandClear`) into a value higher than 15 and reserve lower values for actions that need to be packed

I'm not sure if my current solution is acceptable or not but I have no other idea on how to resolve this situation without repurposing the OscString state and actions and re-introducing an embedded parser.

Action matching order

This should have minimal impact, but in order to avoid any performance impact for programs which do not use APC sequences, I made sure that when matching APC/SOS/PM-specific actions, they are always matched last.

Streaming

Unlike #110, this PR does not collect the sequence payload inside an array. Instead, the sequence payload is streamed directly to the Perform trait, one byte at a time. I believe this approach is more flexible and would lead to better performance overall. The key gains here are:

  • Avoiding extra vector allocation for the payload (I don't want to re-use osc_raw and in any case we would have to increase its default size to 4096 to be compatible with the max payload size).
  • Avoiding unnecessary copying when. The payload often just needs to be parsed directly. This is especially useful when sending images directly, since the image data would often just be forwarded to an image format parser anyway.

Names

I've changed SosPmApcString to OpaqueString, since I wanted to have a general name for all these types of sequences where the VTE parser has no understanding of the internal structure of the sequence payload. This is difference from OSC and CSI sequences where the VTE parser is aware of the high-level structure of the sequence (if not the semantics).

boazy avatar Aug 04 '24 14:08 boazy

I'll give a detailed review later today/tomorrow, but assuming that this was benchmarked it seems like a reasonable approach.

I ran cargo bench but I'm a bit unsure on how to run the full vtebench suite. Is there a guide for that?

I don't expect any performance impact since I've added no new state and I've only added tailed branches to an already-long match expression, but you never know until you test...

boazy avatar Aug 05 '24 04:08 boazy

You can open a PR against alacritty with your changes, so we can use our bot for it. But generally you just build alacritty with your changes and run the vtebench.

kchibisov avatar Aug 05 '24 09:08 kchibisov

You can open a PR against alacritty with your changes, so we can use our bot for it. But generally you just build alacritty with your changes and run the vtebench.

Thanks! Here are the results:

Master Branch:

  dense_cells (625 samples @ 1 MiB):
    15.47ms avg (90% < 16ms) +-1.93ms

  scrolling (199 samples @ 1 MiB):
    37.19ms avg (90% < 42ms) +-4.62ms

  scrolling_bottom_region (361 samples @ 1 MiB):
    27.25ms avg (90% < 28ms) +-2.28ms

  scrolling_bottom_small_region (369 samples @ 1 MiB):
    26.69ms avg (90% < 27ms) +-2.3ms

  scrolling_fullscreen (185 samples @ 1 MiB):
    41.15ms avg (90% < 45ms) +-2.28ms

  scrolling_top_region (247 samples @ 1 MiB):
    40.09ms avg (90% < 42ms) +-2.92ms

  scrolling_top_small_region (364 samples @ 1 MiB):
    27.06ms avg (90% < 28ms) +-2.27ms

  unicode (611 samples @ 1.06 MiB):
    15.86ms avg (90% < 17ms) +-1.55ms

This PR:

  dense_cells (606 samples @ 1 MiB):
    16ms avg (90% < 17ms) +-1.26ms

  scrolling (198 samples @ 1 MiB):
    37.39ms avg (90% < 42ms) +-5.06ms

  scrolling_bottom_region (361 samples @ 1 MiB):
    27.25ms avg (90% < 28ms) +-2.12ms

  scrolling_bottom_small_region (362 samples @ 1 MiB):
    27.19ms avg (90% < 28ms) +-2.25ms

  scrolling_fullscreen (182 samples @ 1 MiB):
    41.63ms avg (90% < 45ms) +-3.56ms

  scrolling_top_region (250 samples @ 1 MiB):
    39.6ms avg (90% < 41ms) +-2.85ms

  scrolling_top_small_region (362 samples @ 1 MiB):
    27.16ms avg (90% < 28ms) +-1.28ms

  unicode (654 samples @ 1.06 MiB):
    14.79ms avg (90% < 16ms) +-2.14ms

The scrolling_* benchmarks are unstable and give me different lead every time. Unicode seems to be consistently slightly faster on this PR while dense_cells is slightly faster on master branch, but the difference is quite small and the variance is higher.

boazy avatar Aug 05 '24 10:08 boazy

@boazy just open a PR against alacritty's repo, it should better for us to review that way, since we can run the bot.

kchibisov avatar Aug 05 '24 11:08 kchibisov

@kchibisov I've added a pull request. I'm not sure how to trigger the benchmarks bot.

boazy avatar Aug 07 '24 06:08 boazy

I'm not sure how to trigger the benchmarks bot.

You can't. I did.

chrisduerr avatar Aug 07 '24 07:08 chrisduerr

@boazy This has stalled out for a while now, are you still interested in working on this? I'm slightly concerned because the longer we wait, the higher the chance that a vte rewrite will make all your work worthless.

chrisduerr avatar Sep 21 '24 22:09 chrisduerr

I would expect three sets of functions. How that is done is an implementation detail irrelevant to the consumer (as long as it's fast).

@boazy This has stalled out for a while now, are you still interested in working on this? I'm slightly concerned because the longer we wait, the higher the chance that a vte rewrite will make all your work worthless.

I didn't have time for a while, but I've had some time again today and I've checked your suggestion. Unfortunately, I don't think it would be possible without completely re-architecting the way state works. As I've mentioned before, we only have 4 bits for actions that change the state, but we also only have 4 bits for the state itself, and here all possible values (0-15) are used.

If we want to support three sets of functions, we would need to track three separate states for each type of string (SOS, PM, APC). This is not possible without either:

  1. Adding a new state variable for tracking the type of opaque string. (with possible performance impact)
  2. Modifying the state variable to a u16 or better yet something like struct ResultingState(State, Action). Would this impact performance? I don't know, but it's also a big change I'm a little bit uneasy with...

The only thing I can do without any major change is to have 3 different functions instead of opaque_start, but opaue_end and opaque_put would have to remain a single function in this case.

boazy avatar Oct 01 '24 02:10 boazy

Just to recap because this is quickly fading into obscurity and I don't want to forget about it entirely:

Splitting opaque_start up would be no problem, since we have access to the three states anyway, but splitting opaque_put and opaque_end is a problem because it means we'd have to keep track of the state we're currently in?

But couldn't you just store that on Parser (the size of which is irrelevant), then branch inside Action::OpaquePut => and Action::OpaqueEnd => since doing so should leave performance of the other branches unaffected?

I believe that's kinda your first suggestion? Why do you think this would impact performance?

chrisduerr avatar Oct 11 '24 23:10 chrisduerr

But couldn't you just store that on Parser (the size of which is irrelevant), then branch inside Action::OpaquePut => and Action::OpaqueEnd => since doing so should leave performance of the other branches unaffected?

I believe that's kinda your first suggestion? Why do you think this would impact performance?

I honestly don't know. I assumed the size is important for performance since we want to be able to have all state transitions in one table (which only has 256 entries?). If we go back to the start, I created this PR trying to answer the comments to the original PR that tried to add APC support (#110), particularly this comment:

This basically feels like an embedded parser. Most of our parser is structured in a way to stream through bytes and process them as they're coming in, however rather than creating state transitions you're storing a state transition to later then match on it instead. VTE is already too slow and I'm afraid this will just amplify that problem.

https://github.com/alacritty/vte/pull/110#discussion_r1545286222

I realize I can add additional state that doesn't affect OSC sequences, but seeing this comment I was a bit worried. Do you believe adding an additional field to Parser (let's say opaque_sequence_kind) should be ok?

boazy avatar Oct 16 '24 11:10 boazy

I honestly don't know. I assumed the size is important for performance since we want to be able to have all state transitions in one table (which only has 256 entries?). If we go back to the start, I created this PR trying to answer the comments to the original PR that tried to add APC support (#110), particularly this comment:

Yeah that's fair. This would make this a slow sloppy implementation and it might not be worth adding it for that reason. But that applies to your current solution too and the only alternative is a bigger rework that integrates this as a core part of our parser.

I realize I can add additional state that doesn't affect OSC sequences, but seeing this comment I was a bit worried. Do you believe adding an additional field to Parser (let's say opaque_sequence_kind) should be ok?

I think we should either do this, or go for something that properly integrates everything into our standard processing pipeline, which of course is difficult without affecting performance. I'm very hesitant to accept this sort of "second rate escape sequence", but I would prefer that over the current implementation.

chrisduerr avatar Oct 16 '24 18:10 chrisduerr

I think we should either do this, or go for something that properly integrates everything into our standard processing pipeline, which of course is difficult without affecting performance. I'm very hesitant to accept this sort of "second rate escape sequence", but I would prefer that over the current implementation.

Ok, I think I've understood you now. I'll modify the pull request along these lines.

boazy avatar Oct 17 '24 05:10 boazy

Going it tenatively approve the current revision, I don't think there's any other changes necessary from your end.

While I'm not entirely convinced I'll give this a closer look and if there are no performance regressions it should be fine.

chrisduerr avatar Oct 21 '24 13:10 chrisduerr

@boazy Just a heads-up: Once https://github.com/alacritty/vte/pull/118 is merged, it might make some sense to reinvestigate this. There's certainly been some changes and we do have a lot of extra free states/actions that could be made use of without affecting performance.

chrisduerr avatar Dec 20 '24 21:12 chrisduerr

We've reworked vte a bit, so probably would be easier to add things without hitting perf, since all the code that caused problem here is gone now.

kchibisov avatar Jan 13 '25 15:01 kchibisov

Hello! Just wanted to re-ping this thread to see what progress was being made here. We'd really like to support Kitty Image protocol in the Warp terminal, and this PR would help us get there

heysweet avatar Feb 03 '25 18:02 heysweet

This PR was languishing for a while now and now, and now that vte is ready, I probably need to rewrite it I guess, but I didn't have time to revisit this recently. I'll try to find time to do it.

The good news is that it should impact performance. I hope.

boazy avatar Feb 04 '25 17:02 boazy

Hello again, sorry to be a bother, just wanted to check in on the status of this. We’re kicking off the full-time effort to support kitty image protocol starting this week, and just want to get an understanding how much of a risk this PR is to our project. Thank you!

heysweet avatar Mar 06 '25 13:03 heysweet

@chrisduerr , @kchibisov I've finally got time to rebase the PR to work with the new version. I haven't written any unit tests yet, since I want to see if you are ok with the direction.

Some points to take note of:

  1. I'm currently not keeping any extra state for opaque sequence parsing, but that is introducing some limitations, which I will describe below.
  2. There is no clear specification on how to handle any character besides 0x20-0x7F, 0x9C (C1 ST) and the ESC ST sequence (0x1B 0x5C) in APC/SOS/PM sequences. The original DEC VT* terminals seem to have ignored these sequences completely, so essentially everything until the ST sequence (which put the terminal back in ground state) was ignored.
  3. What characters are valid inside an opaque sequence are also not clear and different terminals have different implementations. I went with supporting any non-control character, but that means that C1 ST (0x9C) currently does not terminate an opaque sequence. I didn't check all terminals who implement APC for Kitty protocol features, but Kitty and Ghostty do not seem to support C1 ST as well. In any case, I believe we should only implement C1 ST support if we support the C1 APC/SOS/PM codes for starting the sequence (which vte does not support), so this behavior is probably correct.
  4. On the other hand, the opaque sequence can be terminated with non-standard bell (0x07), since kitty supports bell-terminated sequences, and the main usage for these sequences seem to be implementing Kitty protocol features anyway).
  5. It is unclear how to handle escape sequences other than ST (0x1b 0x5c) inside an SOS/APC/PM string. My current approach is to terminate the sequence on any escape code, just as we vte does with OSC sequences, but this is not the most correct approach. A better approach would be to ignore these sequences, but the requires keeping more state.
  6. I've tried to inline everything, and avoid both code duplication and unnecessary branching (by using the OpaqueDispatch trait) wherever possible.

There are two approaches I can see for keeping track of additional state (particularly important for item #6):

  1. Add 3 additional states: SOSEscape, APCEscape and PMEscape. I will add special advance_esc implementation for each of these states, which ignore any other ESC sequence apart from ESC-\ (ST).
  2. Add an additional field to the parser that specifies the current opaque sequence mode (or none if no such mode exists). This field will be checked every time we handle an escape sequence, so it may impact the performance of non-SOS/APC/PM usage.

I believe the first approach should be better.

@heysweet Sorry for the time it took me to rebase this PR, I basically had to rewrite it from scratch.

boazy avatar Mar 07 '25 08:03 boazy

On the other hand, the opaque sequence can be terminated with non-standard bell (0x07)

Note that most terminals only accept BEL as a terminator for OSC sequences, strictly for backwards compatibility. In other string sequences it's just another data character. Kitty only started accepting it as a terminator for all string sequences a couple of years ago - I don't know why. Personally I don't think it's a good idea to encourage non-standard usage like that.

j4james avatar Mar 08 '25 00:03 j4james

One more point of concern: [sos/apc/pm]_dispatch() is called individually for each byte. For any use case which is not a byte-by-byte parser, it would probably be more efficient to call it on byte slices instead (I believe this was the entire point of #118). The main issue with calling this on byte slices, is that the SOS/APC/PM sequences can be arbitrarily long; indeed, with the the most common use case (Kitty Image Protocol) these sequences could easily be several megabytes long. I am a little bit unsure how accumulating these sequence in a Vec<u8> (like what the parser is doing for OSC) would impact performance.

boazy avatar Mar 08 '25 23:03 boazy

with the the most common use case (Kitty Image Protocol) these sequences could easily be several megabytes long.

Assuming apps actually follow the spec, then kitty image sequences should never be more than 4K ("the pixel data must first be base64 encoded then chunked up into chunks no larger than 4096 bytes"). But otherwise you're right - in theory these string sequences can be infinitely large.

j4james avatar Mar 09 '25 00:03 j4james

with the the most common use case (Kitty Image Protocol) these sequences could easily be several megabytes long.

Assuming apps actually follow the spec, then kitty image sequences should never be more than 4K ("the pixel data must first be base64 encoded then chunked up into chunks no larger than 4096 bytes"). But otherwise you're right - in theory these string sequences can be infinitely large.

Makes sense. I wasn't aware of the actual Kitty Image Protocol spec so I've missed that. VTE could also implement its own chunking and make that configurable in the parser.

boazy avatar Mar 09 '25 00:03 boazy

The perf looks pretty much the same https://github.com/alacritty/alacritty/pull/8507

Can not really answer wrt design now questions you've raised.

kchibisov avatar Mar 11 '25 18:03 kchibisov

Just to be clear on this:

[sos/apc/pm]_dispatch() is called individually for each byte. For any use case which is not a byte-by-byte parser, it would probably be more efficient to call it on byte slices instead (I believe this was the entire point of #118). The main issue with calling this on byte slices, is that the SOS/APC/PM sequences can be arbitrarily long; indeed, with the the most common use case (Kitty Image Protocol) these sequences could easily be several megabytes long. I am a little bit unsure how accumulating these sequence in a Vec (like what the parser is doing for OSC) would impact performance.

I personally don't mind this "simplistic" approach because I have very little interest in sos/pm/apc escape sequences, so this way there is only a minimal amount of impact on Alacritty while VTE is kept simple. That's also why I don't care if these escapes are dispatched particularly efficiently or not, since ideally nobody every uses them.

That said, the collection into an (Array)Vec in VTE is essentially a solved issue. We have this for OSCs already where things are somewhat dynamically limited and having a matching/similar approach probably wouldn't be a huge deal for these escapes either. Ideally to ensure parser state doesn't get too big one would likely reuse the same byte buffers for both, since it's not possible to have two different escape sequences at the same time anyway.

But going for something like that is likely to slow down the implementation process, since it doesn't sound like that's something you're particularly familiar with. So I don't mind just going with the status quo in this patch.

chrisduerr avatar Mar 12 '25 01:03 chrisduerr

That said, the collection into an (Array)Vec in VTE is essentially a solved issue. We have this for OSCs already where things are somewhat dynamically limited and having a matching/similar approach probably wouldn't be a huge deal for these escapes either. Ideally to ensure parser state doesn't get too big one would likely reuse the same byte buffers for both, since it's not possible to have two different escape sequences at the same time anyway.

Familiarity is not my issue here. I do understand how to collect bytes into a Vec (I hope). I thought of reusing osc_raw, but there may be several (minor) performance implications for Alacritty (and any other user of Parser which ignores SOS/PM/APC sequences):

  • If SOS/PM/APC sequences do get sent by a program, they'll be written to a buffer and then just get ignored later. In theory, this would impact performance.
  • Kitty Image Protocol sequences are still bigger than typical APM sequences, even with 4096 byte chunking (together with all the parameters that come before the payload, the actual chunk size would be a bit larger than 4096 bytes). If any program sends these sequences by mistake, the Vec would grow to that size and stay there (the non-std implementation will not be impacted, if we keep MAX_OSC_RAW the same).

I don't think these two issues are big in practice. Programs that send Kitty Image Protocol sequences to a terminal that doesn't support that are already misbehaving, and 4096 or 8192 bytes of extra memory per terminal are not a big thing. I also don't think cache locality would be impacted, since the shorter OSC sequences would keep using just the beginning of the buffer.

I will create another branch and a draft PR with a buffered version for comparison.

boazy avatar Mar 12 '25 02:03 boazy

On a second thought, I'm not sure if buffering opaque sequences will be such a great boon for clients of VTE anyway. Just like VTE itself is inlining methods, clients can just choose to inline their apc_put() implementation and put everything inside a Vec. So I think I would be okay with this implementation.

boazy avatar Mar 12 '25 03:03 boazy

On a second thought, I'm not sure if buffering opaque sequences will be such a great boon for clients of VTE anyway. Just like VTE itself is inlining methods, clients can just choose to inline their apc_put() implementation and put everything inside a Vec. So I think I would be okay with this implementation.

If the goal was to write the ideal implementation as far as performance goes, we'd probably special-case the OSC state to look for its termination sequence using memchr. However while that would speed up opaque strings, it would probably reduce the non-opaque string escape parsing slightly, so I don't think it's worth it.

chrisduerr avatar Mar 13 '25 20:03 chrisduerr

Added some small comments, but didn't want to actually do a review yet. Please just re-request one once you think you're done with your changes.

chrisduerr avatar Mar 13 '25 20:03 chrisduerr