chafa
sixels: last line cut/truncated on terminal emulators with "correct" text cursor placement
Sixel-capable terminal emulators have gotten cursor placement (after emitting the sixel) wrong since the beginning. They usually put the cursor on a new line under the sixel, which means the terminal content may scroll if a sixel is printed on the last row.
However, it's not how the VT340 did it. The simplified explanation is that it places the cursor on the last line of the sixel. Thus, if you want to print text under the sixel, you first have to print a newline.
The real algorithm is slightly more complex than that. A sixel is 6 pixels tall, which means it can cover two text rows. The DEC cursor placement algorithm puts the text cursor on the row containing the top pixel of the last sixel. This means there are times when two newlines are required to print text under the sixel.
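A minimal sketch of that placement rule, assuming every band is 6 px tall, the image's top edge sits at screen pixel row start_px, and cells are cell_h pixels high. The function names are hypothetical, and this models the rule as described here, not DEC's firmware. It computes the row the DEC rule leaves the cursor on and how many newlines are then needed to get below the image.

```bash
#!/bin/bash
# Sketch: DEC-style text cursor placement after a sixel image.
# Assumptions: every band is 6 px tall, start_px is the screen pixel row of
# the image's top edge, cell_h is the character cell height in pixels.

# Text row holding the top pixel of the last sixel band.
dec_cursor_row() {
  local start_px=$1 height_px=$2 cell_h=$3
  local bands=$(( (height_px + 5) / 6 ))
  echo $(( (start_px + (bands - 1) * 6) / cell_h ))
}

# Text row holding the bottom pixel of the last sixel band.
last_touched_row() {
  local start_px=$1 height_px=$2 cell_h=$3
  local bands=$(( (height_px + 5) / 6 ))
  echo $(( (start_px + bands * 6 - 1) / cell_h ))
}

# Newlines needed after the image so that text starts below it.
dec_newlines_needed() {
  local cursor last
  cursor=$(dec_cursor_row "$@")
  last=$(last_touched_row "$@")
  echo $(( last - cursor + 1 ))
}

echo "18 px: $(dec_newlines_needed 0 18 20) newline(s)"
echo "24 px: $(dec_newlines_needed 0 24 20) newline(s)"
```

With 20 px cells, an 18 px image needs one newline and a 24 px image needs two, because the last band of the 24 px image starts on row 0 but ends on row 1.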
A number of terminals have started to implement the correct behavior. Terminals that implement the DEC placement algorithm are foot, contour, DomTerm and WezTerm. There may be more that I'm not aware of. XTerm is close to correct, but last time I checked, it placed the cursor on the bottom pixel (i.e. you always need a single newline).
Right now, running chafa <image> && echo "XXXXXX" will look something like this in e.g. foot:
(picture shows a part of my dog's paw...)
A bit more information here:
- https://github.com/contour-terminal/contour/pull/825#issuecomment-1256607524
- https://github.com/hackerb9/vt340test/issues/25
- https://github.com/hackerb9/lsix/pull/51
Hi @dnkl. The VT340's algorithm should not be followed too strictly in modern terminals. Despite what the documentation implied, it uses a fast heuristic which relies on the character cell being 20 pixels tall. That algorithm was faster but included a glitch which should not be copied.
If you are designing a terminal that uses characters that are not 20 pixels tall, the algorithm does not apply and will have to be adapted in one of two ways:
- I suggest using the simpler algorithm which I believe you referred to as "bottom pixel". Such a terminal will be compatible with the way that programmers presumed the VT340 always worked and that DEC's documentation would lead a reasonable person to believe. All known original programs and sixel images for the VT340 will work correctly with that algorithm. Additionally, it is easy to program for, since one knows exactly how to draw text under any graphic: just send a newline.
- j4james has suggested limiting the sixel image resolution so that, regardless of the font resolution, each character cell shows only 10x20 pixels. This might be useful for someone who wants to run hypothetical VT340 software from thirty years ago which knows about and works around the VT340's cursor positioning quirk. One major downside is that graphical resolution is limited and the only way to increase it is to make the font so tiny it is unreadable.
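A small illustration of that downside, assuming the fixed mapping of 10x20 sixel pixels per character cell; the function name is hypothetical. The largest image a screen can hold depends only on its size in cells, so more pixels means more cells, i.e. a smaller font.

```bash
#!/bin/bash
# Sketch: under a fixed 10x20 sixel-pixels-per-cell mapping, the usable image
# resolution depends only on the terminal size in cells, not on how many
# device pixels the font actually uses.
max_image_px() {
  local cols=$1 rows=$2
  echo "$(( cols * 10 ))x$(( rows * 20 ))"
}

# Whatever the font size, an 80x24 terminal tops out at 800x480 sixel pixels.
max_image_px 80 24
```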
I strongly believe the first method is the correct one for most modern terminals. It lets programmers easily create software that integrates graphics with character cell text interfaces, which is to me what makes sixels useful.
If you've read my discussion with j4james about whether this VT340 behavior is a "glitch", you'll see that even though he believes it is the historical behavior and thus correct for any terminal that claims to emulate a VT340, neither of us could come up with an easy solution for application programmers who want to just splat a sixel image on the screen and show some text underneath it. Since a workaround requires the application to model the internal state of the VT340, no sane program will ever intentionally use this odd behavior, whether it is technically a glitch or not.
@hackerb9 I don't mind changing foot to always put the cursor on the last row touched by the sixel (i.e. the bottom pixel of the last sixel).
What I don't want is slightly different behavior in modern terminals, and I was under the impression that the other "correct" terminals also followed the DEC algorithm? If not, I'd be more than happy to update foot.
That said, it looks like chafa isn't emitting a newline at all, so even with the tweaked cursor placement (always put it on the last row touched by the sixel), the image is sometimes cut off.
@PerBothner @christianparpart @wez I was hoping we could all agree on how to implement cursor placement after emitting a sixel. As far as I can tell, foot, DomTerm, Contour and WezTerm all place the cursor on the same row as the last sixel. But do you follow the DEC algorithm, and place it on the row containing the upper pixel of the last sixel, or do you place it on the last row touched by the sixel (i.e. the row containing the bottom pixel of the last sixel)?
I know at least some of you have been following the discussions between @hackerb9 and j4james, but I don't know what you ended up implementing. From an application point of view, I think it would be beneficial if we all implemented the same cursor placement algorithm...
Foot currently implements the DEC algorithm, but I think it would be easier for applications if I changed it to just place the cursor on the last row. Then, to print text under the sixel, you know all that's needed is (always) a single newline. Not one or two.
But, I think it's a bad idea to change foot if all other sixel terminals implement the DEC algorithm, and don't want to change.
I agree putting the cursor on the row containing the bottom sixel row makes more sense, and I can certainly change it if that's the consensus. I prefer to match xterm.js for various reasons. https://github.com/jerch - what do you think?
@PerBothner Imho xterm.js currently keeps the text cursor at the row of the bottom-most pixel drawn from the last sixel band. This means that if the last band contains only "fiftels" (6th pixel never set), the 5th pixel would be the last one, not the 6th. I did this to allow printing pictures whose height is not a multiple of 6 px and still align them properly at the bottom, without a nonsensical excess row or excess space at the bottom. (There is still a bug attached to it, where empty sixel bands at the end might get truncated: https://github.com/jerch/node-sixel/issues/58)
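A sketch of how the bottom-most drawn pixel of the last band can be found, assuming the band's data has already been split off (everything after the last '-'). Sixel data characters run from '?' (63) to '~' (126) and encode six vertical pixels as bits 0 (top) to 5 (bottom) of the character code minus 63, so ORing them gives every pixel row drawn anywhere in the band. The control characters ('!', '#', '$', ';' and digits) all have codes below 63 and are skipped. The function name is hypothetical and this is not xterm.js's code.

```bash
#!/bin/bash
# Sketch: number of pixel rows, counted from the top of a sixel band, down to
# the lowest pixel that is actually drawn. $1 is the data of one band.
band_drawn_rows() {
  local band=$1 bits=0 code i
  for (( i = 0; i < ${#band}; i++ )); do
    printf -v code '%d' "'${band:i:1}"
    # '?'..'~' are sixel data; '!', '#', '$', ';' and digits are all below 63.
    if (( code >= 63 && code <= 126 )); then
      bits=$(( bits | (code - 63) ))
    fi
  done
  local rows=0
  while (( bits > 0 )); do      # index of the highest set bit, plus one
    rows=$(( rows + 1 ))
    bits=$(( bits >> 1 ))
  done
  echo "$rows"
}

band_drawn_rows '#2!10N'   # 'N' = 0b001111: top four pixels drawn -> 4
band_drawn_rows '#1!10~'   # '~' = 0b111111: all six pixels drawn -> 6
```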
I'm open to tweaking wezterm to be more sane, assuming that there are a couple of test cases with examples of where the cursor should end up.
FWIW, I think the current cursor placement in wezterm may well be a bit of a fluke arising from re-using the iterm2 image protocol logic that preceded it rather than a conscious effort to implement the vt340 algorithm.
wezterm's logic for this (shared by iterm2, kitty and sixel handling) can be found here: https://github.com/wez/wezterm/blob/22424c3280cb21af43317cb58ef7bc34a8cbcc91/term/src/terminalstate/image.rs#L65
the vertical position: https://github.com/wez/wezterm/blob/22424c3280cb21af43317cb58ef7bc34a8cbcc91/term/src/terminalstate/image.rs#L166-L170
the horizontal position: https://github.com/wez/wezterm/blob/22424c3280cb21af43317cb58ef7bc34a8cbcc91/term/src/terminalstate/image.rs#L233-L246
It may be a good idea to get @arakiken and the other mlterm developers on board too. I've been testing with it, since it had one of the first implementations, and is still one of the fastest. It currently (as of version 3.9.3) places the cursor on the row immediately after (that is, the first character row not touched by any sixel, transparent or not).
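To compare the three behaviors described so far on the same input, here is a sketch that assumes 6 px bands, ignores transparency, and places the image's top edge at screen pixel row start_px. The rule names are hypothetical labels: dec (row of the top pixel of the last band), bottom (last row touched) and after (first row not touched, as described for mlterm 3.9.3).

```bash
#!/bin/bash
# Sketch: text row of the cursor after a sixel, under three placement rules.
# Assumes 6 px bands and ignores trailing transparent pixels.
cursor_row() {
  local rule=$1 start_px=$2 height_px=$3 cell_h=$4
  local bands=$(( (height_px + 5) / 6 ))
  local last_band_top=$(( start_px + (bands - 1) * 6 ))
  local last_px=$(( start_px + bands * 6 - 1 ))
  case $rule in
    dec)    echo $(( last_band_top / cell_h )) ;;   # top pixel of last band
    bottom) echo $(( last_px / cell_h )) ;;         # last row touched
    after)  echo $(( last_px / cell_h + 1 )) ;;     # first row not touched
    *)      echo "unknown rule: $rule" >&2; return 1 ;;
  esac
}

# A 24 px image at the top of a screen with 20 px cells.
for rule in dec bottom after; do
  printf '%-6s row %s\n' "$rule" "$(cursor_row "$rule" 0 24 20)"
done
```

Under bottom, one newline always moves below the image; under dec it takes one or two; under after, none.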
My main concerns as an application writer are a) consistency between terminals and b) simplicity of design. I'll happily support any consensus terminal developers arrive at.
> Imho xterm.js currently keeps the text cursor at the row of the bottom-most pixel drawn from last sixel band. Means if the last band contains only "fiftel" (6th pixel never set), the 5th pixel would be the last one, not the sixth anymore.
I favored this approach at first, but it has the minor annoyance of deliberate image transparency being cut off. It also means applications must inspect the image data in order to know where the cursor'll end up, which is a slightly bigger problem. Correct me if I'm wrong and there's a way around this.
How about this:
For sixels without an explicit width/height (no raster attributes), assume all sixels are 6 pixels tall, i.e. don't bother inspecting the image looking for transparency.
For sixels with an explicit width/height, use the specified height.
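A sketch of this proposal, assuming the sixel data (the part between 'q' and ST) either starts with a complete raster attribute "Pan;Pad;Ph;Pv or has none at all; the function name is hypothetical. Without raster attributes, the height is simply the number of bands times 6, with no inspection of pixel data.

```bash
#!/bin/bash
# Sketch of the proposal above: the image height used for cursor placement.
# $1 is the sixel data between 'q' and the string terminator.
sixel_height() {
  local data=$1
  local re='^"([0-9]+);([0-9]+);([0-9]+);([0-9]+)'
  if [[ $data =~ $re ]]; then
    echo "${BASH_REMATCH[4]}"        # raster attributes present: use Pv
  else
    local gnls=${data//[!-]/}        # keep only the '-' (Graphics New Line) characters
    echo $(( (${#gnls} + 1) * 6 ))   # otherwise every band counts as 6 px
  fi
}

sixel_height '"1;1;10;40#2!10~-!10~-!10~-!10~-!10~-!10~-!10N'   # 40
sixel_height '#2!10~-!10~-!10~-!10~-!10~-!10~-!10N'             # 42
```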
following along for notcurses, good to see this effort taking place
Sorry to intrude....
I just want to add that if it's possible to also consider the horizontal cursor position, it'd be really good (from the perspective of an application developer).
A unified vertical position is good enough for aligning images with text or other images vertically, but not horizontally (i.e. side by side).
Yes, it's probably possible to work around this using absolute cursor positioning or save/restore, but these are not always viable options; plus, I believe the purpose of a consensus includes eliminating the need for workarounds in applications anyway.
Thank you all.
My understanding is the text cursor's horizontal position isn't changed at all. It only moves vertically.
Put another way, it is positioned "at the beginning of the sixel", i.e. in the bottom left corner of the sixel.
@hpjansson related, but perhaps worth its own issue; chafa currently ends the sixel with a GNL ('-'). Is this intentional?
It adds an extra, empty, graphical row. I think it would be better to use a textual newline instead.
Fwiw, this behavior has a (very) minor performance impact on foot, for sixels with an explicit width/height, as we're forced to reallocate and enlarge the backing image buffer, and then initialize it to the background color. I'm not really bothered by it, but thought it might be worth mentioning at least.
Be happy to move this to a separate issue if you'd prefer that.
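A sketch of the encoder-side change being suggested, under the assumption that the encoder has its sixel data as a string before sending ST (hypothetical function name): strip the trailing Graphics New Line(s) so no empty band is added, and leave any line break to a textual newline after ST.

```bash
#!/bin/bash
# Sketch: remove any trailing Graphics New Line ('-') from sixel data before
# the string terminator is sent, so no empty band is added to the image.
strip_trailing_gnl() {
  local data=$1
  while [[ $data == *- ]]; do
    data=${data%-}
  done
  printf '%s' "$data"
}

strip_trailing_gnl '#1!10~-!10~-'; echo   # prints #1!10~-!10~
```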
> My understanding is the text cursor's horizontal position isn't changed at all. It only moves vertically. Put another way, it is positioned "at the beginning of the sixel", i.e in the bottom left corner of the sixel.
i went and looked at what we do in notcurses, and we do a hard cursor position after emission of any sixel. i imagine any application wanting to be portably correct will have to do the same thing, no? since they might be dealing with old terminals, or noncompliant ones, and it's not indicated via term queries? i don't want to disrupt unification, but from an app/toolkit author's perspective, i don't see how this helps...?
I agree, mostly. It's important we unify the vertical placement since it affects scrolling.
Horizontal placement isn't as important, except if a terminal places it after the image, in which case it might affect scrolling if it ends up being beyond the last column.
@hpjansson
> I favored this approach at first, but it has the minor annoyance of deliberate image transparency being cut off. It also means applications must inspect the image data in order to know where the cursor'll end up, which is a slightly bigger problem. Correct me if I'm wrong and there's a way around this.
No, you are right. This "bottom-most colored pixel" behavior cuts a fully transparent line of pixels at the bottom as not being part of the original image. If an image has that line intentionally, it will get stripped. That's for level 1 sixel.
> Correct me if I'm wrong and there's a way around this.
Well, I put another warning into the docs not to use level 1 sixel on the encoder side anymore, but to go with level 2 with explicit raster attributes denoting the width and height extent. DEC STD 070 also tells us that the graphics extents in the raster attributes should never be exceeded by encoders, so my decoder uses them to trim the graphics, which also solves the issue of non-multiple-of-6 image heights in a more deterministic way. We already had several discussions about the worth of the sixel chapter in DEC STD 070 and how it deviates even from DEC's own machines. Imho DEC STD 070 is the only lengthy source from DEC that tries to sound normative, e.g. by implying certain limits on the sixel format, like height and width, or the 256 color slots rule. Maybe they did that to bring it in line with other industry standards of that time (I guess 256-color support was at the top end of 80s hardware capabilities), but it never really came to life, as they soon stopped the whole sixel line.
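With the raster attributes treated as a hard limit, the band layout for a height that is not a multiple of 6 follows directly from Pv. A sketch (hypothetical function name) showing how many bands are decoded and how many pixel rows of the last band are kept; anything below Pv is discarded.

```bash
#!/bin/bash
# Sketch: with Pv from the raster attributes used as a hard limit, the number
# of bands and the pixel rows kept from the last band follow directly.
ra_band_layout() {
  local pv=$1
  if (( pv <= 0 )); then
    echo "no height limit"           # no usable Pv: nothing to trim against
    return
  fi
  local bands=$(( (pv + 5) / 6 ))
  local last_rows=$(( pv - (bands - 1) * 6 ))
  echo "$bands bands, $last_rows row(s) kept from the last band"
}

ra_band_layout 40   # 7 bands, 4 row(s) kept from the last band
ra_band_layout 60   # 10 bands, 6 row(s) kept from the last band
```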
@AnonymouX47
> I just want to add that if it's possible to also consider the horizontal cursor position, it'd be really good (from the perspective of an application developer).
That's not possible with sixel level 1, as it has no notion of width. Every sixel band can extend the sixel cursor a different distance to the right (an image might be ragged on the right), so which one should be chosen? Sixel level 2 brings width and height with its raster attributes, so yes, that could be used for a right border. To support both conformance levels, only the starting cursor offset within a line is determined, which basically leads to the VT340 cursor mode.
Btw, xterm.js also uses the VT340 cursor for IIP as the only supported cursor mode, to level out differences between the image sequences. While that cursor mode is more annoying to deal with as an app developer, if you want to place text to the right of the image, the handling is always the same:
- know the initial cursor position, either by tracking it in your own buffer state or by doing an explicit CPR
- place the image of x*y pixels, either with sixel or IIP
- derive the image output size in cols;rows from the TE's grid resolution (to get the grid resolution of the TE, either use ioctl or CSI 14/18 t)
- move the text cursor up/right by the image's rows/cols
- write your text
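To illustrate why a level 1 image has no single right edge, here is a sketch that measures how far each band extends, assuming well-formed data (hypothetical function name). It follows '!' repeat counts, skips '#' color and '"' raster parameters, resets the column on '$' and starts a new band on '-'.

```bash
#!/bin/bash
# Sketch: width in pixels of each band of sixel data ($1, the data between
# 'q' and ST). Without raster attributes, bands may differ in width.
band_widths() {
  local data=$1 c code i col=0 max=0 repeat=1
  local -a widths=()
  for (( i = 0; i < ${#data}; i++ )); do
    c=${data:i:1}
    case $c in
      '!')      # repeat introducer: read the decimal count that follows
                repeat=0
                while [[ ${data:i+1:1} == [[:digit:]] ]]; do
                  i=$(( i + 1 ))
                  repeat=$(( repeat * 10 + ${data:i:1} ))
                done ;;
      '#'|'"')  # color introducer / raster attributes: skip numeric parameters
                while [[ ${data:i+1:1} == [[:digit:]] || ${data:i+1:1} == ';' ]]; do
                  i=$(( i + 1 ))
                done ;;
      '$')      col=0 ;;                              # graphics carriage return
      '-')      widths+=("$max"); col=0; max=0 ;;     # graphics new line
      *)        printf -v code '%d' "'$c"
                if (( code >= 63 && code <= 126 )); then  # a sixel data character
                  col=$(( col + (repeat > 0 ? repeat : 1) ))
                  repeat=1
                  if (( col > max )); then max=$col; fi
                fi ;;
    esac
  done
  widths+=("$max")
  echo "${widths[*]}"
}

band_widths '#1!10~$#2!5~-!3~'   # 10 3: the second band is narrower
```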
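The steps above, sketched with the cell size already known (for example from the CSI 14 t / CSI 18 t reports, as suggested). The function names are hypothetical. Since the first step gives the starting position, this version jumps with an absolute CUP (CSI row ; col H) rather than relative up/right moves; both work as long as the start position is known.

```bash
#!/bin/bash
# Sketch: convert an image's pixel size into cells and position the text
# cursor to the right of the image's top row.
# Assumptions: the image's top-left corner was at 1-based (row, col), and the
# cell size in pixels is already known.
image_cells() {
  local w=$1 h=$2 cell_w=$3 cell_h=$4
  # Round up: a partly covered cell still counts.
  echo "$(( (w + cell_w - 1) / cell_w )) $(( (h + cell_h - 1) / cell_h ))"
}

text_right_of_image() {
  local row=$1 col=$2 w=$3 h=$4 cell_w=$5 cell_h=$6 cols rows
  read -r cols rows <<< "$(image_cells "$w" "$h" "$cell_w" "$cell_h")"
  # CUP to the first cell right of the image, on its top row.
  printf '\e[%d;%dH' "$row" "$(( col + cols ))"
}

image_cells 100 45 10 20   # 10 3
```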
Foot currently allows "level 2" sixels to be extended, both vertically and horizontally. That is, the image will be resized, if necessary, to accommodate whatever the encoder is emitting.
I'd be more than happy to change it, to instead truncate the image to the width/height specified in the raster attributes. It'd just make everything simpler on the terminal side.
> I'd be more than happy to change it, to instead truncate the image to the width/height specified in the raster attributes. It'd just make everything simpler on the terminal side.
Yep, it reduces code complexity a lot, and on the perf side it is actually ~40% faster during sixel decoding because of the upper bounds known beforehand in my decoder.
Thanks (@dnkl, @dankamongmen and @jerch) for the clarifications and suggestions. I guess I can work with those.
EDIT: ... as regards cursor horizontal placement.
@dnkl
> related, but perhaps worth its own issue; chafa currently ends the sixel with a GNL ('-'). Is this intentional? It adds an extra, empty, graphical row. I think it would be better to use a textual newline instead. Fwiw, this behavior has a (very) minor performance impact on foot, for sixels with an explicit width/height, as we're forced to reallocate and enlarge the backing image buffer, and then initialize it to the background color. I'm not really bothered by it, but thought it might be worth mentioning at least. Be happy to move this to a separate issue if you'd prefer that.
I don't remember exactly how intentional it was, but when I wrote most of the encoder back in 2018 I had to work around issues in existing decoders. For instance, I specify the raster dimensions but still make sure to pad every sixel row to the full width, since I noticed a case where the terminal would have garbage in the image buffer otherwise. It's possible the GNL was required by a decoder at some point.
That said, after testing it again now, it seems e.g. mlterm behaves the same with and without the GNL; I think it opens a new sixel row only when its pixel data starts arriving. I don't know of anything that needs the final GNL anymore, so I'll remove it.
I'm also partial to the idea that raster attributes should preempt dynamic resizing. It makes things more predictable for everyone.
I'm glad to see all the terminal developers here working together!
If I can summarize, it sounds like everyone is in agreement that modern terminals should allow what I will refer to as splat-nl-print: Applications may send sixels to a screen and simply send a newline before any text if they do not wish to overlap the graphics. Although VT340 compatibility is not the highest priority, I can add that my tests show splat-nl-print as the algorithm of choice even on a real VT340 as the occasional glitch is vanishingly rare in actual usage.
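A minimal sketch of splat-nl-print from the application side (hypothetical function name): send the sixel data inside DCS q ... ST with no trailing Graphics New Line, then one newline, then the text. On a Unix tty the line discipline usually turns the newline into CR LF, so the text starts in column 0.

```bash
#!/bin/bash
# Sketch of splat-nl-print: send the sixel image, then a single newline, then
# text. $1 is the sixel data (no trailing '-'), $2 is the text.
splat_nl_print() {
  printf '\ePq%s\e\\' "$1"   # DCS q <data> ST
  printf '\n%s\n' "$2"       # one newline, then the text
}

splat_nl_print '"1;1;10;12#1!10~-!10~' 'Caption under the image'
```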
Additional points brought up:
- Should the width and height specified in the Raster Attributes (RA) be used as a clipping box despite DEC's documentation explicitly stating RA does not limit the size of the image? Personally, I think, "Yes". It is a reasonable optimization for modern terminals when there is exactly one RA present in the sixel data stream. However, I would also hope modern terminals would be robust enough to fall back to unoptimized rendering when necessary: for example, no RA in the image, multiple RAs, an RA with zero width / height, or data where the program doesn't know the size ahead of time. (Sidenote: I do not expect any modern emulator to be able to handle @jerch's endless scrolling sixels.)
- Should the text after a new line overwrite transparent pixels at the bottom of the graphic? I believe so unless the RA width and height specify otherwise.
- Should Graphic New Line scroll the screen immediately before pixel data is received? Yes, I think that is correct. And applications encoding sixels should not output a final Graphic New Line at the end of the stream.
- How can positioning of text to the right of a sixel image be made easier for application developers? I agree that a new issue should be created to discuss this. (If someone does, please @ me in the discussion as I'm curious about possible solutions.)
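A sketch of the fallback rule from the first point, assuming the raster attribute is the only use of '"' in the data (hypothetical function name): clip to Ph x Pv only when exactly one raster attribute is present and both values are non-zero; otherwise report that no clip box applies and the decoder should take its unoptimized path.

```bash
#!/bin/bash
# Sketch: clip to the raster attributes only when the data carries exactly one
# RA with non-zero width and height. $1 is the sixel data between 'q' and ST.
ra_clip_box() {
  local data=$1 dq='"'
  local quotes=${data//[!$dq]/}   # keep only the '"' characters
  local re='"([[:digit:]]*);([[:digit:]]*);([[:digit:]]*);([[:digit:]]*)'
  if (( ${#quotes} == 1 )) && [[ $data =~ $re ]] &&
     (( 10#${BASH_REMATCH[3]:-0} > 0 && 10#${BASH_REMATCH[4]:-0} > 0 )); then
    echo "${BASH_REMATCH[3]}x${BASH_REMATCH[4]}"   # clip to Ph x Pv
  else
    echo "none"                                    # unoptimized, growable buffer
  fi
}

ra_clip_box '"1;1;10;40#2!10~-!10N'   # 10x40
ra_clip_box '#2!10~-!10N'             # none
```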
If you're going to define your own version of Sixel, can you please make it something that apps can opt into or out of with a mode. Worst case, if you don't want to implement both standard and non-standard cursor placement, you could still report the mode as permanently set, and then apps can at least tell what behavior to expect from the terminal.
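For reference, the existing way for an application to ask a terminal about a mode is DECRQM (CSI ? Ps $ p), answered by DECRPM (CSI ? Ps ; Pm $ y), where Pm is 0 (not recognized), 1 (set), 2 (reset), 3 (permanently set) or 4 (permanently reset). A sketch of building the query and interpreting a reply; the mode number 9999 is a hypothetical placeholder, since no mode for this behavior is defined here, and reading the reply from the terminal (raw mode, timeouts) is left out.

```bash
#!/bin/bash
# Sketch: DECRQM query and DECRPM reply handling for a private mode.
MODE=9999   # hypothetical placeholder: no mode number is defined in this thread

decrqm_query() {
  printf '\e[?%d$p' "$1"   # CSI ? Ps $ p
}

decrpm_state() {
  local reply=$1 mode=$2 pm
  local prefix=$'\e[?'"$mode;"
  [[ $reply == "$prefix"*'$y' ]] || { echo "no reply"; return 1; }
  pm=${reply#"$prefix"}
  pm=${pm%'$y'}
  case $pm in
    0) echo "not recognized" ;;
    1) echo "set" ;;
    2) echo "reset" ;;
    3) echo "permanently set" ;;
    4) echo "permanently reset" ;;
    *) echo "unexpected: $pm"; return 1 ;;
  esac
}

decrpm_state $'\e[?'"$MODE"';3$y' "$MODE"   # permanently set
```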
Have you tested recent versions of xterm? I think it is desirable to be compatible with xterm. It may be a good idea to contact Thomas E. Dickey, the maintainer of xterm. He has tweaked the handling of Sixels in the past, and may be open to (if necessary) doing so to match the "saner" behavior.
@j4james I don't believe there is a "standard version" of Sixel. That is part of the problem: different implementations act differently. Is "standard Sixel" whatever DEC implemented in their terminals? Are all such terminals consistent? What about the specifications (manuals) from DEC? What about corner cases not covered in the manuals? What about xterm, and which version of xterm? If all of these were consistent, I'd consider that "standard sixel", but I'm pretty certain that is not the case.
> Have you tested recent versions of xterm? I think it is desirable to be compatible with xterm. It may be a good idea to contact Thomas E. Dickey, the maintainer of xterm. He has tweaked the handling of Sixels in the past, and may be open to (if necessary) doing so to match the "saner" behavior.
UPDATE: I have determined that I was mistaken about Xterm's behavior regressing. In fact, it is now almost precisely correct. The one thing it is missing, however, is moving the text cursor down on Graphic New Lines, which just happens to be the default output from ImageMagick's convert tool. @ThomasDickey.
Here is a new script, textcursor2.sh, which shows how a TEXT NEW LINE (or, equivalently, CURSOR DOWN) separates a sixel image from any following text on a VT340 with its 20 pixel high character cell.
It also shows what happens when GRAPHIC NEW LINE is used; the most important feature of which seems to be that it acts exactly like a single text new line whenever the image height is a multiple of the character cell height.
> Should the text after a new line overwrite transparent pixels at the bottom of the graphic? I believe so unless the RA width and height specify otherwise.
Not sure I agree this is a good idea. Is that what the VT340 does? Or is there another reason for doing it this way?
If we choose to truncate images with raster attributes set, I don't really see the need for special casing the last line of transparent sixel bands - the encoder can simply emit a sixel with raster attributes to get an exact sized image.
Yes, that is the way the VT340 does it, or tries to, anyhow. To not do it that way would cause peculiar artifacts.
Consider a terminal with a text height of 20 pixels. Two images that end on the same line ought to have following text that is also on the same line. Place a 60px high rectangle at row 0 and a 40 px high rectangle at row 1 (the 20th pixel). If the terminal does not pay attention to transparent bands at the bottom, it would display the text after the 40px rectangle a line lower than the 60px rectangle.
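```bash
#!/bin/bash
# Two rectangles whose bottom edges end on the same text row (20 px cells).
DCS=$'\eP'; ST=$'\e\\'; DOWN=$(tput cud 1)
clear
# 10x60 px in color 1 at row 0: ten full bands of '~'.
echo -n "${DCS}q"'"1;1;10;60#1!10~-!10~-!10~-!10~-!10~-!10~-!10~-!10~-!10~-!10~'"${ST}"
echo "${DOWN}This is 60px high (starting at 0)"
# 10x40 px in color 2 at row 1 (pixel 20): six full bands, then 'N', whose
# bottom two pixels are transparent.
tput cup 1 40
echo -n "${DCS}q"'"1;1;10;40#2!10~-!10~-!10~-!10~-!10~-!10~-!10N'"${ST}"
echo "${DOWN}This is 40px high (starting at 20)"
```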
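The arithmetic behind that example, as a sketch assuming 20 px cells, a cursor left on the row of the lowest drawn pixel, and one cursor-down before the text (hypothetical function name; rows are 0-based). Honoring the transparent bottom of the 'N' band puts both captions on the same row; counting that band as a full 6 px pushes the second caption one row lower.

```bash
#!/bin/bash
# Sketch: the row where the caption lands after each rectangle.
# $1 = start pixel row, $2 = full bands before the last band,
# $3 = pixel rows drawn in the last band. Cells are assumed 20 px high.
text_row() {
  local start_px=$1 full_bands=$2 last_rows=$3 cell_h=20
  local lowest_px=$(( start_px + full_bands * 6 + last_rows - 1 ))
  echo $(( lowest_px / cell_h + 1 ))   # cursor row, then one cursor-down
}

echo "60 px at row 0: text on row $(text_row 0 9 6)"
echo "40 px at row 1, transparency honored: text on row $(text_row 20 6 4)"
echo "40 px at row 1, last band counted as 6 px: text on row $(text_row 20 6 6)"
```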
Xterm works the same as the VT340 and, while it took some time to get it right, hopefully that means it'll be easier for other terminals which can use it as a reference implementation.
I completely understand not wanting to special case the last band of transparent sixels. It seems yucky to me, too. I believe DEC engineers had the same thoughts and went to the extreme of inventing a wacky heuristic that works swiftly (if not perfectly) by presuming character cells are exactly 20 pixels high. That algorithm does not work in general, so we may have to just OR together every sixel in the last row to see what the lowest opaque pixel actually was. I believe technology has advanced sufficiently since 1988 that it should be possible.
If ORing a few thousand bytes takes too long, one could write CUDA code to do it in parallel on the GPU. 😆
Using Raster Attributes to truncate images seems a reasonable optimization for current hardware, but it is not ideal. And, since Sixel images aren't required to set the geometry, it is not always applicable.
Looking at Xterm's source code reminded me of a minor point: while a text newline ('\n') is the preferred separator between an image and text, there are other options.
- Index (Esc D) after a sixel image behaves the same as a newline except that the column is not reset to zero.
- Cursor down (Esc [ B) after a sixel image behaves the same as Index in that it keeps the column position; however, Down differs in that it will not cause the terminal to scroll when it hits the bottom of the screen.
- Graphic Newline (a '-' before the DCS String Terminator, Esc \) is not as useful, but it is commonly seen in encoders, such as ImageMagick's convert. One nice feature of DEC's algorithm for handling Graphic Newlines on the VT340 is that when an image is a multiple of 20 pixels high (the line height), a single GNL has the exact same effect as a single text newline. That is, the text that follows is put directly underneath the image with no gap and no overlap.
- Nothing is also a possibility, but I do not know of what utility it is, if any. DEC clearly had some idea, though, as its behavior is not as simple as "always overwrite from the top left". An image from 1 to 24 pixels high is overwritten from the top of the graphic by 20 px high text. But an image of 25 pixels moves the cursor down a row (20 px) and overwrites the bottom five pixels of the image with text.
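The separators in the list above, written as the bytes an application would send after the image; a sketch with a hypothetical function name, where Esc \ is the string terminator.

```bash
#!/bin/bash
# Sketch: a sixel image ($2) followed by one of the separators listed above.
sixel_then() {
  local sep=$1 data=$2
  case $sep in
    newline) printf '\ePq%s\e\\\n' "$data" ;;     # NL: down one row, column 0
    index)   printf '\ePq%s\e\\\eD' "$data" ;;    # IND: down one row (scrolls), same column
    down)    printf '\ePq%s\e\\\e[B' "$data" ;;   # CUD: down one row, never scrolls
    gnl)     printf '\ePq%s-\e\\' "$data" ;;      # Graphic New Line before ST
    nothing) printf '\ePq%s\e\\' "$data" ;;
    *)       echo "unknown separator: $sep" >&2; return 1 ;;
  esac
}
```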
Again, this is minor. Reverse engineering and implementing the "correct" behavior of any of these is low-priority compared to getting text newlines working the same everywhere.
UPDATE: Thank you to @j4james for pointing out that "newline" is ambiguous. I mean the C '\n' character which moves the cursor down and to the leftmost column. UNIX systems encode newline as ASCII LF, 0x0a, but other operating systems may be different.
UPDATE: Another thank you to j4james for correcting me that Down does not scroll the screen and that one must use Index. I have updated the information in the list above.
Thanks for clarifying the issue with trailing transparent rows.
I actually did it that way in earlier versions of foot, simply because XTerm did. I changed it when I couldn't find any evidence that's what should happen.
I guess I'll add it back :)
With that, I agree with everything in your list of suggested behaviors.