Fixes #1265 - Adds Sixel rendering support
Sixel
See also this discussion: #1265
Sixel is a graphics rendering protocol for terminals. It involves converting a bitmap of pixels into an ASCII sequence.
Sixel is the most widely supported graphics protocol among terminals, with support in xterm, mlterm, Windows Terminal (still in pre-release; see v1.22.2362.0) and others. To see which terminals support sixel, go to: https://www.arewesixelyet.com/
Quantization
Sixel involves a palette of limited colors (typically 256). This means we need to do the following:
- Create a palette of 256 colors from an input bitmap
- Map colors during rendering to the palette colors
The class that handles this is ColorQuantizer. For extensibility I have created interfaces for the actual algorithms you might want to use/write. These are IPaletteBuilder and IColorDistance.
Color Distance
Color distance determines how closely two colors resemble one another. I included the faster algorithm in Terminal.Gui and left the other one in UICatalog.
| Distance Metric | Speed | Description |
|---|---|---|
| Euclidean | Fast | Measures the straight-line distance between two colors in the RGB color space. |
| CIE76 | Slow | Computes the perceptual difference between two colors based on the CIELAB color space, aligning more closely with human vision. |
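A minimal sketch of the Euclidean metric behind an `IColorDistance`-style abstraction. The `Rgb` struct, the interface shape and the `CalculateDistance` name are hypothetical stand-ins so the example compiles on its own. They are not the PR's actual types.

```csharp
using System;

// Hypothetical stand-in for Terminal.Gui's Color so the sketch is self-contained.
public readonly record struct Rgb (byte R, byte G, byte B);

// Hypothetical shape of an IColorDistance abstraction.
public interface IColorDistance
{
    double CalculateDistance (Rgb c1, Rgb c2);
}

// Straight-line distance between two points in RGB space.
public sealed class EuclideanColorDistance : IColorDistance
{
    public double CalculateDistance (Rgb c1, Rgb c2)
    {
        int dr = c1.R - c2.R;
        int dg = c1.G - c2.G;
        int db = c1.B - c2.B;

        return Math.Sqrt (dr * dr + dg * dg + db * db);
    }
}
```

When you only need the nearest palette entry, comparing squared distances gives the same ranking and skips `Math.Sqrt`.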
Palette Building
Deciding which 256 colors to use to represent any image is an interesting problem. I have added a simple, fast algorithm, PopularityPaletteWithThreshold. It counts the colors of all the pixels, merges those that are similar to one another (based on IColorDistance), then takes the most common colors.
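The popularity-with-threshold idea can be sketched as follows. This is an illustrative reimplementation, not the PR's PopularityPaletteWithThreshold. `Rgb`, `PopularityPaletteSketch` and its inline Euclidean helper are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public readonly record struct Rgb (byte R, byte G, byte B);

public static class PopularityPaletteSketch
{
    static double Distance (Rgb a, Rgb b)
    {
        int dr = a.R - b.R, dg = a.G - b.G, db = a.B - b.B;
        return Math.Sqrt (dr * dr + dg * dg + db * db);
    }

    // Builds up to maxColors entries from the most common colors, folding any
    // color within 'threshold' of an existing entry into that entry's count.
    public static List<Rgb> BuildPalette (IEnumerable<Rgb> pixels, int maxColors, double threshold)
    {
        // 1. Count how often each exact color occurs.
        var counts = new Dictionary<Rgb, int> ();
        foreach (Rgb p in pixels)
        {
            counts [p] = counts.TryGetValue (p, out int n) ? n + 1 : 1;
        }

        // 2. Visit colors from most to least common, merging near-duplicates.
        var clusters = new List<(Rgb Color, int Count)> ();
        foreach (var (color, count) in counts.OrderByDescending (kv => kv.Value))
        {
            int match = clusters.FindIndex (c => Distance (c.Color, color) <= threshold);
            if (match >= 0)
            {
                clusters [match] = (clusters [match].Color, clusters [match].Count + count);
            }
            else
            {
                clusters.Add ((color, count));
            }
        }

        // 3. Keep the most popular merged colors.
        return clusters.OrderByDescending (c => c.Count).Take (maxColors).Select (c => c.Color).ToList ();
    }
}
```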
I have also included in UICatalog a more standard algorithm that is substantially slower called MedianCutPaletteBuilder. This creates a color palette by recursively dividing a list of colors into smaller subsets ("cubes") based on the largest color variance, using a configurable color distance metric IColorDistance.
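For comparison, here is a sketch of classic median cut. It splits on the widest single RGB channel range rather than using a pluggable IColorDistance, so it is a simplified variant of what MedianCutPaletteBuilder is described as doing. All names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public readonly record struct Rgb (byte R, byte G, byte B);

public static class MedianCutSketch
{
    // Repeatedly split the box with the widest channel range at its median
    // until there are maxColors boxes, then average each box.
    public static List<Rgb> BuildPalette (IReadOnlyList<Rgb> pixels, int maxColors)
    {
        var boxes = new List<List<Rgb>> { pixels.ToList () };

        while (boxes.Count < maxColors)
        {
            List<Rgb> widest = boxes.Where (b => b.Count > 1).OrderByDescending (b => LargestSpread (b)).FirstOrDefault ();
            if (widest is null || LargestSpread (widest) == 0)
            {
                break; // nothing left worth splitting
            }

            // Sort along the box's widest channel and cut at the median.
            List<Rgb> sorted = widest.OrderBy (WidestChannel (widest)).ToList ();
            int mid = sorted.Count / 2;
            boxes.Remove (widest);
            boxes.Add (sorted.GetRange (0, mid));
            boxes.Add (sorted.GetRange (mid, sorted.Count - mid));
        }

        // Each palette entry is the mean color of one box.
        return boxes.Select (b => new Rgb (
            (byte)b.Average (c => (int)c.R),
            (byte)b.Average (c => (int)c.G),
            (byte)b.Average (c => (int)c.B))).ToList ();
    }

    static int Spread (List<Rgb> box, Func<Rgb, int> channel) => box.Max (channel) - box.Min (channel);

    static int LargestSpread (List<Rgb> box) =>
        Math.Max (Spread (box, c => c.R), Math.Max (Spread (box, c => c.G), Spread (box, c => c.B)));

    static Func<Rgb, int> WidestChannel (List<Rgb> box)
    {
        int r = Spread (box, c => c.R), g = Spread (box, c => c.G), b = Spread (box, c => c.B);
        if (r >= g && r >= b) return c => c.R;
        if (g >= b) return c => c.G;
        return c => c.B;
    }
}
```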
Encoding
Once the palette is built we need to encode the pixel data into sixel.
Sixel encoding converts the bitmap 6 rows at a time (each group of 6 rows is called a 'band'). Each band is converted into X passes of ASCII characters, one character per pixel column, where X is the number of colors in the band:
Set up the palette
First we need to describe all the colors we are using. This takes the format:
#0;2;100;0;0
The first number is the palette index, i.e. 0 (the first color). Next is the color coordinate system: 1 is HLS and 2 is RGB. Then come the RGB components. These are 0-100 (percentages), not 0-255 as might be expected.
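A small sketch of producing a palette entry from 8-bit RGB. `SixelPalette.Define` is a hypothetical helper. Rounding to the nearest percentage is an assumption; other encoders may truncate instead.

```csharp
using System;

public static class SixelPalette
{
    // Sixel color components are percentages (0-100), so rescale from 0-255.
    static int ToPercent (byte value) => (int)Math.Round (value * 100.0 / 255.0);

    // "#<index>;2;<r>;<g>;<b>" where 2 selects the RGB color coordinate system.
    public static string Define (int index, byte r, byte g, byte b) =>
        $"#{index};2;{ToPercent (r)};{ToPercent (g)};{ToPercent (b)}";
}
```

For example, `SixelPalette.Define (0, 255, 0, 0)` produces the `#0;2;100;0;0` entry shown above.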
Pick the drawing color
The next step is to specify the first color you are painting with, e.g. this orange:
We pick the color by outputting the index in the palette e.g. #1 (pick color 1)
Then we look at the current band and see which pixels match that color, for example the second and third pixels in this case (highlighted in pink below):
You convert this into a 6-bit binary array (bitmask).
A sixel is a column of 6 pixels - with a width of 1 pixel
Column controlled by one sixel character:
[0] - Bit 0 (top-most pixel)
[1] - Bit 1
[1] - Bit 2
[0] - Bit 3
[0] - Bit 4
[0] - Bit 5 (bottom-most pixel)
Reading from bit 5 (bottom) down to bit 0 (top), this is the binary number 000110. It is converted to an int and 63 is added. The result is always a printable ASCII character, which is what gets added to the final output.
For example, the bitmask 000110 is 6 in decimal. Adding 63 (the sixel offset) gives 6 + 63 = 69, which corresponds to the ASCII character E (upper case e).
0x3F in hexadecimal corresponds to 63 in decimal, which is the ASCII value for the character `?` [...] sixels use a base-64 encoding to represent pixel data.
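The bit packing can be sketched as follows. `SixelChar.FromColumn` is a hypothetical helper. It follows the convention above that bit 0 is the top pixel of the column.

```csharp
using System;

public static class SixelChar
{
    // pixels[0] is the top-most pixel of the column and maps to bit 0.
    public static char FromColumn (bool [] pixels)
    {
        if (pixels.Length != 6)
        {
            throw new ArgumentException ("A sixel column is exactly 6 pixels high.", nameof (pixels));
        }

        int value = 0;
        for (int bit = 0; bit < 6; bit++)
        {
            if (pixels [bit])
            {
                value |= 1 << bit;
            }
        }

        // Offset by 63 (0x3F, '?') so every value 0-63 maps to '?'..'~'.
        return (char)(value + 63);
    }
}
```

With the second and third pixels set, this returns `E`. An empty column is `?` and a full column is `~`.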
After encoding the whole line in this way you have to 'rewind' to the start of the line and do the next color. This is done with $ and picking another color (e.g. #2 - select color 2 of the palette). You only have to paint the colors that appear in the band (not all the colors of the image/palette).
Finally after drawing all the 'color layers' you move to the next band (i.e. down) using - (typically also with $ to go to the start of the row).
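Putting these steps together, one band could be encoded roughly like this. It is a simplified sketch, not the PR's encoder. `EncodeBand` is hypothetical, palette indices are assumed to be already assigned, and there is no run-length compression.

```csharp
using System.Collections.Generic;
using System.Text;

public static class SixelBandSketch
{
    // band[row, x] holds a palette index; row 0 is the top of the band (at most 6 rows).
    public static string EncodeBand (int [,] band)
    {
        int rows = band.GetLength (0);
        int width = band.GetLength (1);
        var sb = new StringBuilder ();

        // Only colors that actually occur in this band get a pass.
        var colors = new SortedSet<int> ();
        foreach (int c in band)
        {
            colors.Add (c);
        }

        bool first = true;
        foreach (int color in colors)
        {
            if (!first)
            {
                sb.Append ('$'); // rewind to the start of the band
            }
            first = false;
            sb.Append ('#').Append (color); // select this palette entry

            for (int x = 0; x < width; x++)
            {
                int bits = 0;
                for (int row = 0; row < rows; row++)
                {
                    if (band [row, x] == color)
                    {
                        bits |= 1 << row;
                    }
                }
                sb.Append ((char)(bits + 63));
            }
        }

        sb.Append ('-'); // move down to the next band
        return sb.ToString ();
    }
}
```

The sketch emits `-` on its own because Graphics New Line also returns to the left edge. Writing `$-` is equally valid.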
Run length encoding
If you have the same ASCII character over and over, you can use an exclamation mark followed by the number of repeats (e.g. !99 means repeat 99 times), then the character to repeat.
For example, !12~ is the same as writing ~ twelve times.
Transparency
You can achieve transparency by not rendering in a given area. For example if every 'color layer' has a 0 for a given entry in the bitmask it will not be drawn.
This requires setting the background draw mode in the header to 1
\u001bP0;1;0
instead of
\u001bP0;0;0
Not all terminals support transparent sixels.
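A small sketch of building the header. The full DCS introducer is `ESC P P1;P2;P3 q`, so the parameter strings above are followed by `q` before the pixel data. The image ends with the string terminator `ESC \`. `SixelHeader` is a hypothetical name.

```csharp
public static class SixelHeader
{
    // P2 = 1 leaves pixels whose bit is 0 untouched (transparent);
    // P2 = 0 paints them with the background color.
    public static string Start (bool transparent) => $"\u001bP0;{(transparent ? 1 : 0)};0q";

    // String Terminator (ESC \) ends the sixel image.
    public const string End = "\u001b\\";
}
```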
Determining Support
It is possible to determine sixel support at runtime by sending ANSI escape sequences. We don't currently support these sequences, although @BDisp is working on it here: https://github.com/gui-cs/Terminal.Gui/pull/3768
So I have added the ISixelSupportDetector interface and SixelSupportResult (includes whether transparency is supported, what the resolution is etc).
I've also left the ANSI escape sequences detector in but commented out. Let me know if you want that removed.
Fixes
- Fixes #1265
Proposed Changes/Todos
- Adds Sixel rendering support
Pull Request checklist:
- [x] I've named my PR in the form of "Fixes #issue. Terse description."
- [x] My code follows the style guidelines of Terminal.Gui - if you use Visual Studio, hit `CTRL-K-D` to automatically reformat your files before committing.
- [x] My code follows the Terminal.Gui library design guidelines
- [x] I ran `dotnet test` before commit
- [x] I have made corresponding changes to the API documentation (using `///` style comments)
- [ ] My changes generate no new warnings
- [x] I have checked my code and corrected any poor grammar or misspellings
- [x] I conducted basic QA to assure all features are working
Hey I had a thought to bounce off of ya:
This seems like something that would be particularly well suited as a pluggable component, in whatever form that may take, be it a satellite assembly/module, dynamically loaded libraries in a configurable path, straight c# code compiled at run-time, or whatever.
In any of them, the glue would be essentially the same - an interface any plugin must implement, with them then being free to do whatever they like beyond that in their own code.
Any thoughts on that?
The work I'm doing on the drivers will make that kind of thing a lot easier for us to provide, since I'm pulling out interfaces for the public API.
Could be an option 🤔. At the moment I am still in the exploration phase.
The driver level bit is basically
- move to x,y
- output sixel
Currently I'm doing this on every render pass of the driver, which is very slow. But I'm not sure how much of that is down to the encoded pixel instructions being unoptimised.
There are 3 areas I'm working on at the moment
- better color palette picking
- understanding and optimising sixel pixel data encoding algorithm
- integrating with driver
If outputting frames is just inherently slow, then some kind of 'reserve area' method might be required to allow a single render to persist through multiple redraws of the main UI.
But for now I think it is too early to think about plugin - it needs to work first!
Also down the line it might be nice to do more with GraphView e.g. output sixel if available or fallback to existing ASCII
Looking good, but for some reason the colors are off, specifically the dark colors. At first I thought the pixel encoding was redrawing over itself with the wrong colors, or that the palette was missing the dark colors. But after completely replacing the pixel encoding code I'm pretty sure that part is now correct.
Maybe I can improve the situation with some more buttons in the scenario, e.g. to view the palette used.
Test can be run on a sixel compatible terminal with
dotnet run -- Images -d NetDriver
Image encoding (one off cost) is slow, image rendering is relatively fast (but done every time you redraw screen).
I've included a few algorithms because I thought the color issue was bad palette generation or bad color mapping. I might scale it back a bit, or provide one fast implementation in core and the slow ones in UICatalog as examples.
Looking at this, it's also possible the color structs are off somewhere, such that RGB is interpreted as ARGB and the blue element is missing or something.
I also haven't explored dithering yet, which seems to be another big area of sixel image synthesis.
Success, bug was indeed just creating the image wrong at the start
Literally the first step in image generation and all because in TG the A is on the right instead of left of the arguments ><.
```diff
 public static Color [,] ConvertToColorArray (Image<Rgba32> image)
 {
     int width = image.Width;
     int height = image.Height;
     Color [,] colors = new Color [width, height];

     // Loop through each pixel and convert Rgba32 to Terminal.Gui color
     for (int x = 0; x < width; x++)
     {
         for (int y = 0; y < height; y++)
         {
             var pixel = image [x, y];
-            colors [x, y] = new Color (pixel.A, pixel.R, pixel.G, pixel.B);
+            colors [x, y] = new Color (pixel.R, pixel.G, pixel.B); // Convert Rgba32 to Terminal.Gui color
         }
     }

     return colors;
 }
```
Have you tested out how it behaves if you've altered your terminal color settings/environment variables? Like...do you get predictably ruined colors, if in an indexed color mode, or does it do its best to try to force "correct" colors?
My assumption would be that the output will depend on color depth, with only ANSI or other indexed color schemes being subject to any silliness from that, and consoles capable of true color looking right no matter how ugly one's terminal color scheme may be. But that's just conjecture based on how I'd expect other things to work in most terminals without monkey business going on under the hood. 🤷♂️
> Success, bug was indeed just creating the image wrong at the start
>
> Literally the first step in image generation and all because in TG the A is on the right instead of left of the arguments ><.
Color can be constructed as ARGB or RGBA. Just pass it the bytes as an int or uint, or if sixel exposes the raw value as an RGBA or ARGB value, use that directly for the constructor.
If it does not expose the whole 32-bit value and you can only get to the bytes, here is each way of doing it in one line and all on the stack:
```csharp
// The int constructor is RGBA
new Color (BitConverter.ToInt32 ([pixel.R, pixel.G, pixel.B, pixel.A]));

// The uint constructor is ARGB
new Color (BitConverter.ToUInt32 ([pixel.A, pixel.R, pixel.G, pixel.B]));
```
I could add a direct implicit cast if you like, to make life easier while using it. 🤷♂️
IIRC, the uint vs int decision was based on the same or very similar design with System.Drawing.Color, for consistency.
> Color can be constructed as ARGB or RGBA
Yup, I didn't mean there was a problem with the constructor parameter order, just that I made a mistake right at the start and kept thinking the issue was with palette generation.
> sixel exposes the raw value as an RGBA or ARGB value
Sixel exposes RGB only (no A) and it is on a scale of 0-100. You can define up to x colors (typically 256) and those can be any RGB values you like.
For example
#0;2;100;0;0
The # indicates that we are declaring a color. The 0 is the index in the palette we are setting (i.e. the first color). The 2 indicates the type (RGB); it's basically always going to be 2. Then you have R, G and B scaled to 0-100.
So the above declares the color red (255,0,0) as palette entry 0.
The above text string is the pure ASCII that you would output to the console when rendering the sixel.
You use the palette colors when you encode the pixel data. Rendering pixels involves selecting a color index (from palette) then filling along the band (6 pixels high) with it. Then either 'rewinding' to start of band and drawing with a different color or moving to next band.
> Have you tested out how it behaves if you've altered your terminal color settings/environment variables?
At this stage I am making something that works and then writing tests and documenting. I have tested in Windows Terminal Preview (the one that supports sixel) and ML Term (on linux).
There are some terminals that support limited palette sixel (e.g. 16 colors instead of 256). But I think generally if a terminal supports sixel it probably supports true color too.
Once it is done it can be tested for compatibility under corner cases like color setting changes. I think compatibility will have to be left to the user i.e. user can set a config value to support sixel rather than trying to dynamically detect it based on environment vars etc.
Interesting.
As for the color values for conversion purposes, I'd suggest an extension method on Sixel colors in the spirit/convention of the common ToBlahBlah or FromBlahBlah methods color types often have. If there's value to you in doing that, of course.
I wouldn't suggest any modifications to Color that directly depend on any types from Sixel, for sake of separation and not building in too deep a dependency, though. 🤷♂️
This is starting to come together.
I now have a pretty firm understanding of sixel encoding and can write tests explaining the expected encoded pixel data.
TODO:
- more tests
- screen positioning
- sixel to screen coordinates measuring
- potentially some resizing logic in UICatalog (I.e. to show how you ensure sixel doesn't spill out of a View)
- refactor and finalise the 'out of the box' algorithms and move rest to UICatalog
So far I have not really touched the console drivers. Only hooking in via static to NetDriver - where outputting sixel encoded image is 2 lines of code (console move then console write).
Probably need some guidance on how best to implement in drivers once I've done the above
Looks like you can detect sixel support by outputting the terminal querying code:
"\u001B[c";
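If the VT220-level reply contains the parameter 4, sixel graphics are supported. Match the whole parameter, not just the digit, since values such as 14, 24 and 42 also contain a 4.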
This will respond like this:
[?61;4;6;7;14;21;22;23;24;28;32;42c
Does support sixel (latest Windows Terminal pre release)
For one that does not support sixel you will see:
[?61;6;7;21;22;23;24;28;32;42c
Does not support sixel (regular windows terminal)
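A sketch of parsing the Device Attributes reply. It splits the parameters on `;` and looks for an exact `4`. A naive search for the character '4' would wrongly report sixel support for the second reply above, because it contains 24 and 42. `DeviceAttributes` is a hypothetical name.

```csharp
using System.Linq;

public static class DeviceAttributes
{
    public const string Request = "\u001b[c"; // Primary Device Attributes (DA1)

    // Parses a reply such as "\u001b[?61;4;6;7c" and reports whether parameter 4 (sixel) is present.
    public static bool SupportsSixel (string reply)
    {
        int start = reply.IndexOf ('?');
        int end = reply.LastIndexOf ('c');
        if (start < 0 || end <= start)
        {
            return false; // not a DA1 reply
        }

        return reply.Substring (start + 1, end - start - 1)
                    .Split (';')
                    .Any (p => p == "4");
    }
}
```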
@tig / @dodexahedron / @BDisp do we make use of this console querying system? (send escape code, read response). Is there anywhere in the code that does this kind of thing I could tap into?
In NetDriver's Init method I used a line to read the terminal type. Beyond that, I don't know at the moment.
Ok nice looks like NetDriver is already all set up for sending and getting responses.
There is a Queue and then in the handler code it looks for the terminator to match up with outstanding requests.
Only odd thing is that it seems tied to mouse handling. I guess that was the first use case.
What I have implemented works (see image) but I think I should test on other consoles plus also we could look at how best to design this.
Basically you send
<esc>[c
See Reports - Device Attributes (DA)
And then you get the response elements and if you see a "4" anywhere in that array then it means sixel support.
<esc>[?61;4;6;7;14;21;22;23;24;28;32;42c
example response
@tig what is the future goal of NetDriver vs WindowsDriver? Are we planning on ditching ncurses driver? I'd rather avoid implementing too much into a driver that is not getting worked on and/or implementing the same thing twice/three times i.e. in each driver.
Changes are in db0fc41
Requested ANSI escape sequences shouldn't be enqueued, because we need the response immediately, before returning from the method that made the request. So it must be rewritten to provide a response like this: var response = ExecuteAnsiRequest ("\u001B[c"); which will return ("\u001b[?61;6;7;14;21;22;23;24;28;32;42c", ""), a tuple containing the output and the error. If the error isn't empty, its information will be provided.
> escape sequences shouldn't be enqueued because we need his response immediately [...] must be rewrite to provide a response like this [blocking function]
These request/response calls seem to be inherently async. The only way to block waiting for the response would be Thread.Sleep or something similar, which seems hacky.
That said we could do something like
```csharp
using System;

/// <summary>
/// Describes an ongoing ANSI request sent to the console.
/// Use <see cref="ResponseReceived"/> to handle the response
/// when the console answers the request.
/// </summary>
public class AnsiRequest
{
    /// <summary>
    /// Code to send, e.g. see <see cref="EscSeqUtils.CSI_Device_Attributes_Request"/>
    /// </summary>
    public string Code { get; init; }

    /// <summary>
    /// <para>
    /// The terminator that uniquely identifies the type of response sent
    /// by the console, e.g. for <see cref="EscSeqUtils.CSI_Device_Attributes_Request"/>
    /// the terminator is <see cref="EscSeqUtils.CSI_Device_Attributes_Request_Terminator"/>.
    /// </para>
    /// <para>
    /// After sending a request, the first response with a matching terminator will be matched
    /// to the oldest outstanding request.
    /// </para>
    /// </summary>
    public string Terminator { get; init; }

    /// <summary>
    /// Invoked when the console responds with an ANSI response code that matches the
    /// <see cref="Terminator"/>
    /// </summary>
    public event EventHandler<string []> ResponseReceived;
}
```
I wouldn't do it async, because if you request the cursor position, for example, you may want the response right away to take some action while you are in the UI thread.
But what if the console doesn't respond for some reason or doesn't support the command and ignores it? It would mean timeouts or other stuff and get complicated.
Not necessarily, if we handle ReadKey in the request method and ensure it returns either the response or an error.
There is a delay between sending the code and the terminal responding.
Furthermore, there is only one console input stream, and that is already being handled in ProcessInputQueue (in NetDriver), specifically in this case in HandleRequestResponseEvent.
We could easily get events between us sending the ANSI code and us reading the response (e.g. mouse events, user keypresses etc).
The current code is already essentially async - the write stream code is decoupled from the read stream code. It loops the input and handles accordingly (mouse etc). Then when it sees an ANSI response it looks to see if there is an outstanding request (with same terminator) and 'dequeues' it.
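The terminator-matching behaviour described above can be sketched in isolation like this. `PendingAnsiRequests`, `Add` and `TryResolve` are hypothetical. NetDriver's actual queue and HandleRequestResponseEvent are not reproduced here.

```csharp
using System;
using System.Collections.Generic;

public sealed class PendingAnsiRequests
{
    readonly List<(string Terminator, Action<string> OnResponse)> _pending = new ();

    // Record a request after its code has been written to the console.
    public void Add (string terminator, Action<string> onResponse) => _pending.Add ((terminator, onResponse));

    // Called by the input loop for each complete escape sequence it reads.
    // Returns true if the sequence answered an outstanding request.
    public bool TryResolve (string sequence)
    {
        // FindIndex scans in insertion order, so the oldest matching request wins.
        int index = _pending.FindIndex (p => sequence.EndsWith (p.Terminator, StringComparison.Ordinal));
        if (index < 0)
        {
            return false; // not a reply we are waiting for (e.g. mouse or key input)
        }

        var request = _pending [index];
        _pending.RemoveAt (index);
        request.OnResponse (sequence);
        return true;
    }
}
```

Because unmatched sequences fall through, the existing input loop can keep handling mouse and keyboard events that arrive between the request and its reply.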
I think this current code implementation is correct. I was able to add my new ANSI request with very few lines of code.
The solution for API user friendliness is either to try and force the process to be blocking as you say or embrace the current system and use a callback for when answer is seen. Blocking is problematic because of:
- What if the console does not respond for some reason?
- We have to buffer all intermediate chars that are not what we are waiting for
- We have to write a second parallel input processing function when there is already a perfectly good one in place
I don't have a ton of time RN to comment but...
See:
- #2803
At the core of EscSeqUtil is code that tries to provide, effectively, a message-based protocol handler for ANSI ESC sequences.
The current code does it in a brute-force way. It works, but barely. NetDriver uses it for querying the terminal, and we will need to use it (or a better replacement) both to make Windows Driver use ANSI sequences directly for everything and to replace CursesDriver.
In a defunct PR (#2940), I started down this path but abandoned it because of other priorities.
See this issue for the current master issue for all this stuff:
- #2610
> @tig what is the future goal of NetDriver vs WindowsDriver? Are we planning on ditching ncurses driver? I'd rather avoid implementing too much into a driver that is not getting worked on and/or implementing the same thing twice/three times i.e. in each driver.
Yes, my vision is that curses driver goes away. We have one driver based on the ANSI ESC Sequence "protocol" that replaces both WindowsDriver and CursesDriver. NetDriver lives on as a lowest-common-denominator. Note, the WT team refers to this protocol as "Virtual Terminal Sequences" which I find confusing.
As noted above, a key piece of work required is a robust engine for handling the ANSI esc sequence protocol, which is a bi-directional protocol, in a way that enables BOTH synchronous and async use-cases.
You'd think such a thing already existed, but I've not found a good .NET library that does it.
What we should NOT do is try to build it from the ground up. The patterns are well-understood, standard message-based protocol patterns. We should choose an existing MIT license protocol library that we build on. The ANSI spec (and implementations) are not pure and have weird edge-cases given the long history, so we'll have a layer that will look a bit ugly.
@dodexahedron has recently done work on the drivers for #3692 which I have not looked closely at, but am assuming paves the way for all this in a good way.
> As noted above, a key piece of work required is a robust engine for handling the ANSI esc sequence protocol
This looks pretty good: https://github.com/darrenstarr/VtNetCore
Seems to handle escape code processing, key mapping etc. Have to find some time to clone it and give it a spin.
@tznind please try PR #3768 and let me know what you think. You can use it in all drivers, at least to check whether sixel is supported. I know you don't need an immediate response, but my solution works in all drivers and so can be used at the request level. Thanks.
@tznind, can you please test Sixel in the CursesDriver with this PR, #3770? Thanks.
OK, here's the stress test for this feature! I think this is a nice use case to try and optimise for.
It's the doom fire algorithm.
Goal
Here's what it should look like (dropping all the flickering frames):
Reality
Here's what it looks like at the moment in NetDriver (it also maxes out a CPU core):
Issue is compounded by the NetDriver redraw bug, see #3761
That's great. Is the first video using WindowsDriver?
> That's great. Is the first video using `WindowsDriver`?
I wish!
Sadly not, though. I just dropped frames in the recorded gif to speed it up and lose the flicker.
The top video is where I'd like to see the performance at if we can.
At the moment only NetDriver is working for sixel but I'm working on it.
Windows driver may turn out to be faster. There's potentially also scope for multi threading in the sixel encoder. But first I need to understand where bottlenecks are in general - hence this test.
Regarding #3761 ,
WindowsDriver and NetDriver both currently repaint constantly, even when there is no change to what they are printing. When sixel data is added, this causes flickering as it fights with the rune-level painting:
WindowsDriver constantly repaints itself for no reason, fighting with the sixel drawing and causing flicker
I investigated what would happen if I added a simple optimization to the WriteToConsole method:
```csharp
// TODO: presume something like this used to exist but broke?
// If console output has changed
if (s != _lastWrite)
{
    // supply console with the new content
    result = WriteConsole (_screenBuffer, s, (uint)s.Length, out uint _, nint.Zero);
}

_lastWrite = s;
```
In this case the screen no longer repaints itself constantly, only when stuff has actually changed.
However the sixel layer does still repaint, but it does so 'additively' with regards to transparency.
So sixels can be drawn but do not disappear until the console refreshes that area, which is a good thing. It means we can't do animation with transparent pixels, or animations that move across the screen. But we can do static animations, or moving animations that flicker horribly.
WindowsDriver doomfire with screen update loop programmed to only write to console when ASCII content has changed for the screen
Can you please try in the CursesDriver using my PR #3770? You can use the same code as you used for the NetDriver.
> Can you please try in the `CursesDriver` using my PR #3770? You can use the same code as you used for the `NetDriver`.
Sure. I think I should create a separate branch if I'm merging the curses driver and the standalone escape sequences PRs into the sixel work.
I don't want a monster PR, especially in case those other two get further updates.