Fixes #1265 - Adds Sixel rendering support

Open tznind opened this issue 1 year ago • 38 comments

Sixel

sixel-final

See also this discussion: #1265

Sixel is a graphics rendering protocol for terminals. It involves converting a bitmap of pixels into an ASCII sequence.

Sixel is the most widely supported graphics protocol in terminals, such as xterm, mlterm and Windows Terminal (support still in pre-release, see v1.22.2362.0). To see which terminals support sixel, go to: https://www.arewesixelyet.com/

Quantization

Sixel involves a palette of limited colors (typically 256). This means we need to do the following:

  • Create a palette of 256 colors from an input bitmap
  • Map colors during rendering to the palette colors

The class that handles this is ColorQuantizer. For extensibility I have created interfaces for the actual algorithms you might want to use/write. These are IPaletteBuilder and IColorDistance.

Color Distance

Color distance determines how closely two colors resemble one another. I included the fastest algorithm (Euclidean) in Terminal.Gui and left the other one (CIE76) in UICatalog.

| Distance Metric | Speed | Description |
|-----------------|-------|-------------|
| Euclidean | Fast | Measures the straight-line distance between two colors in the RGB color space. |
| CIE76 | Slow | Computes the perceptual difference between two colors based on the CIELAB color space, aligning more closely with human vision. |
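
As a rough illustration of the Euclidean metric, here is a minimal sketch. The class and method names are hypothetical and the colors are plain byte tuples rather than Terminal.Gui's Color type, so this is not the IColorDistance implementation from the PR.

```csharp
using System;

// Hypothetical helper for illustration only; not part of Terminal.Gui.
public static class EuclideanDistanceSketch
{
    // Straight-line distance between two colors treated as points in RGB space.
    public static double Distance ((byte R, byte G, byte B) a, (byte R, byte G, byte B) b)
    {
        int dr = a.R - b.R;
        int dg = a.G - b.G;
        int db = a.B - b.B;

        return Math.Sqrt (dr * dr + dg * dg + db * db);
    }
}
```

A smaller result means the two colors are more alike; identical colors give 0.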

Palette Building

Deciding which 256 colors to use to represent any image is an interesting problem. I have added a simple, fast algorithm, PopularityPaletteWithThreshold. It counts all the pixels, merges colors that are similar to one another (based on IColorDistance), then takes the most common colors.
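
The popularity idea can be sketched as below. This is a simplified stand-in, not the PopularityPaletteWithThreshold code from the PR: the names are hypothetical, the distance is hard-coded to Euclidean, and near-duplicate colors are skipped rather than having their counts merged.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch of a popularity-based palette; not the Terminal.Gui class.
public static class PopularityPaletteSketch
{
    public static List<(byte R, byte G, byte B)> Build (
        IEnumerable<(byte R, byte G, byte B)> pixels,
        int maxColors,
        double mergeThreshold)
    {
        // Count how often each exact color occurs.
        var counts = new Dictionary<(byte R, byte G, byte B), int> ();

        foreach (var p in pixels)
        {
            counts [p] = counts.TryGetValue (p, out int n) ? n + 1 : 1;
        }

        // Visit colors from most to least common. Keep a color only if it is
        // not within mergeThreshold of a color that has already been kept.
        var palette = new List<(byte R, byte G, byte B)> ();

        foreach (var c in counts.OrderByDescending (kv => kv.Value).Select (kv => kv.Key))
        {
            if (palette.Count >= maxColors)
            {
                break;
            }

            if (!palette.Any (kept => Distance (kept, c) <= mergeThreshold))
            {
                palette.Add (c);
            }
        }

        return palette;
    }

    // Euclidean distance in RGB space (assumed metric for this sketch).
    private static double Distance ((byte R, byte G, byte B) a, (byte R, byte G, byte B) b)
    {
        int dr = a.R - b.R;
        int dg = a.G - b.G;
        int db = a.B - b.B;

        return Math.Sqrt (dr * dr + dg * dg + db * db);
    }
}
```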

I have also included in UICatalog a more standard algorithm that is substantially slower called MedianCutPaletteBuilder. This creates a color palette by recursively dividing a list of colors into smaller subsets ("cubes") based on the largest color variance, using a configurable color distance metric IColorDistance.

image

Encoding

Once the palette is built we need to encode the pixel data into sixel.

Sixel encoding involves converting bitmap image pixels 6 rows at a time (called a 'band'). Each band of vertical pixels will be converted into X ASCII characters where X is the number of colors in the band:

image

Setup the palette

First we need to describe all the colors we are using. This takes the format: #0;2;100;0;0

The first number is the palette index i.e. 0 (first color). Next is the color space e.g. RGB/HSL - 2 is RGB. Then comes the RGB components. These are 0-100 not 0-255 as might be expected.
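
A minimal sketch of producing one palette entry from 0-255 RGB bytes, following the format described above. The class and method names are hypothetical, and rounding to the nearest integer is an assumption (the text does not say how the scaling is rounded).

```csharp
using System;

// Hypothetical helper for illustration only; not part of Terminal.Gui.
public static class SixelPaletteSketch
{
    // Formats "#<index>;2;<r>;<g>;<b>" with components rescaled from 0-255 to 0-100.
    public static string PaletteEntry (int index, byte r, byte g, byte b)
    {
        // Assumption: round to the nearest whole percent.
        static int Scale (byte v) => (int)Math.Round (v * 100 / 255.0);

        return $"#{index};2;{Scale (r)};{Scale (g)};{Scale (b)}";
    }
}
```

For example, pure red (255, 0, 0) at index 0 becomes #0;2;100;0;0.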

Pick the drawing color

The next step is to specify the first color you are painting with, e.g. this orange:

image

We pick the color by outputting the index in the palette e.g. #1 (pick color 1)

Then we look at the current band and see what pixels match that color, for example the second and third pixels in this case (highlighted in pink below):

image

You convert this into a 6 digit binary array (bitmask)

       A sixel is a column of 6 pixels - with a width of 1 pixel

    Column controlled by one sixel character:
      [0]  - Bit 0 (top-most pixel)
      [1]  - Bit 1
      [1]  - Bit 2
      [0]  - Bit 3
      [0]  - Bit 4
      [0]  - Bit 5 (bottom-most pixel)

This bitmask is converted to an integer, with the top-most pixel as the least significant bit, and then 63 is added. The result is a valid ASCII character, which is what gets appended to the final output.

For example, the column above has bits 1 and 2 set, which is binary 000110 (written most significant bit first), or 6 in decimal. Adding 63 (the base-64 offset) gives 6 + 63 = 69, which corresponds to the ASCII character E (upper case e).

0x3F in hexadecimal corresponds to 63 in decimal, which is the ASCII value for the character ? [...] sixels use a base-64 encoding to represent pixel data.
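
The column-to-character step can be sketched as follows. The names are hypothetical; the sketch follows the bit labels in the diagram above (bit 0 is the top-most pixel and is treated as the least significant bit) and the offset of 63.

```csharp
using System;

// Hypothetical helper for illustration only; not part of Terminal.Gui.
public static class SixelCharSketch
{
    // column[0] is the top-most pixel and column[5] the bottom-most.
    // True means "this pixel is painted in the currently selected color".
    public static char Encode (bool [] column)
    {
        if (column.Length != 6)
        {
            throw new ArgumentException ("A sixel column is exactly 6 pixels tall.");
        }

        int mask = 0;

        for (var row = 0; row < 6; row++)
        {
            if (column [row])
            {
                mask |= 1 << row; // bit 0 = top pixel
            }
        }

        return (char)(63 + mask);
    }
}
```

With this mapping an empty column is ? (63 + 0) and a fully painted column is ~ (63 + 63).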

After encoding the whole line in this way you have to 'rewind' to the start of the line and do the next color. This is done with $ and picking another color (e.g. #2 - select color 2 of the palette). You only have to paint the colors that appear in the band (not all the colors of the image/palette).

Finally after drawing all the 'color layers' you move to the next band (i.e. down) using - (typically also with $ to go to the start of the row).
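
Putting the color selection, rewind and next-band steps together, a whole band could be encoded roughly like this. It is a sketch with hypothetical names, not the encoder from the PR: the band is assumed to arrive as palette indices, with -1 standing in for "no color".

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper for illustration only; not part of Terminal.Gui.
public static class SixelBandSketch
{
    // band[row, x] holds a palette index (or -1 for an unpainted pixel).
    // The band may have up to 6 rows.
    public static string EncodeBand (int [,] band)
    {
        int rows = band.GetLength (0);
        int width = band.GetLength (1);

        // Only colors that actually appear in this band get a layer.
        var colors = new SortedSet<int> ();

        foreach (int c in band)
        {
            if (c >= 0)
            {
                colors.Add (c);
            }
        }

        var sb = new StringBuilder ();
        var first = true;

        foreach (int color in colors)
        {
            if (!first)
            {
                sb.Append ('$'); // rewind to the start of the band
            }

            first = false;
            sb.Append ('#').Append (color); // select the drawing color

            for (var x = 0; x < width; x++)
            {
                int mask = 0;

                for (var row = 0; row < rows; row++)
                {
                    if (band [row, x] == color)
                    {
                        mask |= 1 << row; // bit 0 = top pixel
                    }
                }

                sb.Append ((char)(63 + mask));
            }
        }

        sb.Append ("$-"); // rewind, then move down to the next band

        return sb.ToString ();
    }
}
```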

Run length encoding

If the same ASCII character repeats over and over, you can write an exclamation mark, then the number of repeats (e.g. !99 means repeat 99 times), then the character to repeat. For example, !12~ is the same as ~~~~~~~~~~~~.
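
A minimal sketch of run-length encoding a string of sixel data characters. The names are hypothetical, and the choice to leave runs of 3 or fewer uncompressed is an assumption made here because !3~ is no shorter than ~~~.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration only; not part of Terminal.Gui.
public static class SixelRunLengthSketch
{
    // Compresses runs of identical sixel data characters into "!<count><char>".
    // Intended for the data characters of one color layer only, not for
    // control sequences such as "#12" or "$".
    public static string Compress (string sixelChars)
    {
        var sb = new StringBuilder ();
        var i = 0;

        while (i < sixelChars.Length)
        {
            char c = sixelChars [i];
            var run = 1;

            while (i + run < sixelChars.Length && sixelChars [i + run] == c)
            {
                run++;
            }

            if (run > 3)
            {
                sb.Append ('!').Append (run).Append (c);
            }
            else
            {
                sb.Append (c, run); // short runs are emitted literally
            }

            i += run;
        }

        return sb.ToString ();
    }
}
```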

Transparency

You can achieve transparency by not rendering in a given area. For example if every 'color layer' has a 0 for a given entry in the bitmask it will not be drawn.

This requires setting the background draw mode in the header to 1

\u001bP0;1;0 instead of \u001bP0;0;0

Not all terminals support transparent sixels.
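
For context, a sketch of wrapping encoded sixel data in its header and terminator. The names are hypothetical. The q that ends the header and the closing ESC \ (string terminator) are not spelled out in the text above; they come from the standard sixel device control string format.

```csharp
using System;

// Hypothetical helper for illustration only; not part of Terminal.Gui.
public static class SixelHeaderSketch
{
    // Wraps palette and pixel data in "ESC P 0;<mode>;0 q ... ESC \".
    // mode 1 leaves unpainted pixels untouched (transparent),
    // mode 0 fills them with the background color.
    public static string Wrap (string body, bool transparent)
    {
        return $"\u001bP0;{(transparent ? 1 : 0)};0q{body}\u001b\\";
    }
}
```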

Determining Support

It is possible to determine sixel support at runtime by sending ANSI escape sequences. We don't currently support these sequences although @BDisp is working on it here: https://github.com/gui-cs/Terminal.Gui/pull/3768

So I have added the ISixelSupportDetector interface and SixelSupportResult (includes whether transparency is supported, what the resolution is etc).

I've also left the ANSI escape sequences detector in but commented out. Let me know if you want that removed.

Fixes

  • Fixes #1265

Proposed Changes/Todos

  • Adds Sixel rendering support

Pull Request checklist:

  • [x] I've named my PR in the form of "Fixes #issue. Terse description."
  • [x] My code follows the style guidelines of Terminal.Gui - if you use Visual Studio, hit CTRL-K-D to automatically reformat your files before committing.
  • [x] My code follows the Terminal.Gui library design guidelines
  • [x] I ran dotnet test before commit
  • [x] I have made corresponding changes to the API documentation (using /// style comments)
  • [ ] My changes generate no new warnings
  • [x] I have checked my code and corrected any poor grammar or misspellings
  • [x] I conducted basic QA to assure all features are working

tznind avatar Sep 09 '24 22:09 tznind

Hey I had a thought to bounce off of ya:

This seems like something that would be particularly well suited as a pluggable component, in whatever form that may take, be it a satellite assembly/module, dynamically loaded libraries in a configurable path, straight c# code compiled at run-time, or whatever.

In any of them, the glue would be essentially the same - an interface any plugin must implement, with them then being free to do whatever they like beyond that in their own code.

Any thoughts on that?

The work I'm doing on the drivers will make that kind of thing a lot easier for us to provide, since I'm pulling out interfaces for the public API.

dodexahedron avatar Sep 11 '24 23:09 dodexahedron

Could be an option 🤔. At the moment I am still in the exploration phase.

The driver level bit is basically

  • move to x,y
  • output sixel

Currently I'm doing this every render pass of driver which is very slow. But I'm not sure how much of that is down to the pixel encoded instructions being unoptimised.

There are 3 areas I'm working on at the moment

  • better color palette picking
  • understanding and optimising sixel pixel data encoding algorithm
  • integrating with driver

If outputting frames is just inherently slow then some kind of 'reserve area' method might be required to allow a single render to persist through multiple redrawing of main ui

But for now I think it is too early to think about plugin - it needs to work first!

Also down the line it might be nice to do more with GraphView e.g. output sixel if available or fallback to existing ASCII

tznind avatar Sep 12 '24 02:09 tznind

Looking good but for some reason the colors are off, specifically the dark colors. I thought at first it was the pixel encoding that was redrawing over itself with wrong colors or the palette was not having the dark colors or something but after completely replacing the pixel encoding bit I'm pretty sure that is now correct.

Maybe I can improve situation with some more buttons in scenario e.g. to view the palette used.

Test can be run on a sixel compatible terminal with

dotnet run -- Images -d NetDriver

Image encoding (one off cost) is slow, image rendering is relatively fast (but done every time you redraw screen).

I've included a few algorithms because I thought color issue was bad palette generation or bad color mapping. Might scale it back a bit or provide 1 fast implementation in core and the slow ones in UICatalog as examples.

Looking at this, it's also possible the color structs are off somewhere such that RGB is interpreted as ARGB and so the blue element is missing or something.

Also haven't explored dithering yet. Which seems to be another big area of sixel image synthesis.

shot-2024-09-14_05-12-41

tznind avatar Sep 14 '24 04:09 tznind

Success, bug was indeed just creating the image wrong at the start

Literally the first step in image generation and all because in TG the A is on the right instead of left of the arguments ><.

public static Color [,] ConvertToColorArray (Image<Rgba32> image)
{
    int width = image.Width;
    int height = image.Height;
    Color [,] colors = new Color [width, height];

    // Loop through each pixel and convert Rgba32 to Terminal.Gui color
    for (int x = 0; x < width; x++)
    {
        for (int y = 0; y < height; y++)
        {
            var pixel = image [x, y];
-            colors [x, y] = new Color (pixel.A, pixel.R, pixel.G, pixel.B); 
+            colors [x, y] = new Color (pixel.R, pixel.G, pixel.B); // Convert Rgba32 to Terminal.Gui color
        }
    }

    return colors;
}

image

tznind avatar Sep 14 '24 15:09 tznind

Have you tested out how it behaves if you've altered your terminal color settings/environment variables? Like...do you get predictably ruined colors, if in an indexed color mode, or does it do its best to try to force "correct" colors?

My assumption would be that the output will depend on color depth, with only ANSI or other indexed color schemes being subject to any silliness from that, and consoles capable of true color looking right no matter how ugly one's terminal color scheme may be. But that's just conjecture based on how I'd expect other things to work in most terminals without monkey business going on under the hood. 🤷‍♂️

dodexahedron avatar Sep 16 '24 05:09 dodexahedron

Success, bug was indeed just creating the image wrong at the start

Literally the first step in image generation and all because in TG the A is on the right instead of left of the arguments ><.

Color can be constructed as ARGB or RGBA. Just pass it the bytes as an int or uint, or if sixel exposes the raw value as an RGBA or ARGB value, use that directly for the constructor.

If it does not expose the whole 32-bit value and you can only get to the bytes, here is each way of doing it in one line and all on the stack:

// The int constructor is RGBA
new Color (BitConverter.ToInt32 ([pixel.R, pixel.G, pixel.B, pixel.A]));
// The uint constructor is ARGB
new Color (BitConverter.ToUInt32 ([pixel.A,pixel.R, pixel.G, pixel.B]));

I could add a direct implicit cast if you like, to make life easier while using it. 🤷‍♂️

IIRC, the uint vs int decision was based on the same or very similar design with System.Drawing.Color, for consistency.

dodexahedron avatar Sep 16 '24 05:09 dodexahedron

Color can be constructed as ARGB or RGBA

Yup, wasn't meaning that there was a problem with the constructor param order just that I made mistake right at the start and kept thinking issue was with palette generation.

sixel exposes the raw value as an RGBA or ARGB value

Sixel exposes RGB only (no A) and it is on a scale of 0-100. You can define up to x colors (typically 256) and those can be any RGB values you like.

For example

#0;2;100;0;0

The # indicates that we are declaring a color. The 0 is the index in the palette we are setting (i.e. the first color). The 2 indicates Type (RGB) - it's basically always going to be a 2. Then you have RGB as 0-100 scaled.

So the above declares the color red (255,0,0) as palette entry 0.

The above text string is the pure ASCII that you would output to the console when rendering the sixel.

You use the palette colors when you encode the pixel data. Rendering pixels involves selecting a color index (from palette) then filling along the band (6 pixels high) with it. Then either 'rewinding' to start of band and drawing with a different color or moving to next band.

Have you tested out how it behaves if you've altered your terminal color settings/environment variables?

At this stage I am making something that works and then writing tests and documenting. I have tested in Windows Terminal Preview (the one that supports sixel) and ML Term (on linux).

There are some terminals that support limited palette sixel (e.g. 16 colors instead of 256). But I think generally if a terminal supports sixel it probably supports true color too.

Once it is done it can be tested for compatibility under corner cases like color setting changes. I think compatibility will have to be left to the user i.e. user can set a config value to support sixel rather than trying to dynamically detect it based on environment vars etc.

tznind avatar Sep 16 '24 06:09 tznind

Interesting.

As for the color values for conversion purposes, I'd suggest an extension method on Sixel colors in the spirit/convention of the common ToBlahBlah or FromBlahBlah methods color types often have. If there's value to you in doing that, of course.

I wouldn't suggest any modifications to Color that directly depend on any types from Sixel, for sake of separation and not building in too deep a dependency, though. 🤷‍♂️

dodexahedron avatar Sep 16 '24 22:09 dodexahedron

This is starting to come together.

I now have a pretty firm understanding of sixel encoding and can write tests explaining the expected encoded pixel data.

TODO:

  • more tests
  • screen positioning
  • sixel to screen coordinates measuring
  • potentially some resizing logic in UICatalog (I.e. to show how you ensure sixel doesn't spill out of a View)
  • refactor and finalise the 'out of the box' algorithms and move rest to UICatalog

So far I have not really touched the console drivers. Only hooking in via static to NetDriver - where outputting sixel encoded image is 2 lines of code (console move then console write).

Probably need some guidance on how best to implement in drivers once I've done the above

tznind avatar Sep 22 '24 22:09 tznind

Looks like you can detect sixel support by outputting the terminal querying code:

 "\u001B[c";

If the VT220-level reply contains a '4', sixel graphics are supported.

See this article on sixel

This will respond like this:

 [?61;4;6;7;14;21;22;23;24;28;32;42c

Does support sixel (latest Windows Terminal pre release)

For one that does not support sixel you will see:

 [?61;6;7;21;22;23;24;28;32;42c

Does not support sixel (regular windows terminal)
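
A minimal sketch of checking such a reply for sixel support. The names are hypothetical and this is not the detector from the PR. It splits the reply into its numeric attributes and looks for an exact 4, so that values like 14 or 24 do not count as a match.

```csharp
using System;

// Hypothetical helper for illustration only; not part of Terminal.Gui.
public static class SixelSupportSketch
{
    // Accepts a Device Attributes reply such as "[?61;4;6;7c",
    // with or without the leading escape character.
    public static bool SupportsSixel (string response)
    {
        int start = response.IndexOf ('?');
        int end = response.LastIndexOf ('c');

        if (start < 0 || end <= start)
        {
            return false; // not a recognisable reply
        }

        string [] attributes = response.Substring (start + 1, end - start - 1).Split (';');

        return Array.IndexOf (attributes, "4") >= 0;
    }
}
```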

tznind avatar Sep 28 '24 11:09 tznind

@tig / @dodexahedron / @BDisp do we make use of this console querying system? (send escape code, read response). Is there anywhere in the code that does this kind of thing I could tap into?

tznind avatar Sep 28 '24 11:09 tznind

In the NetDriver I used in the Init method a line to read the terminal type. Actually I don't know at the moment.

BDisp avatar Sep 28 '24 12:09 BDisp

Ok nice looks like NetDriver is already all set up for sending and getting responses.

There is a Queue and then in the handler code it looks for the terminator to match up with outstanding requests.

Only odd thing is that it seems tied to mouse handling. I guess that was the first use case.

What I have implemented works (see image) but I think I should test on other consoles plus also we could look at how best to design this.

Basically you send <esc>[c

See Reports - Device Attributes (DA)

And then you get the response elements and if you see a "4" anywhere in that array then it means sixel support.

<esc>[?61;4;6;7;14;21;22;23;24;28;32;42c example response

@tig what is the future goal of NetDriver vs WindowsDriver? Are we planning on ditching ncurses driver? I'd rather avoid implementing too much into a driver that is not getting worked on and/or implementing the same thing twice/three times i.e. in each driver.

Changes are in db0fc41

image

tznind avatar Sep 29 '24 09:09 tznind

Requested ANSI escape sequences shouldn't be enqueued, because we need the response immediately, before returning from the method that made the request. So it must be rewritten to provide a response like this: var response = ExecuteAnsiRequest ("\u001B[c"); which will return ("\u001b[?61;6;7;14;21;22;23;24;28;32;42c", ""), a tuple containing the output and the error. If the error isn't empty, its information will be provided.

BDisp avatar Sep 29 '24 12:09 BDisp

escape sequences shouldn't be enqueued because we need the response immediately [...] must be rewritten to provide a response like this [blocking function]

These request/response calls seem to be inherently async. The only way to block waiting the response would be to Thread Sleep or something which seems hacky.

That said we could do something like

/// <summary>
/// Describes an ongoing ANSI request sent to the console.
/// Use <see cref="ResponseReceived"/> to handle the response
/// when console answers the request.
/// </summary>
public class AnsiRequest
{
    /// <summary>
    /// Code to send e.g. see <see cref="EscSeqUtils.CSI_Device_Attributes_Request"/>
    /// </summary>
    public string Code { get; init; }

    /// <summary>
    /// <para>
    /// The terminator that uniquely identifies the type of response as responded
    /// by the console. e.g. for <see cref="EscSeqUtils.CSI_Device_Attributes_Request"/>
    /// the terminator is <see cref="EscSeqUtils.CSI_Device_Attributes_Request_Terminator"/>.
    /// </para>
    /// <para>
    /// After sending a request, the first response with matching terminator will be matched
    /// to the oldest outstanding request.
    /// </para>
    /// </summary>
    public string Terminator { get; init; }

    /// <summary>
    /// Invoked when the console responds with an ANSI response code that matches the
    /// <see cref="Terminator"/>
    /// </summary>
    public event EventHandler<string []> ResponseReceived;
}

tznind avatar Sep 29 '24 13:09 tznind

I wouldn't do it async, because if you request the cursor position, for example, you may want the response right away to take some action while you are in the UI thread.

BDisp avatar Sep 29 '24 14:09 BDisp

But what if the console doesn't respond for some reason or doesn't support the command and ignores it? It would mean timeouts or other stuff and get complicated.

tznind avatar Sep 29 '24 14:09 tznind

Not exactly, if we handle the ReadKey in the request method and ensure that it returns the response or an error.

BDisp avatar Sep 29 '24 15:09 BDisp

There is a delay between sending the code and the terminal responding.

Furthermore there is only 1 console input stream and that is already being handled in ProcessInputQueue (in NetDriver) - specifically in this case in HandleRequestResponseEvent.

We could easily get events between us sending the ANSI code and us reading the response (e.g. mouse events, user keypresses etc).

The current code is already essentially async - the write stream code is decoupled from the read stream code. It loops the input and handles accordingly (mouse etc). Then when it sees an ANSI response it looks to see if there is an outstanding request (with same terminator) and 'dequeues' it.

I think this current code implementation is correct. I was able to add my new ANSI request with very few lines of code.

The solution for API user friendliness is either to try and force the process to be blocking as you say or embrace the current system and use a callback for when answer is seen. Blocking is problematic because of:

  • What if the console does not respond for some reason?
  • We have to buffer all intermediate chars that are not what we are waiting for
  • We have to write a second parallel input processing function when there is already a perfectly good one in place
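
The callback approach described above (match a response to the oldest outstanding request with the same terminator, then dequeue it) could look roughly like this. This is a sketch with hypothetical names, not the NetDriver code: it assumes the input loop has already split out one complete response string.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch for illustration only; not the NetDriver implementation.
public sealed class PendingAnsiRequestsSketch
{
    // Outstanding requests, oldest first.
    private readonly List<(string Terminator, Action<string> OnResponse)> _pending = new ();

    // Registers a callback to run when a response ending in terminator is seen.
    public void Add (string terminator, Action<string> onResponse)
    {
        _pending.Add ((terminator, onResponse));
    }

    // Called by the input loop for each complete response it reads.
    // Returns false if no outstanding request is waiting for this response.
    public bool TryDispatch (string response)
    {
        for (var i = 0; i < _pending.Count; i++)
        {
            if (response.EndsWith (_pending [i].Terminator, StringComparison.Ordinal))
            {
                Action<string> handler = _pending [i].OnResponse;
                _pending.RemoveAt (i); // each request is answered at most once
                handler (response);

                return true;
            }
        }

        return false;
    }
}
```

Because nothing blocks, mouse and key events that arrive between the request and the response keep flowing through the normal input handling. A request the console never answers simply stays in the list, so a real implementation would likely want some expiry policy.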

tznind avatar Sep 29 '24 15:09 tznind

I don't have a ton of time RN to comment but...

See:

  • #2803

At the core of EscSeqUtil is code that tries to provide, effectively, a message-based, protocol handler for ANSI ESC sequences.

The current code does it in a brute force way. It works, but barely. NetDriver uses it for querying the terminal, and we will need to use it (or a better replacement) to both make Windows Driver use ANSI sequences directly for everything and to replace CursesDriver.

In a defunct PR (#2940), I started down this path but abandoned it because of other priorities.

See this issue for the current master issue for all this stuff:

  • #2610

tig avatar Sep 29 '24 15:09 tig

@tig what is the future goal of NetDriver vs WindowsDriver? Are we planning on ditching ncurses driver? I'd rather avoid implementing too much into a driver that is not getting worked on and/or implementing the same thing twice/three times i.e. in each driver.

Yes, my vision is that curses driver goes away. We have one driver based on the ANSI ESC Sequence "protocol" that replaces both WindowsDriver and CursesDriver. NetDriver lives on as a lowest-common-denominator. Note, the WT team refers to this protocol as "Virtual Terminal Sequences" which I find confusing.

As noted above, a key piece of work required is a robust engine for handling the ANSI esc sequence protocol, which is a bi-directional protocol, in a way that enables BOTH synchronous and async use-cases.

You'd think such a thing already existed, but I've not found a good .NET library that does it.

What we should NOT do is try to build it from the ground up. The patterns are well-understood, standard message-based protocol patterns. We should choose an existing MIT license protocol library that we build on. The ANSI spec (and implementations) are not pure and have weird edge-cases given the long history, so we'll have a layer that will look a bit ugly.

@dodexahedron has recently done work on the drivers for #3692 which I have not looked closely at, but am assuming paves the way for all this in a good way.

tig avatar Sep 29 '24 16:09 tig

As noted above, a key piece of work required is a robust engine for handling the ANSI esc sequence protocol

This looks pretty good: https://github.com/darrenstarr/VtNetCore

Seems to handle escape code processing, key mapping etc. Have to find some time to clone it and give it a spin.

tznind avatar Sep 29 '24 21:09 tznind

@tznind please try with PR #3768 and let me know what you think. You can use it on all drivers, at least to check whether sixel is supported. I know you don't need an immediate response, but my solution can be used in all drivers and so can be used at the request level. Thanks.

BDisp avatar Sep 30 '24 00:09 BDisp

@tznind, can you please test Sixel in the CursesDriver with this PR, #3770? Thanks.

BDisp avatar Sep 30 '24 23:09 BDisp

OK, here's the stress test for this feature! I think this is a nice use case to try and optimise for.

It's the doom fire algorithm.

Goal

Here's what it should look like (dropping all the flickering frames): goal

Reality

Here's what it looks like at the moment in NetDriver (also it maxes out a CPU core):

Issue is compounded by the NetDriver redraw bug, see #3761

today

tznind avatar Oct 01 '24 19:10 tznind

That's great. Is the first video using WindowsDriver?

BDisp avatar Oct 01 '24 19:10 BDisp

That's great. Is the first video using WindowsDriver?

I wish!

Sadly not though. I just dropped frames from the recorded gif to speed it up and lose the flicker.

The top video is where I'd like to see the performance at if we can.

At the moment only NetDriver is working for sixel but I'm working on it.

Windows driver may turn out to be faster. There's potentially also scope for multi threading in the sixel encoder. But first I need to understand where bottlenecks are in general - hence this test.

tznind avatar Oct 01 '24 20:10 tznind

Regarding #3761 ,

Both WindowsDriver and NetDriver currently repaint frantically and constantly, even when there is no change to what they are printing. This results in flickering if you add the sixel data in, as it fights with the rune-level painting:

windows-driver-frantic-repaint WindowsDriver constantly repaints itself for no reason, fighting with the sixel drawing and causing flicker

I investigated what would happen if I added in a simple performance algorithm for the WriteToConsole method:

// TODO: presume something like this used to exist but broke?
// If console output has changed
if (s != _lastWrite)
{
    // supply console with the new content
    result = WriteConsole (_screenBuffer, s, (uint)s.Length, out uint _, nint.Zero);
}

_lastWrite = s;

In this case the screen no longer repaints itself constantly, only when stuff has actually changed.

However the sixel layer does still repaint, but it does so 'additively' with regards to transparency.

So sixels can be drawn but do not disappear until the console refreshes that area - a good thing. It means we can't do animation + transparent pixels or animations that move across screen. But we can do static animations or moving animations that flicker horribly.

windows-driver-overwrite WindowsDriver doomfire with screen update loop programmed to only write to console when ASCII content has changed for the screen

tznind avatar Oct 03 '24 18:10 tznind

Can you please try in the CursesDriver using my PR #3770? You can use the same code as you used for the NetDriver.

BDisp avatar Oct 03 '24 18:10 BDisp

Can you please try in the CursesDriver using my PR #3770? You can use the same code as you used for the NetDriver.

Sure, I think I should create a separate branch if I'm merging the curses driver and the standalone escape sequences PRs into the sixel work.

I don't want a monster PR, especially in case those other 2 get further updates.

tznind avatar Oct 03 '24 19:10 tznind