xxd color output is inefficient

Open easyaspi314 opened this issue 1 year ago • 1 comments

Steps to reproduce

  1. Run xxd -R never on a big file, with the output going to a terminal
  2. Run xxd -R always on the same file, with the output going to a terminal

It takes significantly longer to run the colored version.

For example, doing time xxd -R always vim (vim is 3126016 bytes on my system) on Termux 0.118.1 takes 2 minutes, 56 seconds (~18 KB/s, output 82 MB), where time xxd -R never vim takes 24.7 seconds (~126 KB/s, output 13 MB).

The slowdown is not xxd itself, but a bottleneck in terminal I/O. If you print the raw output to a file and cat it, it still takes about 3 minutes.

The issue is that xxd prints a pair of escape sequences around every single byte, even when the previous byte had the same color. This drastically inflates the output size, and because many terminals are not optimized for escape sequences, xxd slows to a crawl.

For example, 300 bytes are printed for the simple string hello\n, which is far more than the 58 bytes of the uncolored output.

$ echo "hello" | xxd -R always | sed -e $'s/\033/E/g'
00000000: E[1;32m68E[0mE[1;32m65E[0m E[1;32m6cE[0mE[1;32m6cE[0m E[1;32m6fE[0mE[1;33m0aE[0m                E[1;31m E[0mE[1;31m E[0mE[1;31m E[0mE[1;31m E[0mE[1;31m E[0mE[1;31m E[0mE[1;31m E[0mE[1;31m E[0mE[1;31m E[0mE[1;31m E[0m E[1;32mhE[0mE[1;32meE[0mE[1;32mlE[0mE[1;32mlE[0mE[1;32moE[0mE[1;33m.E[0m
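As a rough cross-check of the sizes quoted in this report, the sketch below models the size of one full output line. It assumes the default layout (16 bytes per line, groups of 2) and that every hex pair and every text character is wrapped in a 7-byte set sequence and a 4-byte reset, as in the sample above. The function names are illustrative and are not part of xxd.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { COLS = 16, GROUP = 2 };

/* Bytes in one full uncolored line: "00000000: ", the hex pairs with one
   space between groups, two separating spaces, the text column, newline. */
size_t plain_line_bytes(void)
{
    size_t addr = strlen("00000000: ");
    size_t hex = COLS * 2 + (COLS / GROUP - 1);
    return addr + hex + 2 + COLS + 1;
}

/* Bytes in one full colored line when each hex pair and each text
   character gets its own set sequence and its own reset. */
size_t colored_line_bytes(void)
{
    size_t wrap = strlen("\033[1;32m") + strlen("\033[0m");
    return plain_line_bytes() + 2 * COLS * wrap;
}

/* Total output for a file whose size is a multiple of COLS
   (partial last lines are not modeled here). */
size_t total_output_bytes(size_t file_size, size_t line_bytes)
{
    assert(file_size % COLS == 0);
    return file_size / COLS * line_bytes;
}
```

Under these assumptions a plain line is 68 bytes and a colored line is 420 bytes, so the 3126016-byte file comes to about 13.3 MB uncolored and about 82.1 MB colored, which agrees with the measurements above.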

Expected behaviour

xxd should only print escape sequences when the color changes. The exact same visual output can be produced in far fewer bytes, and it will be faster, especially on less optimized terminals. Additionally, the bold attribute is independent of the color, so it only needs to be set once at the start of each line.

00000000: E[1;32m6865 6c6c 6fE[33m0a                           E[32mhelloE[33m.E[0m
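To put a number on the saving, here is a minimal sketch that counts only the escape-sequence overhead for one column rendered in isolation, under both strategies. The color classes follow the test program in this report; the function names are hypothetical, and the merged variant assumes bold is set together with the first color and a single reset is emitted at the end.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Last digit of the SGR foreground code (31..37) for a byte. */
static char byte_color(unsigned char b)
{
    if (b > 31 && b < 127) return '2';            /* printable: green */
    if (b == 9 || b == 10 || b == 13) return '3'; /* tab, LF, CR: yellow */
    if (b == 0) return '7';                       /* NUL: white */
    if (b == 255) return '4';                     /* 0xff: blue */
    return '1';                                   /* everything else: red */
}

/* Overhead when every byte is wrapped in ESC[1;3Xm ... ESC[0m. */
size_t escape_bytes_per_byte(size_t n)
{
    return n * (strlen("\033[1;32m") + strlen("\033[0m"));
}

/* Overhead when a sequence is emitted only where the color changes:
   ESC[1;3Xm for the first run, ESC[3Xm for each later run, one ESC[0m. */
size_t escape_bytes_merged(const unsigned char *data, size_t n)
{
    size_t total = 0;
    char current = 0; /* 0 means no color has been set yet */

    assert(data != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        char c = byte_color(data[i]);
        if (c != current) {
            total += (current == 0) ? strlen("\033[1;32m") : strlen("\033[32m");
            current = c;
        }
    }
    if (n > 0)
        total += strlen("\033[0m");
    return total;
}
```

For the six bytes of hello\n, this gives 66 bytes of overhead with per-byte wrapping and 16 bytes with merged runs. Input that alternates colors on every byte saves much less, since each change still costs a 5-byte sequence.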

Using the simple, partial implementation of context-aware coloring below on the same file, dumping vim in color takes only 1 minute 3 seconds and prints only 30 MB to the terminal. That isn't impressive, but it is about a third of the time, and as mentioned before, the bottleneck is the terminal itself.

However, with an ASCII text file the same size consisting of only printables and newlines, it only takes 34 seconds since it isn't changing colors as much. xxd still takes about 3 minutes.

Whether the output goes through printf or xxd's own string handling doesn't matter; we are still bottlenecked by the terminal.

#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>
#include <string.h>

#define COLOR_RED '1'
#define COLOR_GREEN '2'
#define COLOR_YELLOW '3'
#define COLOR_BLUE '4'
#define COLOR_WHITE '7'
const char hexxa[] = "0123456789abcdef0123456789ABCDEF", *hexx = hexxa;
static int
get_char_color(int e)
{
      if (e > 31 && e < 127)
        return COLOR_GREEN;
      else if (e == 9 || e == 10 || e == 13)
        return COLOR_YELLOW;
      else if (e == 0)
        return COLOR_WHITE;
      else if (e == 255)
        return COLOR_BLUE;
      else
        return COLOR_RED;
}

#if 1
#define ESC "\033"
#else
#define ESC "E" // to test raw I/O
#endif
// crude colored xxd clone
// doesn't handle leftover lines, this is just for throughput testing
int main(int argc, char *argv[])
{
    unsigned addr = 0;

    // unsigned so bytes >= 0x80 classify correctly and are valid for isprint()
    unsigned char buffer[16] = {0};
    int c = 0;
    int color = -1;
    FILE *in = stdin;
    if (argc == 2) {
        in = fopen(argv[1], "rb");
        if (in == NULL) {
            printf("Error opening %s\n", argv[1]);
            return 1;
        }
    }
    while ((c = fgetc(in)) != EOF) {
        if (addr % 16 == 0) {
            if (addr != 0) {
                putchar(' ');
                for (int i = 0; i < 16; i++) {
                    int new_color = get_char_color(buffer[i]);
                    if (color != new_color) {
                        printf(ESC "[3%cm", new_color);
                        color = new_color;
                    }

                    if (isprint(buffer[i])) putchar(buffer[i]);
                    else putchar('.');
                }
                printf(ESC "[0m\n");
            }
            printf("%08x: " ESC "[1m", addr);
            color = -1;
        }
        buffer[addr % 16] = c;
        int new_color = get_char_color(c);
        if (color != new_color) {
            printf(ESC "[3%cm", new_color);
            color = new_color;
        }
        putchar(hexx[c >> 4]);
        putchar(hexx[c & 0xF]);
        if (addr % 2 == 1) putchar(' ');
        addr++;
    }
    // reset attributes so the terminal isn't left bold/colored
    printf(ESC "[0m\n");
    return 0;
}

Version of Vim

xxd 2024-05-10 by Juergen Weigert et al.

Environment

OS: Android 14
Terminal: Termux 0.118.1
$TERM: xterm-256color

Logs and stack traces

No response

easyaspi314 Jun 29 '24 04:06

Yeah, that is true. Can you suggest a PR?

chrisbra Jul 06 '24 16:07

I think the way xxd builds each line, in a single array that populates the hex column and the human-readable column simultaneously, is too convoluted to be optimized to merge same-color areas.

I started a new branch (https://github.com/aapo/vim/tree/internally_use_two_arrays) which uses two arrays (left and right). It doesn't do any optimization yet.

The left column doesn't have a line break, but it still needs some whitespace adjustment to decide where to put the '\0' (so that all lines align the same way).

It is currently missing support for colored little-endian output when the last row is not full:

echo -n hello | ./xxd -c 4 -g 2 -R always -e
00000000: 6568 6c6c  hell
00000004:              6f                             o
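The two-array idea described above could look roughly like the following sketch. It is not the code from the linked branch: format_row and its parameters are hypothetical, it handles only the uncolored, big-endian, 16-column case, and it pads the left array to a fixed visible width so that short rows still align.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

enum { COLS = 16 };

/* Formats one row of up to COLS bytes into out (no trailing newline).
   Returns what snprintf returns: the length the full row needs. */
int format_row(unsigned addr, const unsigned char *row, size_t n,
               char *out, size_t cap)
{
    static const char digits[] = "0123456789abcdef";
    char left[COLS * 2 + COLS / 2 + 1]; /* hex pairs, a space after each group */
    char right[COLS + 1];               /* printable characters or '.' */
    size_t l = 0, r = 0;

    assert(n <= (size_t)COLS);
    for (size_t i = 0; i < n; i++) {
        left[l++] = digits[row[i] >> 4];
        left[l++] = digits[row[i] & 0xF];
        if (i % 2 == 1)
            left[l++] = ' ';
        right[r++] = (row[i] > 31 && row[i] < 127) ? (char)row[i] : '.';
    }
    left[l] = '\0';
    right[r] = '\0';

    /* Left-justify the hex column in a fixed-width field so the text
       column starts at the same offset on every row. */
    return snprintf(out, cap, "%08x: %-*s %s",
                    addr, COLS * 2 + COLS / 2, left, right);
}
```

Once escape sequences are stored in the left array, its string length no longer equals its visible width, so the padding would have to be computed from the number of bytes in the row rather than with a plain field width.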

aapo Oct 21 '24 17:10