vim
xxd color output is inefficient
Steps to reproduce
1. xxd -R never a big file to a terminal
2. xxd -R always a big file to a terminal
It takes significantly longer to run the colored version.
For example, running time xxd -R always vim (the vim binary is 3126016 bytes on my system) on Termux 0.118.1 takes 2 minutes 56 seconds (~18 KB/s, 82 MB of output), whereas time xxd -R never vim takes 24.7 seconds (~126 KB/s, 13 MB of output).
The slowdown is not xxd itself, but a bottleneck in terminal I/O. If you print the raw output to a file and cat it, it still takes about 3 minutes.
The issue is that xxd prints escape sequences around every single byte, even when the previous byte had the same color. This drastically inflates the output size, and because many terminals are also not optimized for handling escape sequences, xxd slows to a crawl.
For example, 300 bytes are printed for the simple string hello\n, far more than the 58 bytes of the non-colored output.
$ echo "hello" | xxd -R always | sed -e $'s/\033/E/g'
00000000: E[1;32m68E[0mE[1;32m65E[0m E[1;32m6cE[0mE[1;32m6cE[0m E[1;32m6fE[0mE[1;33m0aE[0m E[1;31m E[0mE[1;31m E[0mE[1;31m E[0mE[1;31m E[0mE[1;31m E[0mE[1;31m E[0mE[1;31m E[0mE[1;31m E[0mE[1;31m E[0mE[1;31m E[0m E[1;32mhE[0mE[1;32meE[0mE[1;32mlE[0mE[1;32mlE[0mE[1;32moE[0mE[1;33m.E[0m
Expected behaviour
xxd should only print escape sequences when the color changes. The exact same visual output can be produced in far fewer bytes, and it will be faster, especially on less optimized terminals. Additionally, the bold attribute is independent of the color, so it only needs to be applied once at the start of each line.
00000000: E[1;32m6865 6c6c 6fE[33m0a E[32mhelloE[33m.E[0m
Using the simple partial implementation of context-aware coloring below on the same file, xxd-ing vim in color takes only 1 minute 3 seconds and prints only 30 MB to the terminal. That isn't impressive, but it is about a third of the time, and as mentioned before, the bottleneck is the terminal itself.
However, with an ASCII text file of the same size consisting only of printable characters and newlines, it takes only 34 seconds, since the color changes less often. xxd still takes about 3 minutes.
Whether the output is written with printf or with xxd's own string handling doesn't matter; either way we are bottlenecked by the terminal.
#include <stdio.h>
#include <ctype.h>

#define COLOR_RED '1'
#define COLOR_GREEN '2'
#define COLOR_YELLOW '3'
#define COLOR_BLUE '4'
#define COLOR_WHITE '7'

const char hexxa[] = "0123456789abcdef0123456789ABCDEF", *hexx = hexxa;

/* Map a byte value (0..255) to the final digit of its SGR color code. */
static int
get_char_color(int e)
{
    if (e > 31 && e < 127)
        return COLOR_GREEN;
    else if (e == 9 || e == 10 || e == 13)
        return COLOR_YELLOW;
    else if (e == 0)
        return COLOR_WHITE;
    else if (e == 255)
        return COLOR_BLUE;
    else
        return COLOR_RED;
}

#if 1
#define ESC "\033"
#else
#define ESC "E" // to test raw I/O
#endif

// crude colored xxd clone
// doesn't handle leftover lines (the text column of the final line is
// never printed), this is just for throughput testing
int main(int argc, char *argv[])
{
    unsigned addr = 0;
    // unsigned char, so bytes >= 0x80 are not negative where plain char is
    // signed (needed for the e == 255 check and for isprint())
    unsigned char buffer[16] = {0};
    int c = 0;
    int color = -1;
    FILE *in = stdin;

    if (argc == 2) {
        in = fopen(argv[1], "rb");
        if (in == NULL) {
            fprintf(stderr, "Error opening %s\n", argv[1]);
            return 1;
        }
    }

    while ((c = fgetc(in)) != EOF) {
        if (addr % 16 == 0) {
            if (addr != 0) {
                // finish the previous line: text column, then reset
                putchar(' ');
                for (int i = 0; i < 16; i++) {
                    int new_color = get_char_color(buffer[i]);
                    if (color != new_color) {
                        printf(ESC "[3%cm", new_color);
                        color = new_color;
                    }
                    if (isprint(buffer[i])) putchar(buffer[i]);
                    else putchar('.');
                }
                printf(ESC "[0m\n");
            }
            // start a new line: address, then bold once for the whole line
            printf("%08x: " ESC "[1m", addr);
            color = -1;
        }
        buffer[addr % 16] = (unsigned char)c;
        // only emit a color sequence when the color actually changes
        int new_color = get_char_color(c);
        if (color != new_color) {
            printf(ESC "[3%cm", new_color);
            color = new_color;
        }
        putchar(hexx[c >> 4]);
        putchar(hexx[c & 0xF]);
        if (addr % 2 == 1) putchar(' ');
        addr++;
    }

    if (in != stdin)
        fclose(in);
    return 0;
}
Version of Vim
xxd 2024-05-10 by Juergen Weigert et al.
Environment
OS: Android 14
Terminal: Termux 0.118.1
$TERM: xterm-256color
Logs and stack traces
No response
Yeah, that is true. Can you suggest a PR?
I think the way xxd builds each line in a single array, filling the hex column and the human-readable column at the same time, makes it too hard to merge areas of the same color.
I started a new branch (https://github.com/aapo/vim/tree/internally_use_two_arrays) which uses two arrays (left and right). It doesn't do any optimization yet.
The left column doesn't contain a line break, but it still needs some whitespace adjustment to decide where to put the '\0' (so that all lines align the same way).
It currently lacks support for colored little-endian output when the last row is not full:
echo -n hello | ./xxd -c 4 -g 2 -R always -e
00000000: 6568 6c6c hell
00000004: 6f o