bat icon indicating copy to clipboard operation
bat copied to clipboard

so slow at alpine (linux/386)

Open X7md opened this issue 1 year ago • 10 comments

What steps will reproduce the bug?

➜  ~ time bat helloWorld.c
───────┬───────────────────────────────────
       │ File: helloWorld.c
───────┼───────────────────────────────────
   1   │ #include<stdio.h>
   2   │ 
   3   │ int main(){
   4   │   printf("Hello World!\n");
   5   │   return 0;
   6   │ }
───────┴───────────────────────────────────
bat helloWorld.c  5.26s user 0.07s system 99% cpu 5.383 total
➜  ~ time cat helloWorld.c
#include<stdio.h>

int main(){
  printf("Hello World!\n");
  return 0;
}
cat helloWorld.c  0.01s user 0.00s system 82% cpu 0.012 total

What happens? bat take so much time

What did you expect to happen instead? Be fast then 5s....

How did you install bat? Alpine:

echo "@testing http://dl-cdn.alpinelinux.org/alpine/v3.16/community" >> /etc/apk/repositories

apk add bat@testing

bat version and environment

➜ ~ bat --diagnostic

Software version

bat 0.18.1 (v20210212-7674-g0e2220d5bb)

Operating system

Operating system type

Command-line

bat --diagnostic 

Environment variables

SHELL=/bin/zsh
PAGER=less
LESS=-R
BAT_PAGER=<not set>
BAT_CACHE_PATH=<not set>
BAT_CONFIG_PATH=<not set>
BAT_OPTS=<not set>
BAT_STYLE=<not set>
BAT_TABS=<not set>
BAT_THEME=<not set>
XDG_CONFIG_HOME=<not set>
XDG_CACHE_HOME=<not set>
COLORTERM=<not set>
NO_COLOR=<not set>
MANPAGER=<not set>

Config file

Could not read contents of '/root/.config/bat/config': No such file or directory (os error 2).

Compile time information

  • Profile: release
  • Target triple: i586-alpine-linux-musl
  • Family: unix
  • OS: linux
  • Architecture: x86
  • Pointer width: 32
  • Endian: little
  • CPU features: fxsr,sse,sse2
  • Host: i586-alpine-linux-musl

Less version

> less --version 
less 581 (POSIX regular expressions)
Copyright (C) 1984-2021  Mark Nudelman

less comes with NO WARRANTY, to the extent permitted by law.
For information about the terms of redistribution,
see the file named README in the less distribution.
Home page: https://greenwoodsoftware.com/less

-->

X7md avatar May 26 '23 01:05 X7md

That is slow indeed :-)

Could you please run a few tests? Is it also slow if you run with bat --pager=never?

If so, can you please run strace -f -c bat --pager=never helloWorld.c to see if there are any syscalls that take a long time?

If possible, could you run perf record --call-graph=dwarf bat --pager=never helloWorld.c and provide us the generated perf.data file?

sharkdp avatar May 26 '23 07:05 sharkdp

Unfortunately bat --pager=never still slow, also both strace and pref show me sys call error.

➜  ~ strace -f -c bat --pager=never helloWorld.c
strace: test_ptrace_get_syscall_info: PTRACE_SETOPTIONS: Invalid argument
➜  ~ perf record --call-graph=dwarf bat --pager=never helloWorld.c
[1]    108 invalid system call  perf record --call-graph=dwarf bat --pager=never helloWorld.c

X7md avatar May 26 '23 14:05 X7md

Can you also try with the latest version of bat please?

Enselic avatar May 26 '23 16:05 Enselic

Can you also try with the latest version of bat please?

I try build latest version, works but:

image image

I build it like this: https://hub.docker.com/r/messense/rust-musl-cross/tags

x7mdnet@cloudshell:~/$  docker pull messense/rust-musl-cross:i586-musl
x7mdnet@cloudshell:~/$  alias rust-musl-builder='docker run --rm -it -v "$(pwd)":/home/rust/src messense/rust-musl-cross:i586-musl'
x7mdnet@cloudshell:~/$  git clone --recursive https://github.com/sharkdp/bat

x7mdnet@cloudshell:~/$  cd bat

x7mdnet@cloudshell:~/bat/$  rust-musl-builder cargo build --release  --bins
x7mdnet@cloudshell:~/bat/$ file target/i586-unknown-linux-musl/release/bat
target/i586-unknown-linux-musl/release/bat: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), statically linked, stripped

X7md avatar May 27 '23 00:05 X7md

Can you also try with the latest version of bat please?

I try build latest version, works but:

image image I build it like this: https://hub.docker.com/r/messense/rust-musl-cross/tags

x7mdnet@cloudshell:~/$  docker pull messense/rust-musl-cross:i586-musl
x7mdnet@cloudshell:~/$  alias rust-musl-builder='docker run --rm -it -v "$(pwd)":/home/rust/src messense/rust-musl-cross:i586-musl'
x7mdnet@cloudshell:~/$  git clone --recursive https://github.com/sharkdp/bat

x7mdnet@cloudshell:~/$  cd bat

x7mdnet@cloudshell:~/bat/$  rust-musl-builder cargo build --release  --bins
x7mdnet@cloudshell:~/bat/$ file target/i586-unknown-linux-musl/release/bat
target/i586-unknown-linux-musl/release/bat: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), statically linked, stripped

It's faster btw xD... but It's missing up the terminal without color.

./bat helloWorld.c  0.25s user 0.01s system 87% cpu 0.302 total

X7md avatar May 27 '23 01:05 X7md

Fork it and build it using GitHub Action CI/CD

https://github.com/X7md/bat/actions/runs/5096205185?notification_referrer_id=NT_kwDOAzsSmbM2NTgzNzY3NDI5OjU0MjAzMDMz

still the same:

Screenshot 2023-05-27 at 4 31 29 AM

X7md avatar May 27 '23 01:05 X7md

which terminal emulator are you using?

sharkdp avatar Jun 14 '23 18:06 sharkdp

which terminal emulator are you using?

https://ish.app, but I don't think it's because of it

I tested in (TinyEMU alpine x86 a WebBase Linux Emulator) same results...

X7md avatar Jun 15 '23 01:06 X7md

which terminal emulator are you using?

https://ish.app, but I don't think it's because of it

I tested in (TinyEMU alpine x86 a WebBase Linux Emulator) same results...

It's works and look pretty perfect (no weird output) through SSH btw in my pi zero aarch64... so don't think it's from the terminal emulator

IMG_9418

X7md avatar Jun 15 '23 01:06 X7md

Seeing this issue, I wanted to see how musl (Alpine's libc implementation) stacked up against glibc.

Testing Methodolgy

Unfortunately, I don't have a non-64-bit x86 machine laying around. I used Docker's --platform=linux/i386 to pull in 32-bit libc versions, though.

  1. On a single machine, run two Docker instances.
    • Instance A is using alpine:latest.
    • Instance B is using debian:latest.
  2. Install the latest version of hyperfine (1.18.0) using packages provided under the GitHub releases.
    • Instance A used the tgz-packaged musl version.
    • Instance B used the tgz-packaged gnu version.
  3. Install the latest version of bat from the GitHub releases.
    • Instance A used the tgz-packaged musl version.
    • Instance B used the tgz-packaged gnu version.
  4. Use cat to create your helloWorld.c file.
  5. Run ./hyperfine -N -w 100 -r 500 "./bat helloWorld.c --paging=never --terminal-width=80 --force-colorization" on both.
    • 100 warm-up runs.
    • 500 measurement runs.
    • Does not create a new shell instance to run bat.
    • Colorization is forced to prevent bat's simple printer from being used.
    • The pager is disabled.
    • The terminal width is assumed to be 80 characters.

Results

Alpine:

  Time (mean ± σ):      15.7 ms ±   1.7 ms    [User: 13.3 ms, System: 2.3 ms]
  Range (min … max):    11.9 ms …  21.2 ms    500 runs

Debian:

  Time (mean ± σ):      26.9 ms ±   2.6 ms    [User: 16.6 ms, System: 10.1 ms]
  Range (min … max):    20.6 ms …  39.1 ms    500 runs

The distro using glibc is faster than musl for small files, but it's nothing on the order of seconds different. I know I don't have the most ideal hardware to test this (since the kernel is still amd64), but I feel like we can at least rule out the possibility of it being related to which libc implementation is being used.


Speculation

It's possible that building for i586 is disabling optimizations that use various SIMD extensions, including SSE (which was added with the i686/P6 architecture according to Wikipedia). SSE added 8 new registers and a bunch of instructions, and a quick and dirty dissasembly of the i686 release in this tarball shows that those features are used about 32000 times:

objdump --disassembler-color=extended -d bat-v0.24.0-i686-unknown-linux-gnu/bat | grep -E 'xmm|movss|movaps|movups|movlps|movhps|movlhps|movmskps|addss|subss|mulss|divss|rcpss|sqrtss|maxss|minss|rsqrtss|addps|subps|mulps|divps|sqrtps|maxps|minps|rsqrtps|cmpss|comiss|ucomiss|cmpps|shufps|unpckhps|unpcklps|cvtsi2ss|cvtss2si|cvttss2si|cvtpi2ps|cvtps2pi|cvttps2pi|andps|orps|xorps|andnps|pmulhuw|psadbw|pavgb|pavgw|pmaxub|pminub|pmaxsw|pminsw|pextrw|pinsrw|pmovmskb|pshufw|mxcsr|ldmxcsr|stmxcsr|movntq|movntps|maskmovq|prefetch0|prefetch1|prefetch2|prefetchnta|sfence' | wc -l

That's only about 3.5% of the total instruction count of that executable, though. I'd love to know if it makes that significant of a difference, but I don't have a toolchain to build for i586 set up at the moment.

eth-p avatar Feb 08 '24 07:02 eth-p