Oversized executables
The uutils executables are a bit larger than their native counterparts. These are the stats on OS X with O3, LTO, and alloc_system:
| Name | Native | uutils |
|---|---|---|
| base64 | 8.0K | 200K |
| basename | 8.0K | 160K |
| cat | 8.0K | 204K |
| chmod | 12K | 524K |
| chroot | 8.0K | 220K |
| cksum | 8.0K | 180K |
| comm | 8.0K | 168K |
| cp | 12K | 204K |
| cut | 8.0K | 296K |
| dirname | 8.0K | 152K |
| du | 12K | 328K |
| echo | 8.0K | 132K |
| env | 8.0K | 148K |
| expand | 8.0K | 208K |
| expr | 8.0K | 168K |
| factor | - | 208K |
| false | 4.0K | 80K |
| fold | 8.0K | 196K |
| groups | 8.0K | 168K |
| hashsum | - | 596K |
| head | 8.0K | 196K |
| hostid | - | 148K |
| hostname | 8.0K | 192K |
| id | 8.0K | 192K |
| kill | 8.0K | 180K |
| link | 8.0K | 156K |
| ln | 8.0K | 208K |
| logname | 8.0K | 156K |
| mkdir | 8.0K | 192K |
| mkfifo | 8.0K | 164K |
| mv | 8.0K | 220K |
| nice | 8.0K | 176K |
| nl | 8.0K | 512K |
| nohup | 8.0K | 184K |
| nproc | - | 160K |
| od | 16K | 140K |
| paste | 8.0K | 200K |
| printenv | 8.0K | 160K |
| ptx | - | 668K |
| pwd | 8.0K | 164K |
| readlink | 12K | 192K |
| realpath | - | 192K |
| relpath | - | 200K |
| rm | 8.0K | 208K |
| rmdir | 8.0K | 176K |
| seq | 8.0K | 228K |
| shuf | - | 224K |
| sleep | 8.0K | 200K |
| sort | 28K | 220K |
| split | 8.0K | 228K |
| stdbuf | - | 244K |
| sum | 8.0K | 180K |
| sync | 8.0K | 148K |
| tac | - | 192K |
| tail | 12K | 200K |
| tee | 8.0K | 204K |
| test | 8.0K | 112K |
| timeout | - | 264K |
| touch | 8.0K | 200K |
| tr | 12K | 196K |
| true | 4.0K | 80K |
| truncate | - | 196K |
| tsort | 8.0K | 216K |
| tty | 8.0K | 160K |
| uname | 8.0K | 164K |
| unexpand | 8.0K | 212K |
| uniq | 8.0K | 220K |
| unlink | 8.0K | 164K |
| uptime | 12K | 204K |
| users | 8.0K | 160K |
| wc | 8.0K | 192K |
| whoami | 8.0K | 160K |
| yes | 4.0K | 160K |
I think the funniest one is nl, which is 6300% larger than the native nl. jemalloc would've added another 230K to each of these.
I realize some of this is Rust's fault: when an optimized, LTO'd, alloc_system'd fn main(){println!("Hi!\n");} is still 84K, there's not much room. For example, from the object dump/disassembly, about 9% of that dead weight was panicking code & string literals for the standard library :\ If we're really condemned to that, and to an 80K hello world with all the implied overhead (and it clearly scales, as seen above), then this raises serious doubts about Rust as a system language.
But surely we can shed some of the remaining 196K/216K/etc off of tr/tsort/friends? The median size of the native executables is 8.0K.
That's one of the reasons the multicall binary exists. As for individual binaries, I'm not sure what we can do except trying to reduce the number of dependencies.
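For illustration, here is a minimal sketch of how a multicall binary can route on the name it was invoked as (via a symlink or a leading subcommand). The tool names and the dispatch table below are placeholders for this sketch, not the project's actual dispatch code:

```rust
use std::env;
use std::path::Path;
use std::process;

fn main() {
    let mut args = env::args();
    // argv[0] is the name we were invoked as, e.g. a symlink called `ls`.
    let argv0 = args.next().unwrap_or_default();
    let name = Path::new(&argv0)
        .file_name()
        .and_then(|s| s.to_str())
        .unwrap_or("")
        .to_owned();

    // If invoked as the umbrella binary, take the tool name from the first argument.
    let tool = if name == "coreutils" {
        args.next().unwrap_or_default()
    } else {
        name
    };

    // Placeholder dispatch table; a real multicall binary would call into the
    // per-tool crates here and forward the remaining arguments.
    let code = match tool.as_str() {
        "true" => 0,
        "false" => 1,
        _ => {
            eprintln!("{}: applet not implemented in this sketch", tool);
            1
        }
    };
    process::exit(code);
}
```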
The project could look into https://github.com/lrs-lang/lib, but it looks like a pretty big change and it may remove large amounts of cross-platform support.
@hexel I did :) Unfortunately it's Linux-only.
The elephants in the room are static linking, ABI compatibility, and as you mentioned libstd.
Some demonstrations. All "results" blocks are generated using the following command:
strip main && stat --printf="%s bytes\n" main && ldd main | cut -d= -f1
All of the results are going to be pretty specific to x86_64. I'm running debian stable.
All of the results below link against libc, the C standard library. You discuss suitability as a systems language: C is a systems language, but most of its userland applications, just like Rust's, use a standard library that takes up hundreds of kilobytes.
The difference, as demonstrated below, is that Cargo by default does not use dynamic linking, because it's expected that most users will not (at this point in time) have an ABI-compatible Rust standard library installed.
approaches
Rust, using libstd, static linking to libstd
command
echo 'fn main(){println!("Hello!\n");}' > main.rs
rustc -C opt-level=3 -C lto main.rs
results
290864 bytes
linux-vdso.so.1 (0x00007fff6fd88000)
libpthread.so.0
libgcc_s.so.1
libc.so.6
/lib64/ld-linux-x86-64.so.2 (0x00007f278dfd8000)
Rust, using libstd, dynamic linking to libstd
libstd is about 4.4 MB and has to be installed on the operating system.
command
echo 'fn main(){println!("Hello!\n");}' > main.rs
rustc -C opt-level=3 -C prefer-dynamic main.rs
results
5528 bytes
linux-vdso.so.1 (0x00007ffc9c70b000)
libstd-17a8ccbd.so
libc.so.6
libdl.so.2
libpthread.so.0
libgcc_s.so.1
/lib64/ld-linux-x86-64.so.2 (0x00007f7e2c10c000)
libm.so.6
librt.so.1
Rust, no libstd framework
commands
wget https://raw.githubusercontent.com/rust-lang/rust/master/src/test/run-pass/smallest-hello-world.rs
rustc -C opt-level=3 -C lto -o main smallest-hello-world.rs
results
4992 bytes
linux-vdso.so.1 (0x00007ffeae1bf000)
libc.so.6
/lib64/ld-linux-x86-64.so.2 (0x00007f63f0832000)
C++, dynamic linking to libstdc++
Provided on my system by libgcc, which links directly against libm, so together they take up about 1.1 MB total.
commands
echo -e '#include <iostream>\nint main() { std::cout << "Hello!\\n"; return 0; }' > main.cpp
clang++ -Oz main.cpp -o main
results
5400 bytes
linux-vdso.so.1 (0x00007ffd4d58e000)
libstdc++.so.6
libm.so.6
libgcc_s.so.1
libc.so.6
/lib64/ld-linux-x86-64.so.2 (0x00007f943e40b000)
C
commands
echo -e '#include <stdio.h>\nint main() { printf("Hello!\\n"); }' > main.c
clang -Oz main.c -o main
results
4616 bytes
linux-vdso.so.1 (0x00007fff4363f000)
libc.so.6
/lib64/ld-linux-x86-64.so.2 (0x00007fc53febd000)
Can we just do everything dynamically?
Conceivably, if a distribution like Debian or Fedora has a libstdc++ that C++ programs can be compiled against, why not do this for Rust?
Rust currently lacks ABI stability.
Well, it's the same and it isn't. C++ these days uses a (relatively much more) stable ABI; it usually only changes with a major standards change. This means that when libstdc++ is compiled with a slightly newer version of clang, you can install it on your system without also upgrading binaries that use libstdc++ and were compiled with an older version of clang.
When it has ABI stability, we'll be able to compete.
Rust doesn't have that yet; it's in the works, as the Rust developers know it's needed to be able to ship binaries that are not so tightly coupled. But in the meantime, if you have a library that was compiled with rustc v1.2 and you upgrade to a new version of it compiled with rustc v1.5, then all binaries and libraries that were linking against that library also need to be replaced with versions of themselves compiled with rustc v1.5.
At some point in the future there will be a stable ABI, and some systems will begin installing libstd as a dependency of some other tool. And for systems which have libstd, not only will the footprint of uutils coreutils itself be a few hundred KB smaller, but we'll be able to painlessly split it into one binary per tool.
In the meantime
In the meantime, the best option is what we're doing right now: statically include the parts of libstd that we need, then use a "multicall install" which provides a binary name for each tool via symbolic links.
In terms of speed, dynamic or static linking really has a negligible difference.
creating linkback for #140
@nathanross Dynamic linking is not the answer, nor is multicall.
Rust programs already dynamically link to libSystem on OS X, which provides the entire C standard library plus a multitude of other features. The solution is not to dynamically link libstd.
The "features" libstd provides on top of libSystem are minimal—primarily structural—and in trivial programs ought to be removable. And indeed they can be, as LRS-lang demonstrates, but this requires undoing design flaws from Rust.
And multicall works rather poorly on Windows. The solution instead is to judiciously code the binaries as close to the 80K minimum as possible.
@alexchandel: how would a C library help with providing e.g. Rust-style string formatting?
A viable solution on Windows might be multicall built as a dylib, plus a small stub binary that just calls the main entry point in that library. The latter could use #![no_std] to ensure the smallest possible size.
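As a rough illustration of that idea, here is a sketch of such a stub, assuming a multicall dylib that exports a C-ABI entry point. Both the library name uucore_multicall and the symbol uumain_multicall are hypothetical, not an existing uutils API, and a real stub could push further with #![no_std] as suggested above:

```rust
use std::env;
use std::ffi::CString;
use std::os::raw::{c_char, c_int};
use std::process;

// Assumed export from the (hypothetical) multicall dylib.
#[link(name = "uucore_multicall")]
extern "C" {
    fn uumain_multicall(argc: c_int, argv: *const *const c_char) -> c_int;
}

fn main() {
    // Re-encode argv as NUL-terminated strings and forward everything to the
    // dylib, which picks the tool from argv[0] just like the multicall binary.
    let args: Vec<CString> = env::args()
        .map(|a| CString::new(a).expect("argument contained a NUL byte"))
        .collect();
    let ptrs: Vec<*const c_char> = args.iter().map(|a| a.as_ptr()).collect();
    let code = unsafe { uumain_multicall(ptrs.len() as c_int, ptrs.as_ptr()) };
    process::exit(code);
}
```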
@vadimcn It doesn't need to, because string formatting makes up relatively little of these binaries and can be inlined with relative ease (once you stop panicking).
If you actually read the disassembly for PROFILE=release cp, you'll find that the largest symbols by far (at 34% of the text section) are __ZN4copy20h24e32c79ba610ccdJmaE and __ZN6uumain20ha15e8d6f5b9ddecfceaE. And you'll notice that there are calls to a huge number of symbols that panic in unoptimizable ways. Many of these are from show_error, which makes two unoptimizable writes that inexplicably panic instead of silently failing or aborting, but there are just as many explicit panics.
For comparison, I'm working on an ls for coreutils that never panics and obviously uses its own print/show_error macros, and it's barely 130K yet does far more than cp. I haven't even gotten around to optimizing it yet.
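To make the "never panics" point concrete, here is a minimal sketch of an error-reporting macro that discards write failures instead of unwrapping them; show_error! here is a hypothetical stand-in for illustration, not the macro from that ls implementation:

```rust
use std::io::Write;

// Report an error without ever panicking: if writing to stderr fails,
// there is nothing useful left to report, so the result is ignored.
macro_rules! show_error {
    ($($arg:tt)*) => {{
        let _ = writeln!(std::io::stderr(), $($arg)*);
    }};
}

fn main() {
    show_error!("ls: cannot access '{}': No such file or directory", "missing");
}
```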
Fascinating, @alexchandel. Your continued investigation into, and passion about, this topic is greatly appreciated.
@alexchandel Have you done any additional tests since this issue was last discussed? It has been more than 2 years now and Rust has changed quite a lot. It would be interesting to see how the binary size has been influenced by this.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Some notes from playing with the size:
I added:
cargo-features = ["strip"]
# ...
[profile.release]
strip = "symbols"
opt-level = 'z'
lto = true
codegen-units = 1
panic = 'abort'
# commands I ended up using
# make build-coreutils build-pkgs PROFILE=release
# fd . --type x --max-depth 1 ./target/release/ -x du {} | sort -k 2,2
# did some manual formatting to help with readability (du 1K blocks: plain release build -> release build with the settings above)
3844 arch -> 496 arch
4020 base32 -> 580 base32
4020 base64 -> 580 base64
3888 basename -> 512 basename
3924 cat -> 532 cat
3376 chgrp -> 328 chgrp
3964 chmod -> 544 chmod
3980 chown -> 548 chown
3948 chroot -> 548 chroot
3900 cksum -> 516 cksum
3892 comm -> 512 comm
14684 coreutils -> 4428 coreutils
4080 cp -> 620 cp
5572 csplit -> 1392 csplit
3964 cut -> 552 cut
3976 date -> 568 date
3972 df -> 560 df
3388 dircolors -> 340 dircolors
3876 dirname -> 508 dirname
3428 du -> 376 du
3880 echo -> 500 echo
3996 env -> 556 env
3912 expand -> 520 expand
3768 expr -> 684 expr
3372 factor -> 332 factor
3176 false -> 232 false
3944 fmt -> 540 fmt
3908 fold -> 520 fold
3880 groups -> 504 groups
5580 hashsum -> 1432 hashsum
3980 head -> 560 head
3272 hostid -> 280 hostid
3916 hostname -> 528 hostname
3916 id -> 520 id
4028 install -> 580 install
3944 join -> 556 join
3892 kill -> 516 kill
3864 link -> 500 link
3916 ln -> 532 ln
3860 logname -> 504 logname
5880 ls -> 1528 ls
3892 mkdir -> 516 mkdir
3876 mkfifo -> 504 mkfifo
3892 mknod -> 512 mknod
4000 mktemp -> 568 mktemp
3912 more -> 528 more
3960 mv -> 548 mv
3880 nice -> 512 nice
5424 nl -> 1316 nl
3888 nohup -> 512 nohup
3884 nproc -> 516 nproc
3936 numfmt -> 548 numfmt
4048 od -> 596 od
3892 paste -> 516 paste
3884 pathchk -> 512 pathchk
3948 pinky -> 540 pinky
3872 printenv -> 504 printenv
3320 printf -> 312 printf
5560 ptx -> 1412 ptx
3864 pwd -> 504 pwd
3884 readlink -> 516 readlink
3888 realpath -> 516 realpath
3892 relpath -> 516 relpath
3960 rm -> 544 rm
3872 rmdir -> 512 rmdir
3916 seq -> 536 seq
3972 shred -> 560 shred
3984 shuf -> 560 shuf
3872 sleep -> 512 sleep
4464 sort -> 740 sort
3988 split -> 568 split
3988 stat -> 564 stat
7212 stdbuf -> 868 stdbuf
3896 sum -> 516 sum
3868 sync -> 500 sync
3904 tac -> 520 tac
3928 tail -> 532 tail
3928 tee -> 520 tee
3228 test -> 260 test
3956 timeout -> 548 timeout
3920 touch -> 528 touch
3908 tr -> 516 tr
3176 true -> 232 true
3888 truncate -> 516 truncate
3924 tsort -> 524 tsort
3868 tty -> 500 tty
3872 uname -> 500 uname
3920 unexpand -> 520 unexpand
3972 uniq -> 560 uniq
3876 unlink -> 504 unlink
3960 uptime -> 564 uptime
3872 users -> 504 users
3908 wc -> 520 wc
3952 who -> 540 who
3840 whoami -> 496 whoami
3864 yes -> 500 yes
Finished release [optimized] target(s) in 2m 04s -> Finished release [optimized] target(s) in 3m 41s
For multicall, I tried out the performance of the size-optimized version and it wasn't too bad, but there's probably some tweaking needed to find a balance between size and performance:
# make build-coreutils MULTICALL=y PROFILE=release
$ hyperfine --runs 8 --warmup 2 "/coreutils-8.32/bin/ls -al -R ./linux > /dev/null" "./coreutils_0 ls -al -R ./linux > /dev/null" "./coreutils_1 ls -al -R ./linux > /dev/null"
Benchmark #1: /coreutils-8.32/bin/ls -al -R ./linux > /dev/null
Time (mean ± σ): 393.4 ms ± 15.9 ms [User: 173.5 ms, System: 219.5 ms]
Range (min … max): 377.5 ms … 421.1 ms 8 runs
Benchmark #2: ./coreutils_0 ls -al -R ./linux > /dev/null
Time (mean ± σ): 410.2 ms ± 29.0 ms [User: 265.8 ms, System: 144.1 ms]
Range (min … max): 377.2 ms … 464.8 ms 8 runs
Benchmark #3: ./coreutils_1 ls -al -R ./linux > /dev/null
Time (mean ± σ): 504.5 ms ± 39.4 ms [User: 358.7 ms, System: 145.5 ms]
Range (min … max): 465.0 ms … 586.2 ms 8 runs
Summary
'/coreutils-8.32/bin/ls -al -R ./linux > /dev/null' ran
1.04 ± 0.08 times faster than './coreutils_0 ls -al -R ./linux > /dev/null'
1.28 ± 0.11 times faster than './coreutils_1 ls -al -R ./linux > /dev/null'
$ du coreutils_*
14684 coreutils_0
4428 coreutils_1
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Still important
yeah, this is why I removed the wontfix ;)
Ah sorry, I didn't see that update
no worries
So, I have been thinking about this. If you care about size, the symlink way is the right way.
For example, ln -s /usr/bin/coreutils /usr/bin/ls will work.
Example in Debian: https://salsa.debian.org/rust-team/debcargo-conf/-/blob/master/src/coreutils/debian/rust-coreutils.links
AFAIK, it works without issue.
I don't think there is a significantly better way to improve this.
Agreed, multicall helps a lot with the size, but there are also improvements from those Cargo settings I listed above:
$ du coreutils_*
14684 coreutils_0
4428 coreutils_1
Should any of the settings be adopted?
Not yet. Last time I checked, our versions of Rust/Cargo were too old.
Would you like to try to submit a PR?
I got to this place because I wanted to add date to an Alpine container, so I recalled that the great uutils exists and that I could install it instead of regular coreutils, but what hit me is:
4.67 MB of uutils vs 1.02 MB of coreutils in Alpine.
Is this something that should be addressed at the distribution level, or here?
@okias that depends on what Alpine is already doing. Are they already using all the settings from --profile=release-small? The rest would need to be solved here. Note, however, that there's probably a limit to what we can do.
@okias we didn't. With --profile=release-small we got (for x86_64)
>>> Size difference for uutils-coreutils: 4784 KiB -> 4192 KiB
but researching this I discovered --features=feat_os_unix_musl, and with both together:
>>> Size difference for uutils-coreutils: 4784 KiB -> 4428 KiB
it's still a bit smaller, but perhaps not enough for the needs of @tertsdiepraam.
https://git.alpinelinux.org/aports/tree/testing/uutils-coreutils/APKBUILD
Interesting, thanks! That's not quite enough indeed. This deserves some more investigation. However, I do want to set expectations: there's probably nothing we can do that will immediately cut the size to 1/4 of the current size.
Alright, so some questions first (these are both questions you might be able to answer and just open questions I want to investigate):
- Does the 4MB figure include the dependencies (such as oniguruma, which is another 500KB)?
- Can we also make a comparison without the additional utilities (such as `more` and `b3sum`)?
- Are there other stripping strategies that we could use, beyond what Rust provides by default?
- I do want to ensure that we are comparing apples to apples here: how does static vs dynamic linking influence these figures? Is uutils just bigger because Rust prefers static linking, or is there an actual difference in size?
As a first data point, here's some output of cargo bloat of big crates:
File .text Size Crate
6.3% 16.6% 1.1MiB std
2.2% 5.8% 387.4KiB regex_automata
2.0% 5.2% 346.4KiB clap_builder
1.7% 4.5% 304.2KiB uu_sort
1.0% 2.7% 181.4KiB uucore
1.0% 2.7% 179.3KiB uu_ls
0.9% 2.4% 162.9KiB regex_syntax
0.8% 2.1% 141.8KiB [Unknown]
0.7% 1.9% 129.6KiB aho_corasick
0.7% 1.8% 119.4KiB uu_tail
0.6% 1.5% 100.6KiB uu_cp
0.5% 1.4% 95.5KiB notify
0.5% 1.3% 85.1KiB uu_pr
0.4% 1.2% 78.6KiB clap_complete
0.4% 1.1% 76.0KiB data_encoding
0.4% 1.1% 75.8KiB uu_split
0.4% 1.1% 72.4KiB hashbrown
0.4% 1.0% 69.1KiB chrono
0.4% 1.0% 68.6KiB uu_dd
0.4% 1.0% 68.3KiB uu_ptx