
Reduced size binary

Open fungs opened this issue 1 year ago • 12 comments

A note for the community

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Use Cases

The first sentence on the Vector website states "A lightweight, ultra-fast tool for building observability pipelines". When I looked at the vector binary in the different Debian packages, it is about 127 MiB, comparable in size to a full Linux distribution image. That's not really lightweight for most people (including myself, of course).

The binary size can be an issue in some situations like

  • systems with limited memory
  • systems with limited storage
  • high storage and bandwidth costs for updates etc.

Attempted Solutions

No response

Proposal

I don't understand why the binary is so bloated, but here are some ideas to get it down to a reasonable size, or at least to make the size more justifiable

  • look for redundancy
  • explain included static binary parts
  • provide separate binaries for different deployment roles

I just feel bad augmenting a container image with the vector binary, doubling its size, to do something as simple as forwarding metrics.

References

No response

Version

vector 0.36.1 (x86_64-unknown-linux-gnu 2857180 2024-03-11 14:32:52.417737479)

fungs avatar Mar 11 '24 23:03 fungs

Thanks for opening this discussion @fungs !

I agree with you that Vector's current binary size is not what I would guess when thinking of a "lightweight" binary; even Vector's first "official" release (v0.10.0) had a binary of 80 MB. I think that statement was likely comparing against Splunk, FluentD, and Logstash, which are quite a bit heavier. FluentBit might be a better comparison, though I note that FluentBit's binary is 50 MB, so it's not really that far off (I was thinking it'd be an order of magnitude). As another data point, the OpenTelemetry Collector, even without any contrib modules, is 99 MB. All of these are looking at x86_64 builds. All of these certainly seem pretty heavyweight for a "sidecar" deployment.

I agree with the list you have to investigate, and would add a couple of things like stripping the binary, but I think the real savings are likely to come from users only compiling in the modules they need (similar to the OpenTelemetry Collector Contrib model), since each category of dependency (AWS, Kafka, etc.) brings in quite a bit extra that will be extraneous if you aren't using those modules. We do enable that via feature flags, but it's not well documented or easy for users to create their own distributions. It seems to me that it'll be difficult to maintain Vector's binary size over time without that as we add more and more integrations.

Another note is that Vector statically compiles most dependencies (librdkafka, libsasl, etc.) which is probably not helping the overall binary size. This is done for portability reasons.
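One quick way to see how much is statically versus dynamically linked is to list a binary's dynamic dependencies. A small sketch using ldd (shown against the system shell here; point it at the vector binary in practice, where a fully static musl build would print "not a dynamic executable"):

```shell
# List the dynamic library dependencies of a binary. A mostly statically
# compiled binary shows only a handful of entries (or none at all);
# substitute the path to the `vector` executable to inspect it.
ldd "$(command -v sh)"
```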

jszwedko avatar Mar 12 '24 00:03 jszwedko

@jszwedko, that's exactly the way I'm looking at it. I was referring to the x86_64 architecture, but I assume that the picture is similar for others. I was also comparing to fluentbit, which seems to have a similar stack and purpose and ships an all-in-one binary of 50 MiB.

Looking at the static compilation issue: I'm not a Rust developer, so I don't know how feasible a dynamic loading approach would be to those extra modules. The extra modules could still contain the dependencies as static, but would only be loaded by the program when actually required, and most importantly be omitted from the distribution in many cases. This is how traditional programs work on Linux. It would circumvent custom or role-specific builds.

How others do it: For example, VictoriaMetrics ships both an all-in-one binary and role-specific agents (for distributed deployments). That kind of partitioning is a balanced approach vs. having to compile individually for every use case.

Naively and technically speaking, I'd think that one could probably build a set of binary artifacts and link them together per individual use case, but I'm not familiar with the whole Rust toolchain.

Cheers

fungs avatar Mar 12 '24 10:03 fungs

I posted some suggestions on this at: https://github.com/vectordotdev/vector/pull/17342#issuecomment-1932659066

jpds avatar Mar 12 '24 15:03 jpds

Our internal build with just a few sinks and stuff is about 20 MB with LTO

paolobarbolini avatar Mar 12 '24 18:03 paolobarbolini

Incidentally, the vdev development tool includes a subcommand that runs Vector with only the feature flags required to run a given config file turned on (vdev run <CONFIG>). It would be pretty straightforward to leverage this to produce a stripped-down bespoke vector binary (via an option to vdev build vector) for a particular use case without having to know the feature flags required.

bruceg avatar Mar 12 '24 18:03 bruceg

I don't have much time atm to engage much in this discussion, but this was a concern for me and I spent a fair amount of time looking into building Vector for minimal size.

  • I've had a full build of Vector at around 100MB stripped and 20-25MB UPX compressed (adds about 1s to startup time)
  • Minimal build for what I use at about 26MB stripped and 6.6MB with UPX (around 400ms startup time penalty).
  • Minimal build with nightly -Z build-std didn't make much difference, but lto = "fat" + codegen-units=1 with panic = "abort" (biggest contributor IIRC) brought that down to 16MB, or 4.7MB UPX compressed (LZMA).

I don't recall fat vs thin LTO making much notable difference in size. I should add that I'm skimming through some old notes for those sizes.

# `Cargo.toml` sets `opt-level = "z"` and `lto = "thin"` (not much value in fat),
RUSTFLAGS="-C strip=symbols -C relocation-model=static" OPENSSL_NO_VENDOR=1 cargo build \
  --release \
  --no-default-features \
  --features "codecs-syslog,sources-stdin,sources-syslog,sinks-console" \
  --bin vector \
  --target x86_64-unknown-linux-musl
  • The OPENSSL_NO_VENDOR=1 isn't needed if you have the necessary packages to build from source. I had a frustrating time where this wasn't clear, as builds were failing with less helpful error output; it turned out I needed the perl package. Opting out of the vendored feature for the openssl crate allowed me to use Alpine's openssl-libs-static package; building on Alpine is 2-3x slower due to the memory allocator, however.
  • I didn't see much difference for -gnu builds with static vs dynamic linking. Probably because I didn't have the package available, or perhaps I needed to more explicitly guide the linker? AFAIK with my minimal build the only external dep was openssl though.
  • Building with the nightly toolchain isn't worth it, often breaks requiring changes to Vector source and not always obvious how to resolve it. I don't recall it providing much notable gains (eg with -Z build-std), especially with the minimal build paired with UPX.

It'd be good to know which features are lightweight vs heavy, as I'd like to include a lightweight version of Vector for users to manage their logs with, rather than the less pleasant logging setup an image I maintain currently has.


We do enable that via feature flags, but it's not well documented or easy for users to create their own distributions.

I've been meaning to contribute at some point a Dockerfile that's much simpler to adjust for a custom build with all deps, which might be more helpful to some users than the more involved process the repo offers (much more complexity there to maintain / grok).

I remember hitting quite a few walls. Some of it was unfamiliarity; other parts were making sense of what the repo build scripts were doing and looking at the Dockerfile files/scripts for releases, just to figure out what was required to run a more familiar cargo build from a rust:latest / rust:alpine Docker image, where there are fewer moving parts and I could tailor the release profile + cargo build command to my needs.

At the time official Vector release binaries were like 180MB uncompressed 😨


the vdev development tool includes a subcommand that runs Vector with only the feature flags required to run a given config file turned on (vdev run <CONFIG>).

That's pretty cool, cheers 👍

I modified it to output the feature list instead of running cargo build and that worked nicely! 😎

polarathene avatar Mar 15 '24 06:03 polarathene

👍 You can also try opt-level = "s" to have rustc optimize for size. Thanks for all of those other thoughts! Hopefully they will be useful to readers of this issue.

jszwedko avatar Mar 15 '24 21:03 jszwedko

You can also try opt-level = "s" to have rustc optimize for size

opt-level = "z" does the same, but which one does better varies based on config IIRC. I had tried various combinations with other profile settings a while back. The Cargo Profile docs hint at that too.
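For reference, both settings live under the release profile in Cargo.toml; per the Cargo/rustc docs, "s" optimizes for size while "z" additionally turns off loop vectorization (a minimal sketch):

```toml
[profile.release]
opt-level = "z"  # or "s"; which produces a smaller binary varies per project, so measure both
```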

polarathene avatar Mar 15 '24 21:03 polarathene

Our internal build with just a few sinks and stuff is about 20 MB with LTO

@paolobarbolini, if you could share a tiny recipe about how you achieved that, it would certainly be helpful for me and others.

fungs avatar Mar 20 '24 21:03 fungs

What we did was patch Cargo.toml with

diff --git a/Cargo.toml b/Cargo.toml
index 78cd48b..cccfdf1 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -46,6 +46,9 @@ path = "tests/e2e/mod.rs"
 # compiled via the CI pipeline.
 [profile.release]
 debug = false # Do not include debug symbols in the executable.
+lto = true
+codegen-units = 1
+strip = true
 
 [profile.bench]
 debug = true

lto is the most important flag here because, if I remember correctly, at least at the time we started making our own builds of vector we noticed that many dependencies were being linked into the final executable despite them not being used by the features we had enabled.

Then we looked at Cargo.toml to see which features were enabled by default and in the build command we did:

cargo build --release --no-default-features --features COMMA_SEPARATED_LIST_OF_FEATURES

For example sinks enables all/most sinks, but you can just cherry-pick the ones you need from here. Same applies to sources and transforms.

Expect the build, especially the linking step at the end, to be very slow.

paolobarbolini avatar Mar 20 '24 22:03 paolobarbolini

To simplify the above, you can automatically get the features by running cargo vdev features CONFIG_FILE.yaml, which will parse the config and extract the features required to run it.
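To sketch how the two compose (vdev_features below is a hypothetical stub standing in for cargo vdev features CONFIG_FILE.yaml, so the shape of the resulting command can be shown without a Vector checkout; the feature names are just examples from earlier in this thread):

```shell
# Stub standing in for `cargo vdev features vector.yaml`, which prints
# the comma-separated feature list derived from the config file.
vdev_features() { echo "sources-stdin,sinks-console"; }

features="$(vdev_features vector.yaml)"

# Print the build command that would be run (drop the leading `echo`
# inside a Vector checkout to actually build):
echo cargo build --release --no-default-features --features "${features}" --bin vector
```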

bruceg avatar Mar 20 '24 22:03 bruceg

if you could share a tiny recipe about how you achieved that

Looks like it's the same as what I shared above earlier: https://github.com/vectordotdev/vector/issues/20064#issuecomment-1999036235

Additional tips:

  • Add opt-level = "z" if performance is adequate and you'd prefer to bias toward minimal size. opt-level = "s" may sometimes be smaller; it varies.
  • lto = true will be slow to build; prefer lto = "thin", as that should be fairly similar but much faster AFAIK.
    • If the much longer build time is a non-issue, lto = true may have slight perf or size improvements, size benefit becomes marginal with a minimal feature set IIRC.
    • For lto = "thin", you'd probably want more codegen units (default is 16, or 256 with incremental = true). The default lto = false at least cares about codegen units, and setting them to 1 would opt out of LTO for that mode.
  • If you can shrug off the panic handler until hitting a problem (and then switching over to a build with it), you could also use panic = "abort".
  • Use the RUSTFLAGS setting -C relocation-model=static if you don't need the security benefit of position-independent code (ASLR, which is what I recall this setting trading away). This can shave off a nice chunk (eg from 29MB to 26MB).
  • If your deployment environment is ok with compressed executables (AV software may raise a false-positive), you can often reduce the binary to 25% of the size via UPX. This can delay initial startup by 500ms to 1s in my testing, but should be a non-concern for Vector usually.
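Collected into one place, the profile tweaks above might look like the following in Cargo.toml (a sketch of the settings discussed, not a tested recommendation; adjust per the caveats in the list):

```toml
[profile.release]
opt-level = "z"      # or "s"; bias toward minimal size
lto = "thin"         # `true` (fat) builds much slower for marginal gains here
codegen-units = 16   # with thin LTO; use 1 with fat LTO for smallest output
panic = "abort"      # biggest single contributor, but no unwinding on panic
strip = "symbols"
```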

Expect the build, especially the linking step at the end, to be very slow.

Unfortunately while I was writing up an update to respond to this, my PC crashed and I lost a fair amount of good information :(

Rough recollection (my original write-up was much better formatted/detailed):

  • incremental = true in Cargo.toml can provide a decent speed up for repeated build steps that aren't LTO related.
    • The cargo build cache in the target/ directory is a bit dependent upon mtime attribute however, which a git checkout cannot restore. A similar issue may apply to restoring a remote cache (eg: in Github Actions CI), so I'm not sure how useful this feature is for build machines. Github Actions does have self-hosted runners which could retain the cache on your runner machine if that's an option.
    • On my small feature set build, this raised the target dir size from 1.5GB to 2GB, but no impact on binary release file size which is a win. Provided you have an explicit codegen-units = 16 (to match the implicit default), otherwise that setting is implicitly much larger and will produce larger binary builds.
  • .cargo/config.toml / RUSTFLAGS env to configure -C linker-plugin-lto (often used for cross-language LTO).
    • I originally shared an example config and documented this quite well with some additional insights, if someone is interested I can try write up something similar again.
    • The main perk for this with linker-plugin-lto was the ability to add an LTO cache directory, avoiding any redundant slow down for LTO when that work had already been done.
      • The configured linker (-C link-arg=-fuse-ld=mold for mold) affects the appropriate setting names since those vary by linker.
        • mold and lld are compatible IIRC, while ld is your default otherwise and has settings named differently.
        • Unlike without linker-plugin-lto, these also impact the binary size, as they are more involved in the LTO process; lld often gave the best reduction, followed by mold, then ld.
    • lto setting could be "off" / false or "thin" and it would always be thin LTO, or you could do "fat" / true for full LTO.
      • With linker-plugin-lto enabled, the non-fat LTO settings in Cargo.toml produce an equivalent binary size regardless of the 3 choices; normally those affect the binary size differently, as they either disable LTO ("off", or under certain conditions false) or perform thin LTO at a different scope (false vs "thin").
    • When using -C linker=clang, the default LTO jobs is implicitly 0 which maps to 1 thread per physical core, thus only 50% CPU for -C linker-plugin-lto. This should be set to all (all threads) to match what Rust normally does for thin LTO, otherwise slight slow down in build time from reduced CPU usage.
    • You'll also need to set -C link-arg=-flto. While you can set -flto=thin / -flto=full, I think this only matters for non-rust code as it has no effect on the LTO job threads I observed when monitoring. The lto setting in Cargo.toml determines if it'll be thin or full LTO. Still this arg is required, at least when specifying the mold linker.
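A rough reconstruction of what such a .cargo/config.toml could look like (untested; every flag here is as recalled from the notes above, so verify each against your toolchain, and note the cache-dir argument name varies by linker):

```toml
[target.x86_64-unknown-linux-gnu]
rustflags = [
  "-C", "linker=clang",
  "-C", "link-arg=-fuse-ld=mold",
  "-C", "linker-plugin-lto",
  "-C", "link-arg=-flto",                                    # required when specifying the linker
  "-C", "link-arg=-Wl,--thinlto-cache-dir=/tmp/lto-cache",   # avoids redundant LTO work across builds
]
```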

polarathene avatar Mar 21 '24 00:03 polarathene

UPX cuts the binary size to 1/3rd, 120MB ---> ~40MB. Seems like a quick and easy win.

-rwxr-xr-x  1 mperham  staff   43824808 Sep 11 08:40 vector
-rwxr-xr-x  1 mperham  staff  126125896 Sep 25 08:58 vector.orig

mperham avatar Sep 25 '24 16:09 mperham

Seems like a quick and easy win.

UPX is generally not used for releases IIRC due to false positives it is known to trigger with virus/malware scanners?

I've seen some projects still publish GH releases with UPX applied despite that, while others offer a -upx suffix as a variant.


For context, minimizing the size is usually of interest to certain environments like container images. Is that the case for you as well? Are you suggesting the GH releases are published with UPX for other use-cases, or would a Docker image published with a tag variant for UPX suffice? Or are you just chiming in with general feedback for anyone managing their own builds?

If the latter, I already shared my UPX size reductions earlier (noting about 75% size reduction on average, with a 500ms latency increase to startup), where I got a custom 16MB build down to 4.7MB (or 25MB for the original 100MB full vector build).

polarathene avatar Sep 25 '24 22:09 polarathene

@polarathene I've published UPX-compressed binaries for my commercial products for almost a decade now with zero complaints.

I do prefer full-featured static binaries over manual builds but the binaries like Vector can get big quick. UPX helps. I'm running it on a 1GB droplet.

mperham avatar Sep 25 '24 22:09 mperham

TL;DR:

  • I would still be cautious about UPX by default; it should be explicit if offered, ideally with a non-UPX version where relevant.
  • In exchange for the reduced binary size you get increased initial startup time but also increased memory usage (in this case a 27MB UPX full-featured Vector uses 200MB+ of RAM, which is 80MB+ more than the non-UPX Vector binary would use)

my commercial products

Off-topic: Yeah I recognize you for Sidekiq / Faktory 😎 (it was many years ago since we interacted, might not have been on Github, possibly Reddit)


You might have a point with the Vector demographic, I think the UPX concern was usually reported for software that was more desktop user oriented rather than servers.

My own personal opinion to use UPX depends on:

  • If the latency matters (a frequent command executed vs a daemon service).
  • If deploying to an environment where the filesystem has transparent compression, you can get similar gains without UPX (zstd level 3 compresses 127MB Vector musl 0.41.1 binary to 42MB, vs --ultra --long -22 to 31MB).

Vector - Minimal size considerations

I do prefer full-featured static binaries over manual builds but the binaries like Vector can get big quick. UPX helps.

IIRC there's a lot in the full-featured binary that you usually don't really need (unless you're bundling Vector into a more generic deployment intended for others). As I've shown, trimming that brings down the size notably; even without UPX, my minimal build was almost half the size of the UPX-compressed full-featured Vector binary.

I totally get the desire to bring the size down.

  • Perhaps the Vector team will be open to trying it in future, they could switch back if it did result in new issues complaining about it.
  • Or like other projects add assets to the release with -upx suffix? I am not a fan of UPX being used implicitly with binaries intended for general distribution/release.

That said, when Docker is relevant context you will find similar compression benefits with Docker as a distribution method. The official image for example has a 47MB compressed image at DockerHub that is 128MB uncompressed:

# This equates to roughly the same size you'd find on DockerHub:
$ docker save timberio/vector:nightly-distroless-static | gzip -c | wc -c | numfmt --to si
47M

# Local uncompressed size after a pull:
$ docker image ls | grep timberio
timberio/vector                         nightly-distroless-static   99516b32403d   21 hours ago    128MB

That has the same full-featured binary release copied into it, which is the bulk of the image weight.

So UPX in this scenario would only really benefit the local on-disk "uncompressed" image. You do need to weigh up how important it is to save that extra disk though, as mentioned it comes with tradeoffs.


UPX - Memory Impact

I'm running it on a 1GB droplet.

Great point! So this has nothing to do with disk capacity, right? At least I'm assuming the 1GB refers to the memory of the VPS.

I think we can both agree that generally memory is a more scarce resource than disk. This is another drawback of UPX: the binary it decompresses at runtime doesn't come for free.

# A reproduction example for the screenshot that follows

$ docker run --rm -it --workdir /tmp --name example fedora:41
$ dnf install -y binutils btop upx

# Grab the latest release, extract the vector binary and config example
# NOTE: This could look much simpler if the GH release process was improved
$ curl -fsSL https://github.com/vectordotdev/vector/releases/download/v0.41.1/vector-0.41.1-x86_64-unknown-linux-musl.tar.gz \
  | tar -xz --strip-components=3 \
    ./vector-x86_64-unknown-linux-musl/bin/vector \
    ./vector-x86_64-unknown-linux-musl/config/vector.yaml

# Original size before minimizing it further:
$ du --si vector
127M    vector

# Strip symbols:
$ strip vector && du --si vector
107M    vector

# Create a UPX compressed variant:
$ upx --lzma -9 -o vector-upx vector && du --si vector-upx
28M     vector-upx

# `--quiet 1>/dev/null &` will run the command in the background and not output anything to the TTY stdout/stderr.
# You could alternatively use `docker exec -it --workdir /tmp example bash` on a separate TTY for each vector command.
$ ./vector --config vector.yaml --quiet 1>/dev/null &
$ ./vector-upx --config vector.yaml --quiet 1>/dev/null &
$ btop

With btop take a look at the memory usage between these two:

[btop screenshot comparing the memory usage of vector and vector-upx]

205 MiB vs 128 MiB - That is a 77 MiB delta (MiB because I forgot to switch settings from IEC default to SI units, so over 80MB difference).

You got the reduced disk storage cost win, but at the cost of using more than 1.5x as much memory to run Vector, which is not ideal in your 1GB memory-limited VPS environment. (There is also a related concern when multiple instances of a UPX-packed binary run, since the decompressed pages cannot be shared between them the way a regular binary's pages are.)
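For anyone wanting to reproduce the comparison without btop, the resident set size can be read straight from /proc (a sketch; sleep stands in here for the ./vector and ./vector-upx processes):

```shell
# Read the resident set size (VmRSS, in KiB) of a running process from
# /proc, as a scriptable alternative to btop. Substitute the PID of
# `./vector` or `./vector-upx` for the real measurement.
sleep 30 &
pid=$!
rss_kib="$(awk '/^VmRSS:/ {print $2}' "/proc/${pid}/status")"
kill "$pid"
echo "PID ${pid} VmRSS: ${rss_kib} KiB"
```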

That sort of tradeoff really should be a choice. Similar to how many users are quick to pounce on Alpine as a desirable base image when considering Docker, this tradeoff isn't as obvious at first as the advantage of reducing binary size is.


I think Vector is right to publish without UPX for that reason, similar to their decision to not strip the binary completely.

It's easy to apply these changes, even if a little inconvenient, while undoing them is much more work for users for whom these size optimizations aren't worth the tradeoff.

polarathene avatar Sep 26 '24 02:09 polarathene