python-build-standalone
Provide zstd compressed install only release artifacts?
This project's release artifacts are continuing to gain popularity.
I initially published only the zstd compressed full archives, for use with PyOxidizer. Then, when people discovered the utility of the distributions and wanted smaller downloadable artifacts, we added the install_only archive variants. I chose .tar.gz at the time because of the ubiquity of zlib and because I knew I couldn't get away with zstd only.
I suspect we still need to provide gzip archives for compatibility. But I'm wondering whether we should also provide zstd compressed archives so consumers could shave a few seconds off decompression. This could matter for things like GitHub Actions, where every second can count!
WDYT @charliermarsh? Would uv benefit from the speedup from zstd archives?
Actually, the time savings may be <1.0s now with the stripped distributions. The size of the debug symbols and raw object files made zstd a very obvious win for the full archives. But maybe the stripped gzip archives are small enough that the >5x slower decompression doesn't translate to a meaningful wall time difference?
We could try it out and benchmark it in uv? We seamlessly support gzip and zstd already, so we’d just need to generate the assets.
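As a rough illustration of what generating the assets could look like (a hedged sketch, not the actual release tooling; the file names and compression level are placeholders), an existing .tar.gz could be transcoded to .tar.zst in Rust with the flate2 and zstd crates, without re-tarring:

use std::{fs::File, io};

fn main() -> io::Result<()> {
    // Hypothetical artifact names, for illustration only.
    let mut gz = flate2::read::GzDecoder::new(File::open("python.tar.gz")?);
    // Level 19 is an assumption; the release pipeline would choose its own level.
    let mut zst = zstd::stream::Encoder::new(File::create("python.tar.zst")?, 19)?;
    // Stream the decompressed tar bytes straight into the zstd encoder.
    io::copy(&mut gz, &mut zst)?;
    zst.finish()?;
    Ok(())
}

Streaming through io::copy keeps memory use flat regardless of archive size.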
For reasons I don't fully understand, using zstd here appears to be slower, even in a basic local benchmark:
Starting extraction benchmarks...
Decompressing Gzip: cpython-3.13.1+20241206-aarch64-apple-darwin-install_only_stripped.tar.gz
Extracting to: /var/folders/nt/6gf2v7_s3k13zq_t3944rwz40000gn/T/.tmphWYnXY
Gzip extraction complete in 308.071959ms
------------------------
Decompressing Zstandard: cpython-3.13.1+20241206-aarch64-apple-darwin-install_only_stripped.tar.zst
Extracting to: /var/folders/nt/6gf2v7_s3k13zq_t3944rwz40000gn/T/.tmpZsHsHK
Zstandard extraction complete in 407.046833ms
use async_compression::tokio::bufread::{GzipDecoder, ZstdDecoder};
use tokio::fs::File;
use tokio::io::{self, AsyncReadExt, BufReader};
use std::time::Instant;
use tempfile::tempdir;
use tokio_tar::Archive;
use std::fs::File as SyncFile;
use std::io::BufReader as SyncBufReader;
use flate2::read::GzDecoder;
use zstd::stream::Decoder as ZstdSyncDecoder;
use tar::Archive as SyncArchive;

async fn decompress_gzip(file_path: &str, use_sync: bool) -> io::Result<()> {
    println!("Decompressing Gzip: {}", file_path);

    // Create temporary directory
    let temp_dir = tempdir()?;
    println!("Extracting to: {}", temp_dir.path().display());

    let start = Instant::now();
    if use_sync {
        // Synchronous implementation
        let file = SyncFile::open(file_path)?;
        let buf_reader = SyncBufReader::new(file);
        let decoder = GzDecoder::new(buf_reader);
        let mut archive = SyncArchive::new(decoder);
        archive.unpack(temp_dir.path())?;
    } else {
        // Asynchronous implementation
        let file = File::open(file_path).await?;
        let buf_reader = BufReader::new(file);
        let decoder = GzipDecoder::new(buf_reader);
        let mut archive = Archive::new(decoder);
        archive.unpack(temp_dir.path()).await?;
    }
    let duration = start.elapsed();
    println!("Gzip extraction complete in {:?}", duration);
    Ok(())
}

async fn decompress_zstd(file_path: &str, use_sync: bool) -> io::Result<()> {
    println!("Decompressing Zstandard: {}", file_path);

    // Create temporary directory
    let temp_dir = tempdir()?;
    println!("Extracting to: {}", temp_dir.path().display());

    let start = Instant::now();
    if use_sync {
        // Synchronous implementation
        let file = SyncFile::open(file_path)?;
        let buf_reader = SyncBufReader::new(file);
        let decoder = ZstdSyncDecoder::new(buf_reader)?;
        let mut archive = SyncArchive::new(decoder);
        archive.unpack(temp_dir.path())?;
    } else {
        // Asynchronous implementation
        let file = File::open(file_path).await?;
        let buf_reader = BufReader::new(file);
        let decoder = ZstdDecoder::new(buf_reader);
        let mut archive = Archive::new(decoder);
        archive.unpack(temp_dir.path()).await?;
    }
    let duration = start.elapsed();
    println!("Zstandard extraction complete in {:?}", duration);
    Ok(())
}

#[tokio::main]
async fn main() -> std::io::Result<()> {
    let gzip_file = "cpython-3.13.1+20241206-aarch64-apple-darwin-install_only_stripped.tar.gz"; // Path to the .tar.gz file
    let zstd_file = "cpython-3.13.1+20241206-aarch64-apple-darwin-install_only_stripped.tar.zst"; // Path to the .tar.zst file
    let use_sync = true; // Set to true to use synchronous implementation

    println!("Starting extraction benchmarks...\n");

    if let Err(e) = decompress_gzip(gzip_file, use_sync).await {
        eprintln!("Failed to extract Gzip file: {}", e);
    }

    println!("\n------------------------\n");

    if let Err(e) = decompress_zstd(zstd_file, use_sync).await {
        eprintln!("Failed to extract Zstandard file: {}", e);
    }

    Ok(())
}
If I don't "unpack", though, it's much faster:
Decompressing Gzip: cpython-3.13.1+20241206-aarch64-apple-darwin-install_only_stripped.tar.gz
Gzip extraction complete in 122.582292ms
------------------------
Decompressing Zstandard: cpython-3.13.1+20241206-aarch64-apple-darwin-install_only_stripped.tar.zst
Zstandard extraction complete in 36.125333ms
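For completeness, the decompress-only measurement can be reproduced by draining the decoder into a sink instead of unpacking; a small sketch using the same sync crates as above (the paths are the local benchmark files):

use std::{fs::File, io, time::Instant};

fn main() -> io::Result<()> {
    let gz = "cpython-3.13.1+20241206-aarch64-apple-darwin-install_only_stripped.tar.gz";
    let zst = "cpython-3.13.1+20241206-aarch64-apple-darwin-install_only_stripped.tar.zst";

    // Pure gzip decompression: no tar parsing, no filesystem writes.
    let start = Instant::now();
    let mut decoder = flate2::read::GzDecoder::new(io::BufReader::new(File::open(gz)?));
    io::copy(&mut decoder, &mut io::sink())?;
    println!("gzip decompress-only: {:?}", start.elapsed());

    // Pure zstd decompression for comparison.
    let start = Instant::now();
    let mut decoder = zstd::stream::Decoder::new(File::open(zst)?)?;
    io::copy(&mut decoder, &mut io::sink())?;
    println!("zstd decompress-only: {:?}", start.elapsed());

    Ok(())
}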
I did a local benchmark with cpython-3.12.9-20250311-x86_64-unknown-linux-gnu-install_only_stripped before finding this issue:
$ hyperfine --prepare "rm -rf foo && mkdir foo" "tar -xf python.tar.gz -C foo" "tar -xf python.tar.zst -C foo"
Benchmark 1: tar -xf python.tar.gz -C foo
Time (mean ± σ): 267.1 ms ± 2.3 ms [User: 251.1 ms, System: 64.5 ms]
Range (min … max): 264.6 ms … 271.6 ms 10 runs
Benchmark 2: tar -xf python.tar.zst -C foo
Time (mean ± σ): 76.1 ms ± 2.8 ms [User: 52.1 ms, System: 74.9 ms]
Range (min … max): 72.2 ms … 83.6 ms 29 runs
Summary
tar -xf python.tar.zst -C foo ran
3.51 ± 0.13 times faster than tar -xf python.tar.gz -C foo
In a preliminary benchmark with local caching of the downloaded PBS archives, where uv-1 uses .tar.gz and uv-2 uses a manually converted .tar.zst, this speedup seems to translate to Rust too, though it's less pronounced:
$ hyperfine --prepare "uv python uninstall 3.12" "./uv-1 python install --cache-dir b 3.12.9" "./uv-2 python install --cache-dir a 3.12.9"
Benchmark 1: ./uv-1 python install --cache-dir b 3.12.9
Time (mean ± σ): 226.8 ms ± 5.4 ms [User: 136.9 ms, System: 92.6 ms]
Range (min … max): 220.5 ms … 238.6 ms 11 runs
Benchmark 2: ./uv-2 python install --cache-dir a 3.12.9
Time (mean ± σ): 183.0 ms ± 10.1 ms [User: 91.0 ms, System: 95.0 ms]
Range (min … max): 171.9 ms … 206.7 ms 13 runs
Summary
./uv-2 python install --cache-dir a 3.12.9 ran
1.24 ± 0.07 times faster than ./uv-1 python install --cache-dir b 3.12.9
(The Rust benchmark isn't very rigorous, but it matches the expectation, and the experience in other projects, that zstandard is faster and smaller than gzip.)
That's meaningful and crosses the "worthwhile" threshold for me. I wonder why my benchmarks didn't show a meaningful difference?
For the manually created zstd variant, did you create a new tar, or was the inner tar archive reused? (The ordering of files within the tar can affect filesystem performance. PBS has custom tar creation code to make the tars deterministic; the CLI tar command will order entries by filesystem inode, which is effectively random.)
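A minimal sketch of the ordering idea (an assumed approach using the tar crate, not PBS's actual tar creation code): collect the paths, sort them, and append in that fixed order.

use std::{fs, io, path::{Path, PathBuf}};

// Recursively gather file paths under `dir`.
fn collect(dir: &Path, out: &mut Vec<PathBuf>) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.is_dir() {
            collect(&path, out)?;
        } else {
            out.push(path);
        }
    }
    Ok(())
}

fn main() -> io::Result<()> {
    // "unpacked" and the output name are hypothetical, for illustration only.
    let root = PathBuf::from("unpacked");
    let mut builder = tar::Builder::new(fs::File::create("python-deterministic.tar")?);

    // Sort so the entry order no longer depends on readdir/inode order.
    // (Real determinism also needs stable mtimes, ownership, and modes,
    // which this sketch ignores.)
    let mut paths = Vec::new();
    collect(&root, &mut paths)?;
    paths.sort();

    for path in &paths {
        let name = path.strip_prefix(&root).expect("path is under root");
        builder.append_path_with_name(path, name)?;
    }
    builder.finish()
}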
Also, perf testing on laptops and desktop systems is notoriously challenging due to turbo boosting and thermal throttling. It's really important to use a tool like hyperfine to capture several runs to weed out variance. And even that often isn't enough to isolate hardware variables.
I had recompressed the archive with
zstd -c -d < 7e9dbabb3-cpython-3.12.9-20250311-x86_64-unknown-linux-gnu-install_only_stripped.tar.gz | zstd > 7e9dbabb3-cpython-3.12.9-20250311-x86_64-unknown-linux-gnu-install_only_stripped.tar.zst
Here is a quieter benchmark. Surprisingly, the CPU governor doesn't make a difference for this one.
$ taskset -c 0 hyperfine --prepare "uv python uninstall 3.12.9" "UV_ZSTD_HACK=0 uv-profiling python install --cache-dir cache 3.12.9" "UV_ZSTD_HACK=1 uv-profiling python install --cache-dir cache 3.12.9"
Benchmark 1: UV_ZSTD_HACK=0 uv-profiling python install --cache-dir cache 3.12.9
Time (mean ± σ): 232.5 ms ± 1.3 ms [User: 138.6 ms, System: 77.3 ms]
Range (min … max): 231.4 ms … 235.7 ms 11 runs
Benchmark 2: UV_ZSTD_HACK=1 uv-profiling python install --cache-dir cache 3.12.9
Time (mean ± σ): 180.2 ms ± 1.7 ms [User: 86.2 ms, System: 77.4 ms]
Range (min … max): 178.3 ms … 185.8 ms 14 runs
Summary
UV_ZSTD_HACK=1 uv-profiling python install --cache-dir cache 3.12.9 ran
1.29 ± 0.01 times faster than UV_ZSTD_HACK=0 uv-profiling python install --cache-dir cache 3.12.9
This is running just the unpacking (no download) on a WIP branch.
Profiling isn't too insightful.
I'm not clear on why we're so much slower than a trivial tar command even with sync code (below): ~170 ms unpacking with uv vs. ~80 ms with hyperfine --prepare "rm -rf a && mkdir a" "tar -xf 7e9dbabb3-cpython-3.12.9-20250311-x86_64-unknown-linux-gnu-install_only_stripped.tar.zst -C a".
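One way to narrow that down would be to split the two stages apart; a rough diagnostic sketch (hypothetical path, not uv's actual code) that decompresses fully into memory before unpacking:

use std::{fs::File, io::{self, Read}, time::Instant};

fn main() -> io::Result<()> {
    // Hypothetical local archive path for illustration.
    let path = "python.tar.zst";

    // Stage 1: pure zstd decompression into memory.
    let start = Instant::now();
    let mut raw_tar = Vec::new();
    zstd::stream::Decoder::new(File::open(path)?)?.read_to_end(&mut raw_tar)?;
    println!("decompress to memory: {:?}", start.elapsed());

    // Stage 2: unpack the already-decompressed tar, i.e. mostly filesystem work.
    let start = Instant::now();
    std::fs::create_dir_all("unpacked")?;
    tar::Archive::new(io::Cursor::new(raw_tar)).unpack("unpacked")?;
    println!("unpack raw tar: {:?}", start.elapsed());

    Ok(())
}

If stage 2 dominates, the gap to the tar CLI is likely in how the files are written out rather than in the zstd bindings.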
I've done some more benchmarking, and it's unclear to me why the performance is so different between Rust and tar:
use flate2::read::GzDecoder;
use std::fs::File;
use std::io::BufReader;
use std::path::PathBuf;
use std::{env, fs};

fn main() {
    let file = PathBuf::from(env::args().skip(1).next().unwrap());
    match file.extension().unwrap().to_str().unwrap() {
        "gz" => {
            let reader = GzDecoder::new(BufReader::new(File::open(file).unwrap()));
            fs::create_dir_all("unpacked").unwrap();
            tar::Archive::new(reader).unpack("unpacked").unwrap();
        }
        "zst" => {
            let reader =
                zstd::Decoder::with_buffer(BufReader::new(File::open(file).unwrap())).unwrap();
            fs::create_dir_all("unpacked").unwrap();
            tar::Archive::new(reader).unpack("unpacked").unwrap();
        }
        unknown => panic!("Unknown file type: {}", unknown),
    }
}
[package]
name = "scratch-rust"
version = "0.1.0"
edition = "2024"
[dependencies]
flate2 = { version = "1.1.0", features = ["zlib-ng"] }
tar = "0.4.44"
zstd = "0.13.3"
#!/bin/bash
wget "https://github.com/astral-sh/python-build-standalone/releases/download/20250311/cpython-3.12.9+20250311-aarch64-unknown-linux-gnu-install_only.tar.gz"
zstd -c -d < "cpython-3.12.9+20250311-aarch64-unknown-linux-gnu-install_only.tar.gz" | zstd > "cpython-3.12.9+20250311-aarch64-unknown-linux-gnu-install_only.tar.zst"
rm -rf unpacked && mkdir unpacked && tar -C unpacked -xf cpython-3.12.9+20250311-aarch64-unknown-linux-gnu-install_only.tar.gz
tar --zstd -cf python.tar.zst -C unpacked .
#!/bin/bash
cargo build --release
hyperfine --warmup 2 --prepare "rm -rf unpacked && mkdir unpacked" \
    "target/release/scratch-rust cpython-3.12.9+20250311-aarch64-unknown-linux-gnu-install_only.tar.gz" \
    "tar -C unpacked -xf cpython-3.12.9+20250311-aarch64-unknown-linux-gnu-install_only.tar.gz"
hyperfine --warmup 2 --prepare "rm -rf unpacked && mkdir unpacked" \
    "target/release/scratch-rust python.tar.zst" \
    "tar -C unpacked -xf python.tar.zst"
I can't use the immediately repacked archive because I'm hitting (I think) https://github.com/gyscos/zstd-rs/pull/251
Benchmark 1: target/release/scratch-rust cpython-3.12.9+20250311-aarch64-unknown-linux-gnu-install_only.tar.gz
Time (mean ± σ): 192.8 ms ± 3.1 ms [User: 125.1 ms, System: 66.8 ms]
Range (min … max): 188.7 ms … 201.7 ms 13 runs
Benchmark 2: tar -C unpacked -xf cpython-3.12.9+20250311-aarch64-unknown-linux-gnu-install_only.tar.gz
Time (mean ± σ): 304.4 ms ± 3.0 ms [User: 285.1 ms, System: 70.4 ms]
Range (min … max): 302.0 ms … 312.1 ms 10 runs
Summary
target/release/scratch-rust cpython-3.12.9+20250311-aarch64-unknown-linux-gnu-install_only.tar.gz ran
1.58 ± 0.03 times faster than tar -C unpacked -xf cpython-3.12.9+20250311-aarch64-unknown-linux-gnu-install_only.tar.gz
Benchmark 1: target/release/scratch-rust python.tar.zst
Time (mean ± σ): 129.3 ms ± 1.8 ms [User: 59.7 ms, System: 69.4 ms]
Range (min … max): 127.5 ms … 134.8 ms 19 runs
Benchmark 2: tar -C unpacked -xf python.tar.zst
Time (mean ± σ): 87.1 ms ± 3.4 ms [User: 63.7 ms, System: 78.8 ms]
Range (min … max): 83.0 ms … 96.5 ms 26 runs
Summary
tar -C unpacked -xf python.tar.zst ran
1.48 ± 0.06 times faster than target/release/scratch-rust python.tar.zst