python-build-standalone

Provide zstd compressed install only release artifacts?

Open indygreg opened this issue 1 year ago • 4 comments

This project's release artifacts are continuing to gain popularity.

I initially only published the zstd compressed full archives for use with PyOxidizer. Then, when people discovered the utility of the distributions and wanted smaller downloadable artifacts, we made the install_only archive variants. I chose .tar.gz at the time because of the ubiquity of zlib: I knew I couldn't get away with zstd only.

I suspect we still need to provide gzip archives for compatibility. But I'm wondering if we should also provide zstd compressed archives so that users can speed up decompression by a few seconds. This could matter for things like GitHub Actions. Every second can count!

WDYT @charliermarsh? Would uv benefit from the speedup from zstd archives?

indygreg, Aug 25 '24 20:08

Actually, the time savings may be <1.0s now with the stripped distributions. The size of the debug symbols and raw object files made zstd a very obvious advantage with the full archives. But maybe the stripped gzip archive is small enough that the >5x slower decompression doesn't translate to a meaningful wall time difference?

indygreg, Aug 25 '24 20:08

We could try it out and benchmark it in uv? We seamlessly support gzip and zstd already, so we’d just need to generate the assets.
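For readers wondering how a client can accept both formats from one code path: a minimal sketch, assuming the format is sniffed from the first bytes of the stream. This is not taken from uv (which may simply look at the file extension); the `Compression` enum and `sniff` function are hypothetical names. The magic numbers themselves are the documented ones: gzip members start with `1f 8b`, zstd frames with `28 b5 2f fd`.

```rust
/// Compression formats relevant to this thread.
#[derive(Debug, PartialEq, Eq)]
enum Compression {
    Gzip,
    Zstd,
    Unknown,
}

/// Guess the compression format from the leading bytes of a stream.
/// Hypothetical helper, not uv's actual implementation.
fn sniff(header: &[u8]) -> Compression {
    if header.starts_with(&[0x1f, 0x8b]) {
        // gzip member header (RFC 1952)
        Compression::Gzip
    } else if header.starts_with(&[0x28, 0xb5, 0x2f, 0xfd]) {
        // zstd frame magic 0xFD2FB528, stored little-endian
        Compression::Zstd
    } else {
        Compression::Unknown
    }
}

fn main() {
    println!("{:?}", sniff(&[0x1f, 0x8b, 0x08, 0x00]));
    println!("{:?}", sniff(&[0x28, 0xb5, 0x2f, 0xfd, 0x00]));
}
```

A sketch like this ignores zstd skippable frames, which use a different magic number, so a real client would likely want a fallback to the file extension.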

charliermarsh, Aug 25 '24 22:08

For reasons that I don't fully understand, using zstd here appears to be slower? Even with a basic local benchmark:

Starting extraction benchmarks...

Decompressing Gzip: cpython-3.13.1+20241206-aarch64-apple-darwin-install_only_stripped.tar.gz
Extracting to: /var/folders/nt/6gf2v7_s3k13zq_t3944rwz40000gn/T/.tmphWYnXY
Gzip extraction complete in 308.071959ms

------------------------

Decompressing Zstandard: cpython-3.13.1+20241206-aarch64-apple-darwin-install_only_stripped.tar.zst
Extracting to: /var/folders/nt/6gf2v7_s3k13zq_t3944rwz40000gn/T/.tmpZsHsHK
Zstandard extraction complete in 407.046833ms

use async_compression::tokio::bufread::{GzipDecoder, ZstdDecoder};
use tokio::fs::File;
use tokio::io::{self, BufReader};
use std::time::Instant;
use tempfile::tempdir;
use tokio_tar::Archive;
use std::fs::File as SyncFile;
use std::io::{BufReader as SyncBufReader};
use flate2::read::GzDecoder;
use zstd::stream::Decoder as ZstdSyncDecoder;
use tar::Archive as SyncArchive;

async fn decompress_gzip(file_path: &str, use_sync: bool) -> io::Result<()> {
    println!("Decompressing Gzip: {}", file_path);

    // Create temporary directory
    let temp_dir = tempdir()?;
    println!("Extracting to: {}", temp_dir.path().display());

    let start = Instant::now();

    if use_sync {
        // Synchronous implementation
        let file = SyncFile::open(file_path)?;
        let buf_reader = SyncBufReader::new(file);
        let decoder = GzDecoder::new(buf_reader);
        let mut archive = SyncArchive::new(decoder);
        archive.unpack(temp_dir.path())?;
    } else {
        // Asynchronous implementation
        let file = File::open(file_path).await?;
        let buf_reader = BufReader::new(file);
        let decoder = GzipDecoder::new(buf_reader);
        let mut archive = Archive::new(decoder);
        archive.unpack(temp_dir.path()).await?;
    }

    let duration = start.elapsed();
    println!(
        "Gzip extraction complete in {:?}",
        duration
    );
    Ok(())
}

async fn decompress_zstd(file_path: &str, use_sync: bool) -> io::Result<()> {
    println!("Decompressing Zstandard: {}", file_path);

    // Create temporary directory
    let temp_dir = tempdir()?;
    println!("Extracting to: {}", temp_dir.path().display());

    let start = Instant::now();

    if use_sync {
        // Synchronous implementation
        let file = SyncFile::open(file_path)?;
        let buf_reader = SyncBufReader::new(file);
        let decoder = ZstdSyncDecoder::new(buf_reader)?;
        let mut archive = SyncArchive::new(decoder);
        archive.unpack(temp_dir.path())?;
    } else {
        // Asynchronous implementation
        let file = File::open(file_path).await?;
        let buf_reader = BufReader::new(file);
        let decoder = ZstdDecoder::new(buf_reader);
        let mut archive = Archive::new(decoder);
        archive.unpack(temp_dir.path()).await?;
    }

    let duration = start.elapsed();
    println!(
        "Zstandard extraction complete in {:?}",
        duration
    );
    Ok(())
}

#[tokio::main]
async fn main() -> std::io::Result<()> {
    let gzip_file = "cpython-3.13.1+20241206-aarch64-apple-darwin-install_only_stripped.tar.gz"; // Path to the .tar.gz file
    let zstd_file = "cpython-3.13.1+20241206-aarch64-apple-darwin-install_only_stripped.tar.zst"; // Path to the .tar.zst file
    let use_sync = true; // Set to true to use synchronous implementation

    println!("Starting extraction benchmarks...\n");

    if let Err(e) = decompress_gzip(gzip_file, use_sync).await {
        eprintln!("Failed to extract Gzip file: {}", e);
    }

    println!("\n------------------------\n");

    if let Err(e) = decompress_zstd(zstd_file, use_sync).await {
        eprintln!("Failed to extract Zstandard file: {}", e);
    }

    Ok(())
}

charliermarsh, Dec 18 '24 02:12

If I don't "unpack", though, it's much faster:

Decompressing Gzip: cpython-3.13.1+20241206-aarch64-apple-darwin-install_only_stripped.tar.gz
Gzip extraction complete in 122.582292ms

------------------------

Decompressing Zstandard: cpython-3.13.1+20241206-aarch64-apple-darwin-install_only_stripped.tar.zst
Zstandard extraction complete in 36.125333ms
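The code for this decompress-only variant wasn't posted. A minimal sketch of one way to measure it, assuming the decoder is simply read to the end and the bytes are discarded instead of being handed to `tar`. The `drain` helper is a hypothetical name; it works for any `Read`, so the `GzDecoder` or `ZstdSyncDecoder` from the benchmark above could be passed in directly.

```rust
use std::io::{self, Read};
use std::time::{Duration, Instant};

/// Read `reader` to the end, throwing the bytes away, and report how many
/// bytes came out and how long that took. No files are written.
fn drain<R: Read>(mut reader: R) -> io::Result<(u64, Duration)> {
    let start = Instant::now();
    let bytes = io::copy(&mut reader, &mut io::sink())?;
    Ok((bytes, start.elapsed()))
}

fn main() -> io::Result<()> {
    // Stand-in input so the sketch runs without external crates; in the
    // benchmark this would be a decoder wrapping the archive file.
    let data = vec![0u8; 1 << 20];
    let (bytes, elapsed) = drain(&data[..])?;
    println!("drained {bytes} bytes in {elapsed:?}");
    Ok(())
}
```

Comparing this number with the full `unpack` time separates the cost of decompression from the cost of creating files on disk.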

charliermarsh, Dec 18 '24 02:12

I did a local benchmark with cpython-3.12.9-20250311-x86_64-unknown-linux-gnu-install_only_stripped before finding this issue:

$ hyperfine --prepare "rm -rf foo && mkdir foo" "tar -xf python.tar.gz -C foo" "tar -xf python.tar.zst -C foo"
Benchmark 1: tar -xf python.tar.gz -C foo
  Time (mean ± σ):     267.1 ms ±   2.3 ms    [User: 251.1 ms, System: 64.5 ms]
  Range (min … max):   264.6 ms … 271.6 ms    10 runs
 
Benchmark 2: tar -xf python.tar.zst -C foo
  Time (mean ± σ):      76.1 ms ±   2.8 ms    [User: 52.1 ms, System: 74.9 ms]
  Range (min … max):    72.2 ms …  83.6 ms    29 runs
 
Summary
  tar -xf python.tar.zst -C foo ran
    3.51 ± 0.13 times faster than tar -xf python.tar.gz -C foo
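For anyone reading the summary line: the "3.51 ± 0.13" figure is consistent with dividing the two means and propagating the standard deviations to first order, assuming the two measurements are uncorrelated. A small sketch of that arithmetic (the `speedup` function is a hypothetical name, and this is a reconstruction from the printed numbers, not hyperfine's source):

```rust
/// Ratio of two means with first-order error propagation.
/// Each argument is (mean, standard deviation) in the same unit.
fn speedup(slow: (f64, f64), fast: (f64, f64)) -> (f64, f64) {
    let (slow_mean, slow_sd) = slow;
    let (fast_mean, fast_sd) = fast;
    let ratio = slow_mean / fast_mean;
    // Relative errors add in quadrature for a quotient.
    let rel = ((slow_sd / slow_mean).powi(2) + (fast_sd / fast_mean).powi(2)).sqrt();
    (ratio, ratio * rel)
}

fn main() {
    // Means and standard deviations in ms, copied from the output above.
    let (ratio, err) = speedup((267.1, 2.3), (76.1, 2.8));
    println!("{ratio:.2} ± {err:.2}");
}
```

Most of the uncertainty comes from the faster command, because 2.8 ms is a much larger fraction of 76.1 ms than 2.3 ms is of 267.1 ms.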

In a preliminary benchmark with local caching of downloaded PBS archives, where uv-1 uses .tar.gz and uv-2 uses a manually converted .tar.zst, this speedup seems to carry over to Rust, too, though it is less pronounced:

$ hyperfine --prepare "uv python uninstall 3.12" "./uv-1 python install --cache-dir b 3.12.9" "./uv-2 python install --cache-dir a 3.12.9"
Benchmark 1: ./uv-1 python install --cache-dir b 3.12.9
  Time (mean ± σ):     226.8 ms ±   5.4 ms    [User: 136.9 ms, System: 92.6 ms]
  Range (min … max):   220.5 ms … 238.6 ms    11 runs
 
Benchmark 2: ./uv-2 python install --cache-dir a 3.12.9
  Time (mean ± σ):     183.0 ms ±  10.1 ms    [User: 91.0 ms, System: 95.0 ms]
  Range (min … max):   171.9 ms … 206.7 ms    13 runs
 
Summary
  uv-profiling python install --cache-dir a 3.12.9 ran
    1.24 ± 0.07 times faster than ./uv-1 python install --cache-dir b 3.12.9

(The Rust benchmark isn't very rigorous, but it matches the expectation, and the experience in other projects, that zstandard is faster and smaller than gzip.)

konstin, Mar 13 '25 16:03

That's meaningful and crosses the "worthwhile" threshold for me. I wonder why my benchmarks didn't show a meaningful difference?

charliermarsh, Mar 13 '25 16:03

For the manually created zstd variant, did you create a new tar or was the inner tar archive reused? (Ordering of files within the tar can affect filesystem performance. PBS has custom tar creation code to make tar deterministic. CLI tar command will order by filesystem inode, which is random.)
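To illustrate the ordering point: a minimal sketch of producing a stable file order before building an archive, assuming that sorting relative paths is enough for the comparison. This is not PBS's tar creation code; `sorted_files` is a hypothetical helper, and it only lists paths rather than writing a tar.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Collect every non-directory entry under `root`, as paths relative to
/// `root`, in sorted order. Directory iteration order is filesystem
/// dependent, so the sort is what makes the result reproducible.
fn sorted_files(root: &Path) -> io::Result<Vec<PathBuf>> {
    let mut files = Vec::new();
    let mut pending = vec![root.to_path_buf()];
    while let Some(dir) = pending.pop() {
        for entry in fs::read_dir(&dir)? {
            let entry = entry?;
            let path = entry.path();
            // `DirEntry::file_type` does not follow symlinks, so a symlink
            // to a directory is listed as an entry instead of being walked.
            if entry.file_type()?.is_dir() {
                pending.push(path);
            } else {
                // Cannot fail: every path was reached by walking from `root`.
                files.push(path.strip_prefix(root).unwrap().to_path_buf());
            }
        }
    }
    files.sort();
    Ok(files)
}

fn main() -> io::Result<()> {
    let root = std::env::args().nth(1).unwrap_or_else(|| ".".to_string());
    for path in sorted_files(Path::new(&root))? {
        println!("{}", path.display());
    }
    Ok(())
}
```

Feeding entries to an archive writer in this order would give two repacked archives the same layout, which removes file ordering as a variable when comparing gzip against zstd.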

Also, perf testing on laptops and desktop systems is notoriously challenging due to turbo boosting and thermal throttling. It's really important to use a tool like hyperfine to capture several runs to weed out variance. And even that often isn't enough to isolate hardware variables.

indygreg, Mar 13 '25 17:03

I had recompressed the archive with

zstd -c -d < 7e9dbabb3-cpython-3.12.9-20250311-x86_64-unknown-linux-gnu-install_only_stripped.tar.gz | zstd > 7e9dbabb3-cpython-3.12.9-20250311-x86_64-unknown-linux-gnu-install_only_stripped.tar.zst

Here is a quieter benchmark. Surprisingly, the CPU governor doesn't make a difference for this one.

$ taskset -c 0 hyperfine --prepare "uv python uninstall 3.12.9" "UV_ZSTD_HACK=0 uv-profiling python install --cache-dir cache 3.12.9"  "UV_ZSTD_HACK=1 uv-profiling python install --cache-dir cache 3.12.9" 
Benchmark 1: UV_ZSTD_HACK=0 uv-profiling python install --cache-dir cache 3.12.9
  Time (mean ± σ):     232.5 ms ±   1.3 ms    [User: 138.6 ms, System: 77.3 ms]
  Range (min … max):   231.4 ms … 235.7 ms    11 runs
 
Benchmark 2: UV_ZSTD_HACK=1 uv-profiling python install --cache-dir cache 3.12.9
  Time (mean ± σ):     180.2 ms ±   1.7 ms    [User: 86.2 ms, System: 77.4 ms]
  Range (min … max):   178.3 ms … 185.8 ms    14 runs
 
Summary
  UV_ZSTD_HACK=1 uv-profiling python install --cache-dir cache 3.12.9 ran
    1.29 ± 0.01 times faster than UV_ZSTD_HACK=0 uv-profiling python install --cache-dir cache 3.12.9

This is running just the unpacking (no download) on a WIP branch.
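One way to read these numbers: system time is essentially unchanged between the two runs, while user time drops by about as much as wall time. That suggests the saving comes from CPU work in userland (most plausibly decompression) rather than from filesystem writes, though the benchmark alone doesn't prove it. A small sketch of the arithmetic, with `Run` and `saved` as hypothetical names:

```rust
/// Mean timings in milliseconds for one hyperfine benchmark.
struct Run {
    wall: f64,
    user: f64,
    system: f64,
}

/// Per-component time saved when going from `before` to `after`.
fn saved(before: &Run, after: &Run) -> Run {
    Run {
        wall: before.wall - after.wall,
        user: before.user - after.user,
        system: before.system - after.system,
    }
}

fn main() {
    // Values copied from the UV_ZSTD_HACK=0 and UV_ZSTD_HACK=1 runs above.
    let gzip = Run { wall: 232.5, user: 138.6, system: 77.3 };
    let zstd = Run { wall: 180.2, user: 86.2, system: 77.4 };
    let delta = saved(&gzip, &zstd);
    println!("wall saved:   {:.1} ms", delta.wall);
    println!("user saved:   {:.1} ms", delta.user);
    println!("system saved: {:.1} ms", delta.system);
}
```

The roughly 77 ms of system time that both runs share is a floor that switching compression formats does not touch.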

Profiling isn't too insightful.

Image

Image

I'm not clear why we're so much slower even with sync code (below) than a trivial tar command: 170ms unpacking with uv vs ~80ms with hyperfine --prepare "rm -rf a && mkdir a" "tar -xf 7e9dbabb3-cpython-3.12.9-20250311-x86_64-unknown-linux-gnu-install_only_stripped.tar.zst -C a".

Image

konstin, Mar 13 '25 18:03

I've done some more benchmarking, and it's unclear to me why the performance differs so much between Rust and tar:

use flate2::read::GzDecoder;
use std::fs::File;
use std::io::BufReader;
use std::path::PathBuf;
use std::{env, fs};

fn main() {
    let file = PathBuf::from(env::args().nth(1).unwrap());
    match file.extension().unwrap().to_str().unwrap() {
        "gz" => {
            let reader = GzDecoder::new(BufReader::new(File::open(file).unwrap()));
            fs::create_dir_all("unpacked").unwrap();
            tar::Archive::new(reader).unpack("unpacked").unwrap();
        }
        "zst" => {
            let reader =
                zstd::Decoder::with_buffer(BufReader::new(File::open(file).unwrap())).unwrap();
            fs::create_dir_all("unpacked").unwrap();
            tar::Archive::new(reader).unpack("unpacked").unwrap();
        }
        unknown => panic!("Unknown file type: {}", unknown),
    }
}

[package]
name = "scratch-rust"
version = "0.1.0"
edition = "2024"

[dependencies]
flate2 = { version = "1.1.0", features = ["zlib-ng"] }
tar = "0.4.44"
zstd = "0.13.3"

#!/bin/bash

wget "https://github.com/astral-sh/python-build-standalone/releases/download/20250311/cpython-3.12.9+20250311-aarch64-unknown-linux-gnu-install_only.tar.gz"
zstd -c -d < "cpython-3.12.9+20250311-aarch64-unknown-linux-gnu-install_only.tar.gz" | zstd > "cpython-3.12.9+20250311-aarch64-unknown-linux-gnu-install_only.tar.zst"
rm -rf unpacked && mkdir unpacked && tar -C unpacked -xf cpython-3.12.9+20250311-aarch64-unknown-linux-gnu-install_only.tar.gz
tar --zstd -cf python.tar.zst -C unpacked .

#!/bin/bash

cargo build --release

hyperfine --warmup 2 --prepare "rm -rf unpacked && mkdir unpacked" \
  "target/release/scratch-rust cpython-3.12.9+20250311-aarch64-unknown-linux-gnu-install_only.tar.gz" \
  "tar -C unpacked -xf cpython-3.12.9+20250311-aarch64-unknown-linux-gnu-install_only.tar.gz"

hyperfine --warmup 2 --prepare "rm -rf unpacked && mkdir unpacked" \
  "target/release/scratch-rust python.tar.zst" \
  "tar -C unpacked -xf python.tar.zst"

I can't use the immediately repacked archive because I'm hitting (I think) https://github.com/gyscos/zstd-rs/pull/251

Benchmark 1: target/release/scratch-rust cpython-3.12.9+20250311-aarch64-unknown-linux-gnu-install_only.tar.gz
  Time (mean ± σ):     192.8 ms ±   3.1 ms    [User: 125.1 ms, System: 66.8 ms]
  Range (min … max):   188.7 ms … 201.7 ms    13 runs
 
Benchmark 2: tar -C unpacked -xf cpython-3.12.9+20250311-aarch64-unknown-linux-gnu-install_only.tar.gz
  Time (mean ± σ):     304.4 ms ±   3.0 ms    [User: 285.1 ms, System: 70.4 ms]
  Range (min … max):   302.0 ms … 312.1 ms    10 runs
 
Summary
  target/release/scratch-rust cpython-3.12.9+20250311-aarch64-unknown-linux-gnu-install_only.tar.gz ran
    1.58 ± 0.03 times faster than tar -C unpacked -xf cpython-3.12.9+20250311-aarch64-unknown-linux-gnu-install_only.tar.gz

Benchmark 1: target/release/scratch-rust python.tar.zst
  Time (mean ± σ):     129.3 ms ±   1.8 ms    [User: 59.7 ms, System: 69.4 ms]
  Range (min … max):   127.5 ms … 134.8 ms    19 runs
 
Benchmark 2: tar -C unpacked -xf python.tar.zst
  Time (mean ± σ):      87.1 ms ±   3.4 ms    [User: 63.7 ms, System: 78.8 ms]
  Range (min … max):    83.0 ms …  96.5 ms    26 runs
 
Summary
  tar -C unpacked -xf python.tar.zst ran
    1.48 ± 0.06 times faster than target/release/scratch-rust python.tar.zst

konstin, Mar 14 '25 12:03