
Testcase optimization

rachtsingh opened this issue on Jul 26, 2024 · 2 comments

No worries if this isn't the right forum, but I was thinking about adopting pcodec for compressing some of our timeseries data, and ran into a test case where zstd seems to perform better. Do you think this is a usage issue (e.g. I need different options), or is this data just not a good fit for pcodec?

Here's the output from pcodec bench:

$ pcodec bench --input pcodec_testcase.parquet --codecs pco:level=9,parquet:compression=zstd4
╭──────────┬───────────────────────────┬─────────────┬───────────────┬─────────────────╮
│ dataset  │ codec                     │ compress_dt │ decompress_dt │ compressed_size │
├──────────┼───────────────────────────┼─────────────┼───────────────┼─────────────────┤
│ i32_hour │ pco:level=9               │  3.047264ms │     434.267µs │              92 │
│ i32_hour │ parquet:compression=zstd4 │  3.228085ms │      37.575µs │             459 │
│ i64_idx  │ pco:level=9               │  2.489832ms │     723.768µs │             127 │
│ i64_idx  │ parquet:compression=zstd4 │  4.459501ms │     406.052µs │           87016 │
│ f64_y    │ pco:level=9               │ 10.862102ms │    1.778143ms │         1198460 │
│ f64_y    │ parquet:compression=zstd4 │ 14.918371ms │    1.196446ms │          770857 │
│ <sum>    │ pco:level=9               │ 16.399198ms │    2.936178ms │         1198679 │
│ <sum>    │ parquet:compression=zstd4 │ 22.605957ms │    1.640073ms │          858332 │
│ <sum>    │ <sum>                     │ 39.005155ms │    4.576251ms │         2057011 │
╰──────────┴───────────────────────────┴─────────────┴───────────────┴─────────────────╯

The column we care about is f64_y: parquet+zstd gets it down to ~770 KB, while pco at level 9 produces ~1.2 MB.
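In case it helps, here's roughly how we'd be calling pco in-process; a minimal sketch using the crate's standalone API, with synthetic data standing in for the real f64_y column (the constant and function names are from my reading of the pco docs, so apologies if they've drifted):

```rust
use pco::standalone::{simple_decompress, simpler_compress};
use pco::DEFAULT_COMPRESSION_LEVEL;

fn main() {
    // Illustrative stand-in for the real f64_y column.
    let f64_y: Vec<f64> = (0..100_000).map(|i| (i as f64 * 0.01).sin()).collect();

    // Compress with the default level (the bench above used level 9 via the CLI).
    let compressed = simpler_compress(&f64_y, DEFAULT_COMPRESSION_LEVEL).expect("compress");
    println!("compressed to {} bytes", compressed.len());

    // Round-trip to verify the data comes back intact.
    let recovered = simple_decompress::<f64>(&compressed).expect("decompress");
    assert_eq!(recovered.len(), f64_y.len());
}
```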

Thanks!

rachtsingh · Jul 26, 2024