quantile-compression
Testcase optimization
No worries if this isn't the right forum, but I was thinking about adopting pcodec for compression of some of our timeseries data, and ran into a test case where zstd seems to perform better. Do you think this is a usage issue (e.g. I need different options), or is this data just not a good fit for pcodec?
Here's the output from pcodec bench:
$ pcodec bench --input pcodec_testcase.parquet --codecs pco:level=9,parquet:compression=zstd4
╭──────────┬───────────────────────────┬─────────────┬───────────────┬─────────────────╮
│ dataset  │ codec                     │ compress_dt │ decompress_dt │ compressed_size │
├──────────┼───────────────────────────┼─────────────┼───────────────┼─────────────────┤
│ i32_hour │ pco:level=9               │  3.047264ms │     434.267µs │              92 │
│ i32_hour │ parquet:compression=zstd4 │  3.228085ms │      37.575µs │             459 │
│ i64_idx  │ pco:level=9               │  2.489832ms │     723.768µs │             127 │
│ i64_idx  │ parquet:compression=zstd4 │  4.459501ms │     406.052µs │           87016 │
│ f64_y    │ pco:level=9               │ 10.862102ms │    1.778143ms │         1198460 │
│ f64_y    │ parquet:compression=zstd4 │ 14.918371ms │    1.196446ms │          770857 │
│ <sum>    │ pco:level=9               │ 16.399198ms │    2.936178ms │         1198679 │
│ <sum>    │ parquet:compression=zstd4 │ 22.605957ms │    1.640073ms │          858332 │
│ <sum>    │ <sum>                     │ 39.005155ms │    4.576251ms │         2057011 │
╰──────────┴───────────────────────────┴─────────────┴───────────────┴─────────────────╯
The column we care about is f64_y: zstd compresses it to ~770 KB, while pco produces ~1.2 MB.
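In case it's a useful data point: one rough way I've seen to sanity-check whether f64 values are full-precision noise (which no codec compresses well) versus "round" values with mostly-zero low mantissa bits is to count trailing zero bits in the raw IEEE-754 representation. This is just a diagnostic sketch of my own, not anything pcodec itself does, and `trailing_zero_bits` is a hypothetical helper name:

```python
import struct

def trailing_zero_bits(x: float) -> int:
    """Count trailing zero bits in the raw IEEE-754 bits of x."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    if bits == 0:
        return 64
    # The position of the lowest set bit equals the trailing-zero count.
    return (bits & -bits).bit_length() - 1

# Values that are "round" in binary leave most mantissa bits zero;
# full-precision noise leaves almost none.
print(trailing_zero_bits(1.0))  # 52: mantissa is all zeros
print(trailing_zero_bits(0.1))  # 1: 0.1 fills nearly the whole mantissa
```

If the f64_y values mostly come back with near-zero trailing bits, the mantissas are close to incompressible and the gap to zstd might just reflect the data rather than the options.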
Thanks!