count() query of data in pool 4x slower than count() of same data via zq
(When this issue was first opened on behalf of @philrz, it was incorrectly framed as being a perf problem specific to when the Zed service is run in the cloud, but in fact the perf issue is entirely reproducible when everything is run on the same local host. @philrz has now morphed the issue to represent that & re-repro'ed with current Zed commit 9c0f097.)
Doing a count() on a large data set is consistently slower when run against a pool stored in a Zed lake than when run against the same data in its file form via zq.
Here's a repro with a ~3 GB gzip'ed ZNG file, which is ~4.6 GB (4641817018 bytes) after being gunzip'ed.
$ ls -l fdns-a.zng.gz
-rw-r--r--@ 1 phil staff 2995963454 Jun 28 14:46 fdns-a.zng.gz
$ gzcat fdns-a.zng.gz | wc -c
4641817018
$ zed -version
Version: v1.2.0-49-g9c0f0973
$ curl http://localhost:9867/version
{"version":"v1.2.0-49-g9c0f0973"}
$ zed create -orderby value:asc fdns
pool created: fdns 2EjlcsW7X6NOV8M1ldIJWAdjecd
$ zed use fdns
Switched to branch "main" on pool "fdns"
$ zed load fdns-a.zng.gz
(1/1) 3.00GB/3.00GB 0B/s 100.00%
2EjnTG8vKFsdi3nppDh2AwpjfmT committed
$ time zed query -z 'count()'
{count:342541391(uint64)}
real 3m42.805s
user 0m0.079s
sys 0m0.036s
$ time zed query -z 'count()'
{count:342541391(uint64)}
real 3m41.223s
user 0m0.076s
sys 0m0.034s
$ zq -version
Version: v1.2.0-49-g9c0f0973
$ time zq -z 'count()' fdns-a.zng.gz
{count:342541391(uint64)}
real 0m48.988s
user 1m12.660s
sys 0m2.189s
$ time zq -z 'count()' fdns-a.zng.gz
{count:342541391(uint64)}
real 0m51.672s
user 1m16.811s
sys 0m2.234s
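
To put a number on the gap, here is a rough sketch that averages the two wall-clock ("real") times of each command from the runs above and computes the ratio. The variable names are illustrative only, and the timings are copied by hand from this transcript after converting minutes to seconds, so treat the result as an approximation for this one host rather than a benchmark.

```shell
# "real" times from the transcript above, converted to seconds by hand:
#   zed query: 3m42.805s = 222.805, 3m41.223s = 221.223
#   zq:        0m48.988s =  48.988, 0m51.672s =  51.672
zed_avg=$(awk 'BEGIN { printf "%.3f", (222.805 + 221.223) / 2 }')
zq_avg=$(awk 'BEGIN { printf "%.3f", (48.988 + 51.672) / 2 }')

# Ratio of average wall-clock time: pool query vs. file query.
ratio=$(awk -v a="$zed_avg" -v b="$zq_avg" 'BEGIN { printf "%.1f", a / b }')

echo "zed query avg: ${zed_avg}s, zq avg: ${zq_avg}s, ratio: ${ratio}x"
```

This only compares elapsed time. The near-zero user/sys figures for `zed query` are for the client process alone, so they say nothing about how much CPU the Zed service itself used.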