
count() query of data in pool 4x slower than count() of same data via zq

Open mattnibs opened this issue 4 years ago • 0 comments

(When this issue was first opened on behalf of @philrz, it was incorrectly framed as a perf problem specific to running the Zed service in the cloud. In fact the perf issue is entirely reproducible when everything runs on the same local host. @philrz has since reframed the issue to reflect that and re-repro'ed it with current Zed commit 9c0f097.)


Doing a count() on a large data set is consistently slower when run against a pool stored in a Zed lake than when the same data is queried in its file form via zq.

Here's a repro with a ~3 GB gzip'ed ZNG file, which is ~4.6 GB (~4.3 GiB) after being gunzip'ed.

$ ls -l fdns-a.zng.gz
-rw-r--r--@ 1 phil  staff  2995963454 Jun 28 14:46 fdns-a.zng.gz

$ gzcat fdns-a.zng.gz | wc -c
 4641817018

$ zed -version
Version: v1.2.0-49-g9c0f0973

$ curl http://localhost:9867/version
{"version":"v1.2.0-49-g9c0f0973"}

$ zed create -orderby value:asc fdns
pool created: fdns 2EjlcsW7X6NOV8M1ldIJWAdjecd

$ zed use fdns
Switched to branch "main" on pool "fdns"

$ zed load fdns-a.zng.gz
(1/1) 3.00GB/3.00GB 0B/s 100.00%
2EjnTG8vKFsdi3nppDh2AwpjfmT committed

$ time zed query -z 'count()'
{count:342541391(uint64)}

real	3m42.805s
user	0m0.079s
sys	0m0.036s

$ time zed query -z 'count()'
{count:342541391(uint64)}

real	3m41.223s
user	0m0.076s
sys	0m0.034s

$ zq -version
Version: v1.2.0-49-g9c0f0973

$ time zq -z 'count()' fdns-a.zng.gz
{count:342541391(uint64)}

real	0m48.988s
user	1m12.660s
sys	0m2.189s

$ time zq -z 'count()' fdns-a.zng.gz
{count:342541391(uint64)}

real	0m51.672s
user	1m16.811s
sys	0m2.234s
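
For reference, the slowdown implied by these timings can be worked out with a quick sketch like the one below. It only does arithmetic on the `real` times copied from the runs above. The `to_seconds` and `average` helpers are hypothetical names made up for this illustration and are not part of Zed.

```shell
#!/bin/sh
# Convert a `time`-style value such as 3m42.805s into seconds.
# (to_seconds is a hypothetical helper for this sketch, not part of Zed.)
to_seconds() {
  printf '%s\n' "$1" | awk -F'[ms]' '{ printf "%.3f", $1 * 60 + $2 }'
}

# Average two values given in seconds (also a hypothetical helper).
average() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.3f", (a + b) / 2 }'
}

# "real" times copied from the two pool runs and the two zq runs above.
lake_avg=$(average "$(to_seconds 3m42.805s)" "$(to_seconds 3m41.223s)")
zq_avg=$(average "$(to_seconds 0m48.988s)" "$(to_seconds 0m51.672s)")

# Ratio of the pool query time to the zq file query time.
ratio=$(awk -v l="$lake_avg" -v z="$zq_avg" 'BEGIN { printf "%.1f", l / z }')
echo "pool: ${lake_avg}s  zq: ${zq_avg}s  slowdown: ${ratio}x"
```

Averaged over the two runs of each, that works out to roughly 222 s for the pool query versus roughly 50 s for zq, or about a 4.4x difference in wall-clock time.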

mattnibs, Jan 11 '22 17:01