
[PERF]: local json reader

Open universalmind303 opened this issue 1 year ago • 2 comments

closes https://github.com/Eventual-Inc/Daft/issues/2196

universalmind303 avatar May 09 '24 17:05 universalmind303

Some benchmarks using the "customer" table from TPC-H at scale factor 5.

Included polars to give a point of reference.

import daft
import polars as pl

# polars with projection
pl.scan_ndjson('./customer.json').select("c_mktsegment").collect()
# daft with projection
daft.read_json('./customer.json').select("c_mktsegment").collect()

# polars without projection
pl.scan_ndjson('./customer.json').collect()
# daft without projection
daft.read_json('./customer.json').collect()
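For anyone who wants to re-run these outside IPython, here is a rough sketch of a harness that produces the same kind of summary line as `%timeit`. The `bench` and `report` helpers are hypothetical names made up for this sketch (they are not part of daft or polars), and using the population standard deviation is an assumption about how `%timeit` summarizes its runs.

```python
import statistics
import timeit


def bench(fn, runs=7, loops=10):
    """Time fn and return (mean, std dev) in milliseconds per loop.

    Hypothetical helper: `runs` batches of `loops` calls each,
    roughly mirroring the "7 runs, 10 loops each" layout.
    """
    totals = timeit.repeat(fn, repeat=runs, number=loops)
    per_loop_ms = [total / loops * 1e3 for total in totals]
    # Assumption: population std dev, to approximate the %timeit summary.
    return statistics.mean(per_loop_ms), statistics.pstdev(per_loop_ms)


def report(label, fn, runs=7, loops=10):
    """Print and return a %timeit-style summary line for fn."""
    mean, std = bench(fn, runs, loops)
    line = (
        f"{label}: {mean:.3g} ms ± {std:.3g} ms per loop "
        f"(mean ± std. dev. of {runs} runs, {loops} loops each)"
    )
    print(line)
    return line


# Example with a stand-in workload; swap in the real read, e.g.
#   report("daft (with projection)",
#          lambda: daft.read_json('./customer.json').select("c_mktsegment").collect())
report("stand-in workload", lambda: sum(range(1000)), runs=3, loops=5)
```

Passing the read as a zero-argument callable keeps the harness independent of which reader is being measured, so the same function can time both libraries.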



# polars (with projection)
# 76.8 ms ± 1.88 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

# daft, this PR (with projection)
# 116 ms ± 1.39 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

# daft main (with projection)
# 181 ms ± 5.84 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

# polars (without projection)
# 89 ms ± 2.03 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

# daft, this PR (without projection)
# 169 ms ± 2.38 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

# daft main (without projection)
# 247 ms ± 6.92 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
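To put the means above in relative terms, a small sketch that computes the ratios from the reported numbers. It assumes the entries labeled "daft" were measured on this branch and "daft main" on the main branch; the `speedup` helper and the dictionary layout are made up for illustration, and the std. dev. figures are ignored.

```python
# Mean times in ms per loop, copied from the results above.
reported_ms = {
    "with projection": {"polars": 76.8, "daft": 116.0, "daft main": 181.0},
    "without projection": {"polars": 89.0, "daft": 169.0, "daft main": 247.0},
}


def speedup(baseline_ms, candidate_ms):
    """How many times faster the candidate is than the baseline."""
    return baseline_ms / candidate_ms


for case, times in reported_ms.items():
    vs_main = speedup(times["daft main"], times["daft"])
    vs_polars = times["daft"] / times["polars"]
    print(f"{case}: {vs_main:.2f}x faster than daft main, "
          f"{vs_polars:.2f}x the polars time")
```

By these means the new reader is roughly 1.5x faster than main in both cases, and still about 1.5x to 1.9x slower than polars.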

universalmind303 avatar May 10 '24 22:05 universalmind303

assigning @clarkzinzow to take a look!

samster25 avatar May 14 '24 01:05 samster25