See https://github.com/coiled/benchmarks/issues/1515. Q4, Q18, and Q20 contain the `EXISTS` keyword, which should be translated to a semi-join. This should speed up those queries.
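To illustrate why `EXISTS` maps naturally to a semi-join, here is a minimal pure-Python sketch. It is not how Polars implements joins. It uses a simplified version of Q4's correlated subquery (`l_orderkey = o_orderkey AND l_commitdate < l_receiptdate`), and the helper names and sample rows are hypothetical, made up for illustration. A semi-join keeps each left row at most once if any right row matches, while a plain inner join emits one copy per match and would need a follow-up deduplication to be equivalent. In Polars the same idea is available directly as `DataFrame.join(..., how="semi")`.

```python
# Simplified TPC-H Q4 subquery (sample data below is hypothetical):
#   WHERE EXISTS (SELECT * FROM lineitem
#                 WHERE l_orderkey = o_orderkey AND l_commitdate < l_receiptdate)


def exists_semi_join(left_rows, right_rows, left_key, right_key, right_pred=lambda r: True):
    """Return each left row at most once if at least one right row matches (EXISTS semantics)."""
    # Build side: the set of keys that have at least one qualifying right row.
    matching_keys = {r[right_key] for r in right_rows if right_pred(r)}
    # Probe side: keep left rows in order; multiple right matches never duplicate them.
    return [row for row in left_rows if row[left_key] in matching_keys]


def naive_inner_join(left_rows, right_rows, left_key, right_key, right_pred=lambda r: True):
    """Inner join for comparison: emits one copy of a left row per matching right row."""
    return [
        left
        for left in left_rows
        for right in right_rows
        if left[left_key] == right[right_key] and right_pred(right)
    ]


orders = [
    {"o_orderkey": 1, "o_orderpriority": "1-URGENT"},
    {"o_orderkey": 2, "o_orderpriority": "2-HIGH"},
    {"o_orderkey": 3, "o_orderpriority": "3-MEDIUM"},
]
lineitem = [
    {"l_orderkey": 1, "l_commitdate": "1993-07-10", "l_receiptdate": "1993-07-15"},
    {"l_orderkey": 1, "l_commitdate": "1993-07-01", "l_receiptdate": "1993-07-20"},
    {"l_orderkey": 2, "l_commitdate": "1993-07-20", "l_receiptdate": "1993-07-10"},
]


def received_late(r):
    # ISO-8601 date strings compare correctly as plain strings.
    return r["l_commitdate"] < r["l_receiptdate"]


semi = exists_semi_join(orders, lineitem, "o_orderkey", "l_orderkey", received_late)
inner = naive_inner_join(orders, lineitem, "o_orderkey", "l_orderkey", received_late)
print([r["o_orderkey"] for r in semi])   # [1]
print([r["o_orderkey"] for r in inner])  # [1, 1]
```

Order 1 has two late line items, so the inner join returns it twice and would need a `DISTINCT`-style step afterwards; the semi-join returns it once and can stop caring about the right side as soon as a key is known to match.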

Hi, I noticed that the generated Parquet files are extremely fragmented in terms of row groups. This likely indicates a bug in the Polars Parquet writer, but definitely also affects the...

It would be great to have a comparison to DataFusion as well!

Polars can run them for sure. Do you want a contribution?

Here's the line: https://github.com/pola-rs/tpch/blob/6c5bbe93a04cfcd25678dd860bab5ad61ad66edb/queries/pyspark/utils.py#L24. If these benchmarks are being run on a single node, we should probably set the number of shuffle partitions to something like 1 to 4 instead of 200 (which is...
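A minimal sketch of what a single-node setting could look like, assuming the session is built with `SparkSession.builder`. The setting involved is Spark SQL's `spark.sql.shuffle.partitions` (default 200). The helper names `single_node_spark_conf` and `build_local_session` are hypothetical and are not the contents of the linked `utils.py`; the value 4 simply reflects the "1 to 4" range suggested above.

```python
# Sketch only: helper names are hypothetical, not taken from pola-rs/tpch.
SHUFFLE_PARTITIONS_KEY = "spark.sql.shuffle.partitions"  # Spark SQL default: 200


def single_node_spark_conf(shuffle_partitions: int = 4) -> dict:
    """Return Spark SQL settings for a single-node benchmark run."""
    if shuffle_partitions < 1:
        raise ValueError("shuffle_partitions must be at least 1")
    # Spark configuration values are passed as strings.
    return {SHUFFLE_PARTITIONS_KEY: str(shuffle_partitions)}


def build_local_session(conf: dict):
    """Create a local SparkSession with the given settings (requires pyspark)."""
    # Imported lazily so single_node_spark_conf works without pyspark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.master("local[*]")
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

Usage would be `spark = build_local_session(single_node_spark_conf(4))`. On an already running session, `spark.conf.set("spark.sql.shuffle.partitions", "4")` changes the same setting at runtime.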

Some results: https://www.kaggle.com/code/marcogorelli/m5-forecasting-feature-engineering-benchmark

The [M5 Forecasting Competition](https://www.sciencedirect.com/science/article/pii/S0169207021001874) was held on Kaggle in 2020, and top solutions generally featured a lot of heavy feature engineering. Doing that feature engineering in pandas was quite slow,...