tpch

The rest of the queries?

Open marsupialtail opened this issue 2 years ago • 5 comments

Polars can run them for sure. Do you want a contribution?

marsupialtail avatar Mar 08 '23 19:03 marsupialtail

That would be great!

ritchie46 avatar Mar 08 '23 19:03 ritchie46

@ritchie46 I have started but ran into a problem. Here is how I wrote query 13:

import polars

ref_customer = polars.read_csv("/home/ziheng/tpc-h/customer.tbl", sep="|")
ref_orders = polars.read_csv("/home/ziheng/tpc-h/orders.tbl", sep="|").filter(
    ~(polars.col("o_comment").str.contains("special") & polars.col("o_comment").str.contains("requests"))
)
ref = (
    ref_customer.join(ref_orders, left_on="c_custkey", right_on="o_custkey", how="left")
    .with_column(polars.col("o_orderkey").is_not_null().alias("o_orderkey_1"))
    .groupby("c_custkey").agg([polars.col("o_orderkey_1").sum()])
    .groupby("o_orderkey_1").count()
    .sort("count", reverse=True)  # .sort("o_orderkey_1", reverse=True)
)

However, this gives wrong results. Any suggestions?

marsupialtail avatar Mar 09 '23 23:03 marsupialtail

NVM, I know what the problem is. I need to make sure "special" comes before "requests", so I have to use a regex.

marsupialtail avatar Mar 09 '23 23:03 marsupialtail
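
For readers following along, the difference between the two checks can be sketched in plain Python. This is only an illustration, not the code that ended up in the repository: the helper names and sample comments are made up, and it assumes the TPC-H Q13 predicate `o_comment not like '%special%requests%'`. Polars' `str.contains` treats its argument as a regex by default, so the same `special.*requests` pattern should carry over to the filter above.

```python
import re

# Hypothetical helpers for illustration only.
# SQL '%special%requests%' means: "special", then anything, then "requests".
ORDERED_PATTERN = re.compile(r"special.*requests", re.DOTALL)


def contains_both_words(comment: str) -> bool:
    """Order-insensitive check, equivalent to the two str.contains calls above."""
    return "special" in comment and "requests" in comment


def contains_words_in_order(comment: str) -> bool:
    """Order-sensitive check matching the SQL LIKE pattern."""
    return ORDERED_PATTERN.search(comment) is not None


# Made-up sample comments, not real TPC-H data.
samples = [
    "special packages among the requests",   # both checks match
    "requests about the special deposits",   # only the order-insensitive check matches
    "regular accounts sleep quickly",        # neither check matches
]

for text in samples:
    print(contains_both_words(text), contains_words_in_order(text), text)
```

The second sample is the kind of row that makes the counts diverge: the order-insensitive filter drops it, while the SQL predicate keeps it.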

A pandas implementation of all 22 queries: https://gist.github.com/UranusSeven/55817bf0f304cc24f5eb63b2f1c3e2cd

ghuls avatar Mar 23 '23 23:03 ghuls

Polars / PySpark / DuckDB have full query coverage. We should still include the pandas queries. Perhaps the link above could help.

stinodego avatar Mar 04 '24 12:03 stinodego