datafusion
datafusion copied to clipboard
TPC-H Query 6 has a wrong result
Note: migrated from original JIRA: https://issues.apache.org/jira/browse/ARROW-12218
TPC-H Query 6 gives a wrong result according to the test in the benchmarks.
{{TPCH_DATA=[..]/tpch-dbgen cargo test --release}}
Query 6 iteration 0 took 6137.1 ms
Query 6 avg time: 6137.09 ms
thread 'tests::q6' panicked at 'assertion failed: (left == right)
left: ["123141078.23"],
right: ["75207768.18550001"]', benchmarks/src/bin/tpch.rs:1684:17
[~alamb]
The problem with the query is that 0.5 + 0.2 gets parsed as 0.69999999_
Projection: #SUM(bench.public.lineitem.l_extendedprice * bench.public.lineitem.l_discount) AS revenue
Aggregate: groupBy=[[]], aggr=[[SUM(#bench.public.lineitem.l_extendedprice * #bench.public.lineitem.l_discount)]]
Filter: #bench.public.lineitem.l_shipdate >= Date32("8766") AND #bench.public.lineitem.l_shipdate < Date32("9131") AND #bench.public.lineitem.l_discount BETWEEN Float64(0.049999999999999996) AND Float64(0.06999999999999999) AND #bench.public.lineitem.l_quantity < Int64(24)
TableScan: bench.public.lineitem projection=Some([l_quantity, l_extendedprice, l_discount, l_shipdate]), partial_filters=[#bench.public.lineitem.l_shipdate >= Date32("8766"), #bench.public.lineitem.l_shipdate < Date32("9131"), #bench.public.lineitem.l_discount BETWEEN Float64(0.049999999999999996) AND Float64(0.06999999999999999), #bench.public.lineitem.l_quantity < Int64(24)]
🤔 maybe we need to use Decimal type rather than Float for constants. When we initially ran TPCH we did not have support for this data type, but now thanks to the work from @liukun4515 (and others) we do!
🤔 maybe we need to use
Decimaltype rather thanFloatfor constants. When we initially ran TPCH we did not have support for this data type, but now thanks to the work from @liukun4515 (and others) we do!
That would probably do it. Postgres also converts decimal literals in the sql statement to decimal types. https://stackoverflow.com/questions/68142210/what-is-the-type-of-100-0-in-postgresql
I see where to make the change (parse_sql_number() inside planner.rs), but I'm struggling trying to figure out how to turn the string input to a Decimal128 type. Can you give me some pointers?
Hm, I realize now I should be converting from float to Decimal128 :)
@kmitchener After convert to decimal128, can we get the right result for Q6?
I see where to make the change (parse_sql_number() inside planner.rs), but I'm struggling trying to figure out how to turn the string input to a Decimal128 type. Can you give me some pointers?
Do you want to convert string value of string data type like "123.13" to the decimal128 data type?
hi @liukun4515, no, with decimal its still not the right result and I'm not sure why yet. but it's much closer:
Left: ["123141078.23"]
Right: ["123140554.79"]
I think it's correct behavior to use the Decimal type to represent non-integer numeric literals in the SQL, so I'm going to open a separate issue for that and will ask you to review that, because I'm unsure how to convert to Decimal128 efficiently.