DataFusion fails on a simple DISTINCT ON query
Describe the bug
A simple SELECT DISTINCT ON query fails in DataFusion with "Internal error: Failed due to a difference in schemas".
To Reproduce
Open datafusion-cli and run the following commands to reproduce the error:
> create view test as values (1,1),(1,2),(2,0),(3,1),(3,2);
0 row(s) fetched.
Elapsed 0.001 seconds.
> select * from test;
+---------+---------+
| column1 | column2 |
+---------+---------+
| 1       | 1       |
| 1       | 2       |
| 2       | 0       |
| 3       | 1       |
| 3       | 2       |
+---------+---------+
5 row(s) fetched.
Elapsed 0.001 seconds.
> select distinct on (column1) * from test;
Optimizer rule 'replace_distinct_aggregate' failed
caused by
replace_distinct_aggregate
caused by
Internal error: Failed due to a difference in schemas, original schema: DFSchema { inner: Schema { fields: [Field { name: "column1", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "column2", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }], metadata: {} }, field_qualifiers: [Some(Bare { table: "test" }), Some(Bare { table: "test" })], functional_dependencies: FunctionalDependencies { deps: [] } }, new schema: DFSchema { inner: Schema { fields: [Field { name: "column1", data_type: Null, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }], metadata: {} }, field_qualifiers: [Some(Bare { table: "test" })], functional_dependencies: FunctionalDependencies { deps: [] } }.
This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker
Expected behavior
The query should return a proper result instead of an internal error: one row for each distinct value of column1.
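To make the expectation concrete, here is a minimal Python sketch of what DISTINCT ON (column1) is meant to do with the rows from the reproduction. It is not DataFusion code: `distinct_on_first_column` is a hypothetical helper written only for this illustration, and it assumes that keeping the first row seen for each key is an acceptable choice, since the query has no ORDER BY.

```python
# Hypothetical helper, not part of DataFusion: emulates
# SELECT DISTINCT ON (column1) * over in-memory rows.
def distinct_on_first_column(rows):
    kept = {}
    for row in rows:
        # Keep only the first row encountered for each column1 value.
        # Without ORDER BY, any row of the group would be a valid pick.
        kept.setdefault(row[0], row)
    return list(kept.values())


# Same rows as the `test` view in the reproduction above.
rows = [(1, 1), (1, 2), (2, 0), (3, 1), (3, 2)]
result = distinct_on_first_column(rows)

for row in result:
    print(row)
```

The result has three rows, one per distinct column1 value (1, 2 and 3). Which column2 value accompanies each key may legitimately differ in a real engine, because the query does not specify an ordering.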
Additional context
Seen in
$ datafusion-cli -V
datafusion-cli 42.0.0
Thanks for reporting. I can reproduce the issue and am going to try to fix it!
I couldn't reproduce the issue in DataFusion CLI v42.1.0. Perhaps you could try testing it again? Let me know if you still encounter the issue. Thank you! @qrilka
That version isn't even on crates.io; is that expected? I'll try installing from source later.
I tested it on the latest main branch, and everything is working fine. git blame shows that the latest PR was merged last month. I think it's really weird; perhaps the error message pointed to the wrong cause.
Yeah, I installed from main and it looks fine, thanks!