Datafusion fails on simple DISTINCT ON query

Open qrilka opened this issue 1 year ago • 1 comments

Describe the bug

A simple SELECT DISTINCT ON query fails in DataFusion with "Internal error: Failed due to a difference in schemas".

To Reproduce

Open datafusion-cli and run the commands to get the error:

> create view test as values (1,1),(1,2),(2,0),(3,1),(3,2);
0 row(s) fetched. 
Elapsed 0.001 seconds.

> select * from test;
+---------+---------+
| column1 | column2 |
+---------+---------+
| 1       | 1       |
| 1       | 2       |
| 2       | 0       |
| 3       | 1       |
| 3       | 2       |
+---------+---------+
5 row(s) fetched. 
Elapsed 0.001 seconds.

> select distinct on (column1) * from test;
Optimizer rule 'replace_distinct_aggregate' failed
caused by
replace_distinct_aggregate
caused by
Internal error: Failed due to a difference in schemas, original schema: DFSchema { inner: Schema { fields: [Field { name: "column1", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "column2", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }], metadata: {} }, field_qualifiers: [Some(Bare { table: "test" }), Some(Bare { table: "test" })], functional_dependencies: FunctionalDependencies { deps: [] } }, new schema: DFSchema { inner: Schema { fields: [Field { name: "column1", data_type: Null, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }], metadata: {} }, field_qualifiers: [Some(Bare { table: "test" })], functional_dependencies: FunctionalDependencies { deps: [] } }.
This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker

Expected behavior

The query should return a proper result: one row for each distinct value of column1.
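
For reference, here is a rough sketch of what DISTINCT ON (column1) is expected to produce for the rows in the reproduction. It is plain Python, not DataFusion code, and the helper name `distinct_on_first` is made up for illustration. It assumes the first row in input order is kept for each key; without an ORDER BY, SQL engines do not guarantee which row of each group is returned, so the column2 values may legitimately differ.

```python
# Rows of the `test` view from the reproduction steps.
rows = [(1, 1), (1, 2), (2, 0), (3, 1), (3, 2)]


def distinct_on_first(rows, key_index=0):
    """Keep the first row seen for each distinct key value.

    Hypothetical helper: it mimics DISTINCT ON under the assumption that
    "first" means first in input order.
    """
    seen = {}
    for row in rows:
        # setdefault only stores the row if the key has not been seen yet.
        seen.setdefault(row[key_index], row)
    # Dicts preserve insertion order, so keys come out in first-seen order.
    return list(seen.values())


result = distinct_on_first(rows)
print(result)  # [(1, 1), (2, 0), (3, 1)]
```

So the query should yield three rows, one each for column1 = 1, 2 and 3, with both columns present in the output.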

Additional context

Seen in

$ datafusion-cli -V
datafusion-cli 42.0.0

qrilka avatar Oct 15 '24 07:10 qrilka

Thanks for reporting. I can reproduce the issue, gonna try to fix it!

austin362667 avatar Oct 16 '24 14:10 austin362667

I couldn't reproduce the issue in DataFusion CLI v42.1.0. Perhaps you could try testing it again? Let me know if you still encounter the issue. Thank you! @qrilka

austin362667 avatar Oct 21 '24 06:10 austin362667

v42.1.0 isn't even on crates.io - is that expected? I'll try installing from source later.

qrilka avatar Oct 21 '24 06:10 qrilka

I tested it on the latest main branch, and everything is working fine. According to git blame, the latest PR was merged last month. I think it's really weird; perhaps the error message pointed to the wrong cause.

austin362667 avatar Oct 21 '24 06:10 austin362667

Yeah, installed from main and looks to be fine, thanks!

qrilka avatar Oct 21 '24 15:10 qrilka