`CoalescePartitionsExec requires at least one input partition` when aggregating empty tables on single-core machines
This only seems to happen on Fly.io (I think I tested the same Docker image locally and didn't hit the issue):
$ curl -iH "Content-Type: application/json" https://seafowl/q -d '{"query": "CREATE TABLE test2 (key INTEGER, value TEXT)"}'
HTTP/2 200
$ curl -iH "Content-Type: application/json" https://seafowl/q -d '{"query": "SELECT COUNT(*) FROM test2"}'
HTTP/2 400
Internal error: CoalescePartitionsExec requires at least one input partition. This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker
Local Docker (TODO: quadruple-check it's definitely the same Docker image as the one on Fly):
~ $ curl -iH "Content-Type: application/json" http://localhost:8080/q -d '{"query": "CREATE TABLE test (key INTEGER, value TEXT)"}'
HTTP/1.1 200 OK
content-type: application/octet-stream
vary: Content-Type, Origin, X-Seafowl-Query
content-length: 0
date: Mon, 31 Oct 2022 16:48:31 GMT
~ $ curl -iH "Content-Type: application/json" http://localhost:8080/q -d '{"query": "SELECT COUNT(*) FROM test"}'
HTTP/1.1 200 OK
content-type: application/octet-stream
vary: Content-Type, Origin, X-Seafowl-Query
content-length: 22
date: Mon, 31 Oct 2022 16:48:40 GMT
{"COUNT(UInt8(1))":0}
It doesn't just happen on Fly.io: I had an older debug build in my local Docker image. This could be a regression in a recent DataFusion (DF) release.
It looks like it's caused by the number of cores allocated (this becomes `num_cpus` and then feeds into DF as the `target_partition_count` setting).
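To illustrate how CPU pinning could change that number, here is a rough Python sketch. It is not Seafowl's code (which is Rust), and `effective_cores` is a hypothetical helper. It assumes the core count is read in an affinity-aware way, which would be consistent with `taskset` reproducing the problem, and that the partition setting simply mirrors that count.

```python
import os


def effective_cores() -> int:
    """Number of cores this process is allowed to run on (hypothetical helper)."""
    if hasattr(os, "sched_getaffinity"):
        # Linux: honours CPU pinning such as `taskset -c 1`.
        return len(os.sched_getaffinity(0))
    # Other platforms: fall back to the machine-wide count.
    return os.cpu_count() or 1


# Assumption: the partition setting is just the detected core count.
target_partition_count = effective_cores()
print(f"cores={effective_cores()} -> target_partition_count={target_partition_count}")
```

Running this under `taskset -c 1 python3 script.py` on a multi-core Linux box should report a single core, which is the configuration that triggers the error here.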
1 core (with ./taskset -c 1 seafowl):
ProjectionExec: expr=[COUNT(UInt8(1))@0 as COUNT(UInt8(1))]
  AggregateExec: mode=Final, gby=[], aggr=[COUNT(UInt8(1))]
    CoalescePartitionsExec
      AggregateExec: mode=Partial, gby=[], aggr=[COUNT(UInt8(1))]
        ParquetExec: limit=None, partitions=[], projection=[key]
multicore:
ProjectionExec: expr=[COUNT(UInt8(1))@0 as COUNT(UInt8(1))]
  AggregateExec: mode=Final, gby=[], aggr=[COUNT(UInt8(1))]
    CoalescePartitionsExec
      AggregateExec: mode=Partial, gby=[], aggr=[COUNT(UInt8(1))]
        RepartitionExec: partitioning=RoundRobinBatch(4)
          ParquetExec: limit=None, partitions=[], projection=[key]
The physical plan for single-core is missing `RepartitionExec`.
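A toy model of why that missing node might matter, written in Python purely for illustration. This is not DataFusion's implementation: `coalesce_input_partitions` and `PlanError` are made-up names, and the rules (one scan partition per Parquet file, repartitioning only when more than one partition is targeted, round-robin repartitioning always exposing N output partitions) are assumptions inferred from the two plans above.

```python
class PlanError(Exception):
    """Stand-in for the internal error returned by the query."""


def coalesce_input_partitions(num_files: int, target_partitions: int) -> int:
    """Count the partitions that would reach CoalescePartitionsExec."""
    # ParquetExec: assume one partition per file, so an empty table has none
    # (matches `partitions=[]` in both plans).
    partitions = num_files

    # Assumption: RepartitionExec is only planned when more than one partition
    # is targeted, and RoundRobinBatch(n) always exposes n output partitions.
    if target_partitions > 1:
        partitions = target_partitions

    # The partial AggregateExec keeps its input's partition count.
    if partitions == 0:
        raise PlanError(
            "CoalescePartitionsExec requires at least one input partition"
        )
    return partitions


print(coalesce_input_partitions(num_files=0, target_partitions=4))
try:
    coalesce_input_partitions(num_files=0, target_partitions=1)
except PlanError as err:
    print(f"single-core, empty table: {err}")
```

Under these assumptions, only the combination of an empty table and a single target partition leaves `CoalescePartitionsExec` with nothing to read, which matches what I'm seeing.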
Seems not to be occurring anymore, even without the fix from #189 (which was removed in #422); closing.