Median aggregation using DataFrame panics
Describe the bug
I am trying to use the new exact median aggregation function introduced by @andygrove in https://github.com/apache/arrow-datafusion/pull/3009, but the operation panics when I use it through the DataFrame API. Apologies that I didn't get around to testing this while the PR was open!
To Reproduce
Here is a test case that can be run inside the datafusion/core/tests/dataframe.rs file:
#[tokio::test]
async fn exact_median() -> Result<()> {
    let schema = Schema::new(vec![
        Field::new("a", DataType::Int32, true),
        Field::new("b", DataType::Float64, true),
    ]);

    let batch = RecordBatch::try_new(
        Arc::new(schema.clone()),
        vec![
            Arc::new(Int32Array::from_slice(&[1, 1, 1, 1])),
            Arc::new(Float64Array::from_slice(&[10.0, 0.0, 20.0, 100.0])),
        ],
    )
    .unwrap();

    let ctx = SessionContext::new();
    let provider = MemTable::try_new(Arc::new(schema), vec![vec![batch]]).unwrap();
    ctx.register_table("t", Arc::new(provider)).unwrap();

    let df = ctx
        .table("t")
        .unwrap()
        .aggregate(
            vec![col("a")],
            vec![Expr::AggregateFunction {
                fun: AggregateFunction::Median,
                args: vec![col("b")],
                distinct: false,
            }
            .alias("agg")],
        )
        .unwrap();

    let results = df.collect().await.unwrap();

    #[rustfmt::skip]
    let expected = vec![
        "+---+------+",
        "| a | agg  |",
        "+---+------+",
        "| 1 | 15.0 |",
        "+---+------+",
    ];
    assert_batches_eq!(expected, &results);

    Ok(())
}
thread 'exact_median' panicked at 'unexpected accumulator state in hash aggregate: Internal("AggregateState is not a scalar aggregate")', datafusion/core/src/physical_plan/aggregates/hash.rs:434:34
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread 'exact_median' panicked at 'called `Result::unwrap()` on an `Err` value: ArrowError(ExternalError(Execution("Join Error: task 29 panicked")))', datafusion/core/tests/dataframe.rs:459:38
Expected behavior
I expect the test above to pass.
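For reference, the expected value of 15.0 can be checked by hand with a small standalone sketch. This is plain Rust and does not use DataFusion at all; `median_of` is a hypothetical helper written only for this illustration, and it assumes the usual definition of an exact median (the mean of the two middle values when the count is even).

```rust
/// Hypothetical helper (not part of DataFusion): exact median of a slice.
/// Returns None for empty input.
fn median_of(values: &[f64]) -> Option<f64> {
    if values.is_empty() {
        return None;
    }
    // Sort a copy so the caller's data is left untouched.
    let mut sorted = values.to_vec();
    sorted.sort_by(|x, y| x.total_cmp(y));

    let mid = sorted.len() / 2;
    if sorted.len() % 2 == 0 {
        // Even count: average the two middle values.
        Some((sorted[mid - 1] + sorted[mid]) / 2.0)
    } else {
        // Odd count: take the single middle value.
        Some(sorted[mid])
    }
}

fn main() {
    // Column "b" from the test case; every row has a = 1, so there is one group.
    let b = [10.0, 0.0, 20.0, 100.0];
    println!("{:?}", median_of(&b)); // prints Some(15.0)
}
```

Sorted, the values are 0, 10, 20, 100, so the two middle values are 10 and 20 and their mean is 15.0, which matches the single row in the expected output.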