
Median aggregation using DataFrame panics

Open jonmmease opened this issue 3 years ago • 0 comments

Describe the bug
I am trying to use the new exact median aggregation function introduced by @andygrove in https://github.com/apache/arrow-datafusion/pull/3009, but when I try it using the DataFrame API the operation panics. Apologies that I didn't get around to testing this while the PR was open!

To Reproduce
Here is a test case that can be run inside the datafusion/core/tests/dataframe.rs file:

#[tokio::test]
async fn exact_median() -> Result<()> {
    let schema = Schema::new(vec![
        Field::new("a", DataType::Int32, true),
        Field::new("b", DataType::Float64, true),
    ]);

    let batch = RecordBatch::try_new(
        Arc::new(schema.clone()),
        vec![
            Arc::new(Int32Array::from_slice(&[1, 1, 1, 1])),
            Arc::new(Float64Array::from_slice(&[10.0, 0.0, 20.0, 100.0])),
        ],
    )
    .unwrap();

    let ctx = SessionContext::new();
    let provider = MemTable::try_new(Arc::new(schema), vec![vec![batch]]).unwrap();
    ctx.register_table("t", Arc::new(provider)).unwrap();

    let df = ctx
        .table("t")
        .unwrap()
        .aggregate(
            vec![col("a")],
            vec![Expr::AggregateFunction {
                fun: AggregateFunction::Median,
                args: vec![col("b")],
                distinct: false,
            }
            .alias("agg")],
        )
        .unwrap();

    let results = df.collect().await.unwrap();

    #[rustfmt::skip]
    let expected = vec![
        "+---+------+",
        "| a | agg  |",
        "+---+------+",
        "| 1 | 15.0 |",
        "+---+------+",
    ];
    assert_batches_eq!(expected, &results);

    Ok(())
}

thread 'exact_median' panicked at 'unexpected accumulator state in hash aggregate: Internal("AggregateState is not a scalar aggregate")', datafusion/core/src/physical_plan/aggregates/hash.rs:434:34
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread 'exact_median' panicked at 'called `Result::unwrap()` on an `Err` value: ArrowError(ExternalError(Execution("Join Error: task 29 panicked")))', datafusion/core/tests/dataframe.rs:459:38

Expected behavior
I expect the test above to pass.
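
For reference, the expected value of 15.0 in the test is the exact median of the four values in column b for the single group a = 1. The sketch below reproduces that arithmetic without DataFusion. The helper `median_of` is hypothetical and exists only for illustration; it is not part of DataFusion and is not claimed to mirror how the Median accumulator is implemented.

```rust
/// Hypothetical helper (not a DataFusion API): exact median of a slice.
/// Returns None for empty input.
fn median_of(values: &[f64]) -> Option<f64> {
    if values.is_empty() {
        return None;
    }
    let mut sorted = values.to_vec();
    // total_cmp defines a total order on f64, so the sort cannot panic on NaN.
    sorted.sort_by(|x, y| x.total_cmp(y));
    let mid = sorted.len() / 2;
    if sorted.len() % 2 == 0 {
        // Even count: average the two middle values.
        Some((sorted[mid - 1] + sorted[mid]) / 2.0)
    } else {
        // Odd count: take the middle value.
        Some(sorted[mid])
    }
}

fn main() {
    // Same values as column b in the test case above.
    let b = [10.0, 0.0, 20.0, 100.0];
    // Sorted: [0.0, 10.0, 20.0, 100.0]; the middle pair is (10.0, 20.0).
    println!("{:?}", median_of(&b)); // prints Some(15.0)
}
```

With an even number of values the median is the mean of the two middle ones, which is why the test expects 15.0 rather than 10.0 or 20.0.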

jonmmease, Aug 11 '22 15:08