Review `NaN` handling in `median` and `approx_median`
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
We have no tests involving `NaN` for `approx_median`, and the current `NaN` behavior of `median` is likely undesirable. This issue is a follow-up to document that behavior, possibly change it, and add more tests.
Describe the solution you'd like
Ideally, make sure we are compatible with PostgreSQL.
Describe alternatives you've considered
None
Additional context
Tests in `aggregates.rs` (being added in https://github.com/apache/arrow-datafusion/pull/3009):
    #[tokio::test]
    async fn median_f64_nan() -> Result<()> {
        median_test(
            "median",
            DataType::Float64,
            Arc::new(Float64Array::from(vec![1.1, f64::NAN, f64::NAN, f64::NAN])),
            "NaN", // probably not the desired behavior? - see https://github.com/apache/arrow-datafusion/issues/3039
        )
        .await
    }

    #[tokio::test]
    async fn approx_median_f64_nan() -> Result<()> {
        median_test(
            "approx_median",
            DataType::Float64,
            Arc::new(Float64Array::from(vec![1.1, f64::NAN, f64::NAN, f64::NAN])),
            "NaN", // probably not the desired behavior? - see https://github.com/apache/arrow-datafusion/issues/3039
        )
        .await
    }