Comet sort order different to Spark for 0.0 and -0.0
Describe the bug
During testing with CAST, we execute queries with ORDER BY a, and it seems the results are ordered differently for 0.0 vs -0.0 with float and double. The test diff shows the mismatched rows:
![0.0,0.0] [-0.0,-0.0]
![-0.0,-0.0] [0.0,0.0]
Steps to reproduce
No response
Expected behavior
No response
Additional context
No response
Scala/Java seem to take the sign into account during sorting and Rust does not. Perhaps we just need to document this as an edge case in the compatibility guide.
Scala
val x: Seq[Float] = Seq(0.0f, -0.0f, 0.0f, -0.0f, 1.0f, -1.0f)
// The default Ordering[Float] matches java.lang.Float.compare,
// which sorts -0.0 before 0.0.
println(x.sorted)
Output: List(-1.0, -0.0, -0.0, 0.0, 0.0, 1.0)
Rust
let mut v = vec![0.0_f32, -0.0_f32, 0.0_f32, -0.0_f32, 1.0_f32, -1.0_f32];
// partial_cmp reports 0.0 and -0.0 as equal, so the stable sort keeps them
// in their original input order; unwrap_or only fires when NaN is involved.
v.sort_by(|a, b| a.partial_cmp(b).unwrap_or(std::cmp::Ordering::Greater));
println!("{:?}", v);
Output: [-1.0, 0.0, -0.0, 0.0, -0.0, 1.0]
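If matching the Java ordering ever becomes desirable on the Rust side, f32::total_cmp implements the IEEE 754 totalOrder predicate and sorts -0.0 before 0.0. A minimal sketch (illustration only, not Comet's actual sort path):
Rust
let mut v = vec![0.0_f32, -0.0_f32, 0.0_f32, -0.0_f32, 1.0_f32, -1.0_f32];
// total_cmp implements IEEE 754 totalOrder: -0.0 < 0.0, and NaN gets a
// defined position instead of relying on an unwrap_or fallback.
v.sort_by(|a, b| a.total_cmp(b));
println!("{:?}", v);
Output: [-1.0, -0.0, -0.0, 0.0, 0.0, 1.0]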
We should also test with NaN in sorting
TIL: IEEE 754 says that +0.0 and -0.0 are equal, and in C and Java +0.0 == -0.0 is true, but Java's Double.equals does not treat the two as equal.
https://en.wikipedia.org/wiki/Signed_zero#Comparisons
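The same distinction is easy to reproduce from Rust as a rough analogue (Java's Double.equals effectively compares the raw bits via doubleToLongBits, while primitive == follows IEEE 754 equality):
Rust
let (pos, neg) = (0.0_f64, -0.0_f64);
// IEEE 754 equality: +0.0 == -0.0, same as primitive == in C and Java.
assert!(pos == neg);
// Comparing the raw bits distinguishes the two, which is effectively what
// java.lang.Double.equals does via doubleToLongBits.
assert_ne!(pos.to_bits(), neg.to_bits());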
I don't think SQL distinguishes between the two, so I would not consider this an issue that needs to be fixed.
> We should also test with NaN in sorting
Also +infinity and -infinity while we are at it.
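For reference, totalOrder also gives NaN and the infinities well-defined positions, and Spark's documented NaN semantics likewise put NaN last in ascending order, larger than any other value. A quick sketch (illustration only, not Comet's sort path):
Rust
let mut v = vec![f64::NAN, f64::INFINITY, f64::NEG_INFINITY, 0.0, -0.0];
// Under totalOrder: -inf < -0.0 < 0.0 < +inf < NaN (f64::NAN is a positive
// NaN; a NaN with the sign bit set would sort before -inf).
v.sort_by(|a, b| a.total_cmp(b));
println!("{:?}", v);
Output: [-inf, -0.0, 0.0, inf, NaN]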
It is probably worth having a section in the compatibility guide specifically for Rust vs Java differences like this.
I found that Spark's ORDER BY is not a stable sort, as indicated by https://issues.apache.org/jira/browse/SPARK-45243
DataFusion's sort is not stable either: https://github.com/apache/datafusion/blob/9b4f90ad1eefabdc0d5bbbfd99e58765b041bb77/datafusion/physical-plan/src/sorts/sort.rs#L601
The SQL standard does not seem to guarantee stability: https://stackoverflow.com/questions/15522746/is-sql-order-by-clause-guaranteed-to-be-stable-by-standards
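To illustrate what instability means here: once the comparator reports 0.0 and -0.0 as equal, an unstable sort is free to emit them in either relative order, so the observed order can vary between runs and engines. A small sketch with hypothetical (key, tag) rows:
Rust
let mut rows = vec![(0.0_f64, "a"), (-0.0_f64, "b"), (0.0_f64, "c")];
// partial_cmp says all three keys are equal, so sort_unstable_by may emit
// the tags in any relative order; a stable sort_by would preserve a, b, c.
rows.sort_unstable_by(|x, y| x.0.partial_cmp(&y.0).unwrap());
println!("{:?}", rows);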
I see that the Spark and DataFusion sort orders sometimes end up the same for 0.0 and -0.0. I agree that I would not worry about it.
It is working fine for NaN, +infinity, and -infinity.
The only thing is that Spark seems to always sort 0.0 before -0.0, which does not make sense. Perhaps this is a Spark issue.