Robert (Bobby) Evans

204 comments by Robert (Bobby) Evans

This is likely to be a really low priority, because I don't know of any queries where the LIKE pattern is non-scalar.

I think this case might even change from run to run. Our aggregations do not guarantee the order in which the sum is computed, and floating-point addition is not truly associative. My...
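The order sensitivity described above can be seen in a small standalone sketch. This is plain Python, not the plugin's aggregation code, and the values and the helper name `sum_in_order` are made up for illustration: the same three doubles are summed in two different orders and the results differ in the last bit.

```python
import math

def sum_in_order(values):
    """Naive left-to-right floating-point sum, in the order given."""
    total = 0.0
    for v in values:
        total += v
    return total

# Illustrative values only; any mix of doubles with rounding error will do.
values = [0.1, 0.2, 0.3]

forward = sum_in_order(values)             # (0.1 + 0.2) + 0.3
backward = sum_in_order(reversed(values))  # (0.3 + 0.2) + 0.1

print(forward, backward, forward == backward)

# math.fsum tracks the rounding error and returns a correctly rounded
# result, so it does not depend on the order of the inputs.
print(math.fsum(values) == math.fsum(reversed(values)))
```

When a test compares sums produced by two engines that may accumulate in different orders, an approximate comparison such as `math.isclose` is usually a safer check than exact equality.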

So it looks like most of this has been fixed in 24.06 after the upmerge went in. I will retest things.

I think CUDF already supports this through dropListDuplicates: https://github.com/rapidsai/cudf/blob/ac27757092e9ba2bc0656b6a7dfbc79ce8b5e76a/java/src/main/java/ai/rapids/cudf/ColumnView.java#L2375-L2386 We should be able to implement this without any issues, so long as dropListDuplicates supports the types.

@phish3y happy to have you start to work on this. https://github.com/apache/spark/blob/0d7c07047a628bd42eb53eb49935f5e3f81ea1a1/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala#L4036 is the CPU implementation that we want to try and target. It looks like they have special case equality...

This is 100% repeatable, and it calculates different results for 0.5 (median value) every time. I think this is a bug in Spark that I found a while ago. https://issues.apache.org/jira/browse/SPARK-45599...

I think the solution here is to update FloatGen and DoubleGen so that we can replace -0.0 with 0.0. We would enable it for these tests, but keep other tests...
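A minimal sketch of the normalization idea, assuming a plain Python post-processing step: the helper names `normalize_negative_zero` and `is_negative_zero` are hypothetical, and this does not show the actual FloatGen or DoubleGen interface. It maps -0.0 to 0.0 and leaves every other value, including NaN, untouched.

```python
import math

def is_negative_zero(x: float) -> bool:
    """True only for -0.0 (compares equal to zero but carries a negative sign)."""
    return x == 0.0 and math.copysign(1.0, x) < 0.0

def normalize_negative_zero(x: float) -> float:
    """Hypothetical helper: replace -0.0 with 0.0, pass everything else through."""
    # -0.0 == 0.0 is True in IEEE 754, so this branch catches both zeros
    # and always returns the positive one. NaN compares unequal and is kept.
    return 0.0 if x == 0.0 else x

# Illustrative generated values, not taken from any real test run.
generated = [-0.0, 0.0, 1.5, -2.25, float("nan")]
cleaned = [normalize_negative_zero(v) for v in generated]

print(any(is_negative_zero(v) for v in generated))  # True
print(any(is_negative_zero(v) for v in cleaned))    # False
```

Keeping the normalization behind an opt-in switch would match the comment's intent of enabling it only for the affected tests.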

We have had a customer ask about this, so we might bump up the priority on this. I have been looking at the JSON parsing and how that relates to...

Oh, also a single line can contain multiple entries if the top level is an array.
```
[{"a": 1, "b": 1}, {"a": 2}]
[{"a": 1, "b": 2.0}]
{"a": 1, "b":...
```

We might want to look at some of the Spark JSON tests, but they are not that complete.
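The earlier point about one line yielding several entries when its top level is an array can be sketched with Python's standard `json` module. This is an illustration of the row-expansion idea only, not Spark's or the plugin's JSON parser; the helper name `records_from_line` is hypothetical, and only the two complete sample lines from the comment are used.

```python
import json

def records_from_line(line: str) -> list:
    """Hypothetical helper: return the entries carried by one JSON line.

    A top-level array contributes one entry per element; any other
    top-level value contributes a single entry.
    """
    parsed = json.loads(line)
    if isinstance(parsed, list):
        return parsed
    return [parsed]

# The two complete sample lines quoted in the comment above.
lines = [
    '[{"a": 1, "b": 1}, {"a": 2}]',
    '[{"a": 1, "b": 2.0}]',
]

records = [rec for line in lines for rec in records_from_line(line)]

for rec in records:
    print(rec)
```

Two input lines produce three entries here, and the field `b` shows up as an integer, as a float, and as missing, which is the kind of mixed input a test suite for this feature would need to cover.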