Jason Lowe
Note that with timezones that use daylight savings or similar discontinuities, where time can "roll back", there can be ambiguous mappings from a timestamp in those timezones to UTC during...
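To make the ambiguity concrete, here's a small illustration (not from the tests themselves, just Python's stdlib) of a wall-clock time that occurs twice when clocks roll back, so it maps to two distinct UTC instants:

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# 2023-11-05 01:30 occurs twice in America/Los_Angeles because
# clocks roll back from 02:00 PDT to 01:00 PST that morning.
tz = ZoneInfo("America/Los_Angeles")
first = datetime(2023, 11, 5, 1, 30, fold=0, tzinfo=tz)   # first pass (PDT, UTC-7)
second = datetime(2023, 11, 5, 1, 30, fold=1, tzinfo=tz)  # second pass (PST, UTC-8)

# Identical local wall-clock time, two different UTC offsets,
# hence two different UTC timestamps.
print(first.utcoffset(), second.utcoffset())
```

A conversion that only sees the local timestamp string has no way to pick between the two instants, which is the ambiguity described above.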
Row comparison failure details
```
[2024-01-31T20:47:58.663Z] --- CPU OUTPUT
[2024-01-31T20:47:58.663Z] +++ GPU OUTPUT
[2024-01-31T20:47:58.663Z] @@ -82,48 +82,48 @@
[2024-01-31T20:47:58.663Z] Row(a=None, from_json(a)=None)
[2024-01-31T20:47:58.663Z] Row(a='{"a": {"b":"iz"} }', from_json(a)=Row(a='{"b":"iz"}'))
[2024-01-31T20:47:58.663Z] Row(a='{"a": {"b":"md"} }', ...
```
Note that this failure was from a distributed cluster setup, so the nature of the failure may have something to do with how the input data is partitioned across tasks....
The diff is coming from the CPU producing nulls when the GPU does not. Splitting out the differing columns onto their own lines, CPU:
```
percentile(val, 0.1, abs(freq))=None,
percentile(val, 0, abs(freq))=None,
percentile(val, ...
```
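For context on the semantics involved, here's a rough sketch (not Spark's implementation, just an illustration of the expected behavior) of `percentile` with a frequency column: each value counts `freq` times, nulls are ignored, and when no non-null input survives the result is null, which matches the `None` seen on the CPU side:

```python
def weighted_percentile(rows, p):
    """Sketch of percentile(value, p, frequency) semantics.

    rows: iterable of (value, frequency) pairs; p in [0, 1].
    Returns None when there is no usable (non-null) input.
    """
    expanded = []
    for val, freq in rows:
        # Null values and null/non-positive frequencies contribute nothing.
        if val is not None and freq is not None and freq > 0:
            expanded.extend([val] * freq)
    if not expanded:
        return None  # all-null input yields a null result
    expanded.sort()
    # Linear interpolation between the two closest ranks.
    idx = p * (len(expanded) - 1)
    lo = int(idx)
    hi = min(lo + 1, len(expanded) - 1)
    frac = idx - lo
    return expanded[lo] + (expanded[hi] - expanded[lo]) * frac
```

Under these semantics, the CPU's `None` results suggest the differing rows had no non-null contributing input, while the GPU produced a concrete value instead.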
I dug into this a bit, and unexpectedly found that the RAPIDS Accelerator is *not* using ZSTD during these tests. Dataproc 2.0 is running Spark 3.1.x, so the tests avoid...
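For reference (this is a general note, not taken from these test logs): on Spark versions where Parquet ZSTD is supported, the codec is selected with the standard Spark SQL conf, e.g. in `spark-defaults.conf`:

```
spark.sql.parquet.compression.codec=zstd
```

On clusters where that codec isn't available, the tests would need to fall back to a codec the Spark version supports, which is consistent with ZSTD not being exercised here.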
The GPU coalesce on the read side is tracked by #10402.
There are some known cases where the output will not exactly match the CPU behavior. See the compatibility docs for 22.10 at https://github.com/NVIDIA/spark-rapids/blob/branch-22.10/docs/compatibility.md. If the difference you're seeing is not...
Some of the changes in this PR are fine; it's mostly problematic in the `write` method (which admittedly is most of the PR). Given that the `write` method needs refactoring to...
@HaoYang670 this should be moved to 23.02 as it doesn't make sense to put this into 22.12 at this point in its release cycle.
> this should be moved to 23.02

My apologies, that's already occurred.