Sameer Raheja comments

Results: 70 comments by Sameer Raheja

Add spark350emr shim layer [EMR]

Closing until we can retarget to the latest branch

Investigate treating scalars in a column batch as dictionary columns

Would need to add dictionary support.

[BUG] GetJsonObject should return null for invalid query instead of throwing an exception

@thirtiseven has this issue been verified to be closed by https://github.com/NVIDIA/spark-rapids/pull/10466 ? cc: @revans2

[BUG] GetJsonObject does not process escape sequences in returned strings or queries

Is this similar to #9033?

[FEA] Improve nds performance of iceberg.

Hi @liurenjie1024, can you be more specific about the current performance and what we plan to do to improve it?

[BUG] Non-deterministic query result corruption when RAPIDS shuffle manager is enabled on WSL2

Removing P1 and removing from 22.08 since the issue only occurs in WSL2 (which we do not support).

[BUG] Special character inputs from the csv file bring inconsistency when using the CPU and GPU engines

Hi @asddfl (@asdsql?), cudf and Spark handle quotes in CSV files differently, which is what you identified. We are working to ensure the RAPIDS Spark plugin matches...

[FEA] Support sha2

We could leverage https://github.com/rapidsai/cudf/pull/9215, or alternatively implement the SHA-2 algorithm in spark-rapids-jni.
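Whichever route is taken, the GPU result would have to agree with what Spark SQL's `sha2(expr, bitLength)` returns on the CPU. The following is a minimal sketch of that reference behavior using Python's `hashlib`; the helper name `sha2_hex` is hypothetical, and the handling of `None` input and unsupported bit lengths (returning `None`) is an assumption modeled on Spark's null semantics, not code from cudf or spark-rapids-jni.

```python
import hashlib
from typing import Optional

# Bit lengths accepted by Spark SQL's sha2(); 0 is treated the same as 256.
_SHA2_BY_BITS = {
    0: hashlib.sha256,
    224: hashlib.sha224,
    256: hashlib.sha256,
    384: hashlib.sha384,
    512: hashlib.sha512,
}


def sha2_hex(data: Optional[bytes], bit_length: int) -> Optional[str]:
    """Hypothetical CPU reference for sha2: lowercase hex digest, or None."""
    if data is None:
        # Assumption: a null input yields a null result.
        return None
    hasher = _SHA2_BY_BITS.get(bit_length)
    if hasher is None:
        # Assumption: an unsupported bit length yields null rather than an error.
        return None
    return hasher(data).hexdigest()
```

A helper like this could serve as the expected-value side of a CPU versus GPU comparison test, for example `sha2_hex(b"abc", 256)`.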

[QST] Iceberg issue with rapids

@arturzangiev, let us know if using the newer release with a newer GPU addresses the issue. Closing for now; please reopen if you have further questions.

[BUG] test_exact_percentile_groupby failed GPU and CPU float values are different intermittently

This has not appeared again, even with the same datagen seed. We usually run with the shuffle done on a single node. The results are based on shuffle and therefore...