Robert (Bobby) Evans comments

Results: 202 comments of Robert (Bobby) Evans

Figure out why `MapFromArrays` appears in the tests for hive parquet write

This appears to be coming from https://github.com/apache/spark/blob/fd86f85e181fc2dc0f50a096855acf83a6cc5d9c/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala#L381-L421 It appears that https://issues.apache.org/jira/browse/SPARK-42151 https://github.com/apache/spark/pull/40308 So technically this is a regression, more accurately a performance regression, in that we could run the query...

Figure out why `MapFromArrays` appears in the tests for hive parquet write

@sameerz and @mattahrens we now know why the regression has happened and we need to decide what the next steps are. Implementing this is not too difficult. We mainly need...

Fix a NPE issue in GpuRand

I spoke with @jlowe and I think we really want to understand this better. https://github.com/NVIDIA/spark-rapids/issues/11649 The problem is that if a retry happens and it is not in a checkpoint/restore,...
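
As a rough illustration of why a retry outside a checkpoint/restore matters for a random generator, here is a minimal Python sketch. It is not the GpuRand code: `CheckpointableRand` and `generate_with_retry` are hypothetical names, and the failure is simulated. The point is only that a retried attempt reproduces the same values when the generator state is restored first.

```python
import random


class CheckpointableRand:
    """Hypothetical stand-in for a random generator whose state can be saved."""

    def __init__(self, seed):
        self._rng = random.Random(seed)
        self._saved = None

    def checkpoint(self):
        # Remember the generator state before any values are produced.
        self._saved = self._rng.getstate()

    def restore(self):
        if self._saved is None:
            raise RuntimeError("restore() called without a checkpoint")
        self._rng.setstate(self._saved)

    def next_batch(self, n):
        return [self._rng.random() for _ in range(n)]


def generate_with_retry(rand, n, use_checkpoint):
    """Produce n values, simulating one failed attempt that must be retried."""
    if use_checkpoint:
        rand.checkpoint()
    rand.next_batch(n)  # first attempt: values are produced, then the attempt "fails"
    if use_checkpoint:
        rand.restore()  # rewind so the retry sees the same state
    return rand.next_batch(n)  # retry


expected = CheckpointableRand(42).next_batch(4)
with_restore = generate_with_retry(CheckpointableRand(42), 4, use_checkpoint=True)
without_restore = generate_with_retry(CheckpointableRand(42), 4, use_checkpoint=False)
```

In this sketch `with_restore` matches the values a run without any failure would have produced, while `without_restore` does not, because the failed attempt already advanced the generator.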

[BUG] GPU JSON reader fails to read the JSON string of an empty body

@res-life are you still planning on working on this? The failures are happening in two places. If you don't provide a schema, then schema discovery returns with an empty schema....

[FEA] [AUDIT] Support expressions to work with collated strings

The plan is explicitly to not support collated strings and to fallback to the CPU if we see them. This is a very large amount of work to try and...

[FEA] Support short-circuit evaluation for expensive expression like rlike

On the GPU the problems typically show up around thread divergence and non-coalesced memory access patterns. I am not 100% sure about this so we should run some experiments and...
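
For context, one way to picture short-circuit evaluation on columnar data is to run the cheap predicate first and apply the expensive regular expression only to the rows that survive. The sketch below is plain Python on lists, not GPU code; `and_rlike_short_circuit` is a hypothetical helper, null handling is ignored, and Python's `re` stands in for the Java regex semantics that Spark uses.

```python
import re


def and_rlike_short_circuit(cheap_mask, strings, pattern):
    """Evaluate `cheap_mask AND rlike(strings, pattern)` row by row,
    running the regex only where the cheap predicate is already True.

    Returns the boolean result column and the number of regex evaluations.
    """
    regex = re.compile(pattern)
    # Gather: indices of rows that still need the expensive check.
    survivors = [i for i, keep in enumerate(cheap_mask) if keep]
    # Rows rejected by the cheap predicate are False without touching the regex.
    result = [False] * len(strings)
    # Scatter: write the regex result back to the original row positions.
    for i in survivors:
        result[i] = regex.search(strings[i]) is not None
    return result, len(survivors)


result, regex_calls = and_rlike_short_circuit(
    [True, False, True], ["abc123", "999", "xyz"], r"\d+"
)
```

Here only two of the three rows reach the regex. Whether the gather and scatter steps pay for themselves on a GPU is exactly the kind of question the experiments mentioned above would need to answer.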

[FEA] it would be nice if we could support org.apache.spark.sql.catalyst.expressions.Bin

Would probably need a new kernel for this, but it is just taking a long and outputting the binary representation of it, which should be dead simple to do.
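
To make the expected output concrete, here is a small Python sketch of the behavior as I understand it from the Spark SQL documentation for `bin`: the value is treated as a 64-bit long and rendered in two's complement without leading zeros. `spark_like_bin` is a hypothetical name and this is a CPU reference, not a kernel.

```python
def spark_like_bin(value: int) -> str:
    """Binary string of a 64-bit signed long, two's complement for negatives."""
    if not -(1 << 63) <= value < (1 << 63):
        raise ValueError("value does not fit in a signed 64-bit long")
    # Masking to 64 bits maps negative values to their two's complement form.
    return format(value & 0xFFFFFFFFFFFFFFFF, "b")


print(spark_like_bin(13))   # 1101
print(spark_like_bin(-13))  # 60 ones followed by 0011
```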

GPU OutOfMemory while DISTINCT a partitionedBy column on DeltaTable ?

@LIN-Yu-Ting Generally we treat GPU OutOfMemory errors as bugs that need to be fixed. There are a few cases where an algorithm cannot be split up into smaller pieces and...

GPU OutOfMemory while DISTINCT a partitionedBy column on DeltaTable ?

Thanks for the updated information. We will try and reproduce this locally and see what we can come up with. For now I think I will just move this over...

Prototype get_json_object

If I try to run `get_json_object_multiple_paths2` with just `$` as the path, which is the same as an empty parsed path vector I get what looks like I get the...