Alessandro Bellina

Results 50 issues of Alessandro Bellina

We often (but not always) see errors like: `mlx5dv_devx_create_event_channel() failed: Protocol not supported`, which results in an `Input/output error` and causes our test application to fail. We are using UCX 1.10.1...

Bug

The date format used in file_info.clj for static files, `EEE, dd MMM yyyy HH:mm:ss ZZZ`, is incorrect: `ZZZ` should be `z`. This makes an RFC 1123 standard date: Wed, 08 Jun...

bug
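The difference between the two pattern letters in the date-format issue above can be seen in a minimal sketch, assuming the pattern is interpreted by `java.text.SimpleDateFormat`. The class name `HttpDateFormats` and the helper `format` are hypothetical, introduced only for illustration, and the Unix epoch is used as an arbitrary sample date.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper class, not taken from file_info.clj.
final class HttpDateFormats {
    // Formats a date with the given SimpleDateFormat pattern,
    // pinned to the GMT time zone and an English locale.
    static String format(String pattern, Date date) {
        SimpleDateFormat fmt = new SimpleDateFormat(pattern, Locale.US);
        fmt.setTimeZone(TimeZone.getTimeZone("GMT"));
        return fmt.format(date);
    }

    public static void main(String[] args) {
        Date epoch = new Date(0L); // arbitrary sample date
        // "ZZZ" emits a numeric RFC 822 offset.
        System.out.println(format("EEE, dd MMM yyyy HH:mm:ss ZZZ", epoch));
        // "z" emits the zone name, which is what an HTTP date expects.
        System.out.println(format("EEE, dd MMM yyyy HH:mm:ss z", epoch));
    }
}
```

With the time zone pinned to GMT, the first pattern ends in `+0000` and the second ends in `GMT`.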

The `GpuExtractChunk32` expression gets evaluated 4 times for a decimal column (as far as I understand). This expression could also have some child expressions, for example: `gpudecimal128sum((cast(ss_quantity#139 as decimal(10,0)) *...

performance
P1

This PR https://github.com/apache/spark/commit/29e4552831 added UNPIVOT to the SQL interface. According to the code, unpivot turns into `expand`. This issue is to add some test queries that exercise UNPIVOT, likely...

feature request
? - Needs Triage
audit_3.4.0
Spark 3.4+

This change https://github.com/apache/spark/commit/e6bebb6665 moved to using `XORShiftRandom` instead of `Random(hashing.byteswap32(index))` in a couple of places. The `RDD.coalesce` one I don't believe affects us, but the change to `getPartitionKeyExtractor` should potentially...

feature request
? - Needs Triage
audit_3.4.0
Spark 3.4+

This epic is trying to group together tasks that will help us achieve a pretty tall order, which is to run without fatal OOMs. Non-fatal OOMs are defined as those...

epic
reliability

When invoking cuDF we may or may not hold GPU memory. The purpose of this task is to add a mechanism that may need cuDF changes, to track what each...

cudf_dependency
reliability

As @revans2 mentions here https://github.com/NVIDIA/spark-rapids/pull/6810#discussion_r996044048, the Python worker interaction with the `GpuSemaphore` is a bit more complicated than 1 thread per task. I am filing this to investigate this edge...

feature request
? - Needs Triage
reliability

When Spark overflows with `AnsiCast` we get an exception like this: ``` java.lang.ArithmeticException: Casting 9223372036854775807 to int causes overflow ``` But the plugin doesn't show the value that would overflow...

feature request
? - Needs Triage
ease of use
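As a rough illustration of the overflow message quoted in the `AnsiCast` issue above, the following sketch shows a checked long-to-int cast that reports the offending value. It is not Spark's or the plugin's implementation: the class `AnsiCastSketch` and the method `castLongToInt` are hypothetical names, and only the message text is taken from the issue.

```java
// Hypothetical sketch; not the Spark or spark-rapids code path.
final class AnsiCastSketch {
    // Narrows a long to an int, throwing instead of silently wrapping
    // when the value does not fit. The message includes the value so the
    // failing input can be identified.
    static int castLongToInt(long value) {
        if (value < Integer.MIN_VALUE || value > Integer.MAX_VALUE) {
            throw new ArithmeticException(
                "Casting " + value + " to int causes overflow");
        }
        return (int) value;
    }

    public static void main(String[] args) {
        System.out.println(castLongToInt(42L));
        try {
            castLongToInt(Long.MAX_VALUE);
        } catch (ArithmeticException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Carrying the value in the exception message is the behavior the issue asks for; how the plugin would surface it for a whole column is left open here.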

Providing heap dumps and stack traces on GPU OOM is one way to narrow down memory misuse. How many stack traces and heap dumps to output is not a clear choice....

feature request
? - Needs Triage
reliability