MithunR issues

Results: 22 issues of MithunR

[FEA] Support querying map scalars with key vectors

Followup to #6288. The `spark-rapids` plugin currently supports querying `map` scalars only with key scalars. Map vectors can be queried with both scalar and vector keys. So the following query...

feature request
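To make the map-query request above concrete, here is a minimal plain-Python model of looking up one map scalar against a vector (column) of keys. It is only an illustration of the semantics being asked for: the function name `query_map_scalar` is hypothetical, and nothing here reflects the plugin's or cuDF's actual API.

```python
from typing import Any, Dict, List, Optional


def query_map_scalar(map_scalar: Dict[Any, Any],
                     key_vector: List[Optional[Any]]) -> List[Optional[Any]]:
    """Look up every key of key_vector in a single map value.

    Hypothetical helper: a missing or null key yields None (SQL NULL).
    """
    return [None if key is None else map_scalar.get(key) for key in key_vector]


# One map scalar queried with a vector of keys, one result per row.
result = query_map_scalar({"a": 1, "b": 2}, ["a", "z", None, "b"])
print(result)
```

With a key scalar, the same lookup would be performed once; the request is for the per-row form shown here.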

[BUG] `UPDATE` on a Databricks (10.4) DELTA table leads to JVM crash

The Spark Executor JVM crashes when an `UPDATE` command is run on a Delta table on Databricks 10.4. This does not appear to break on Apache Spark (3.2.1, at least). ##...

bug

[BUG] Java `Scalar` does not consider Decimal scale for `.hashCode()`/`.equals()`

In its current implementation, the `Scalar` Java class does not consider the `scale` of a scalar value when comparing two `DECIMAL` scalars. Here is the section of `Scalar.equals()` that compares...

bug

invalid

cuDF (Java)
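The concern in the `Scalar` report above can be illustrated with a small plain-Python sketch. `DecimalScalar` and `equals_ignoring_scale` are hypothetical stand-ins, not the cuDF Java class, and the sketch assumes a decimal is stored as an unscaled integer plus a scale, with value = unscaled × 10^(−scale).

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class DecimalScalar:
    """Hypothetical decimal scalar: value = unscaled * 10**(-scale)."""
    unscaled: int
    scale: int

    def value(self) -> Decimal:
        # scaleb shifts the exponent, so 1234 with scale 2 becomes 12.34.
        return Decimal(self.unscaled).scaleb(-self.scale)


def equals_ignoring_scale(a: DecimalScalar, b: DecimalScalar) -> bool:
    """Comparison that looks only at the unscaled value (the reported concern)."""
    return a.unscaled == b.unscaled


a = DecimalScalar(1234, 2)  # represents 12.34
b = DecimalScalar(1234, 3)  # represents 1.234

print(equals_ignoring_scale(a, b))  # the two distinct values are conflated
print(a == b)                       # dataclass equality also compares scale
```

The frozen dataclass derives both `__eq__` and `__hash__` from the same fields, which keeps the two methods consistent with each other.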

[BUG] String columns written with `fastparquet` seem to be read incorrectly via CUDF's Parquet reader

**Description** This was uncovered in [Spark tests](https://github.com/NVIDIA/spark-rapids/pull/9366) that compare Parquet read/write compatibility with [`fastparquet`](https://fastparquet.readthedocs.io/en/latest/index.html). The last row of a String column written with `fastparquet` seems to be interpreted by CUDF...

bug

0 - Backlog

libcudf

cuIO

[BUG] Misinterpretation of Parquet List schema with single GROUP child named "array"

This bug is to track a (possible) misinterpretation of Parquet list schemas when stored in a legacy format. This is a follow-up to https://github.com/rapidsai/cudf/pull/13277. This is specific to rules #3...

bug

0 - Backlog

libcudf

cuIO

`strings::contains()` for multiple scalar search targets

## Description This commit adds a new `strings::contains()` overload that allows for the search of multiple scalar search targets in the same call. The trick here is that a new...

feature request

libcudf

Java

non-breaking

cuDF (Java)
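As a rough model of what searching for several scalar targets in one call might look like, the following plain-Python sketch walks the strings once and tests each target per row. The name `contains_multiple` and the output shape (one boolean column per target, nulls propagated) are assumptions made for illustration; the snippet above does not state the actual signature or return type.

```python
from typing import List, Optional


def contains_multiple(strings: List[Optional[str]],
                      targets: List[str]) -> List[List[Optional[bool]]]:
    """Return one boolean column per target (hypothetical output shape).

    Each input string is visited once; a null input row yields a null
    result in every output column.
    """
    columns: List[List[Optional[bool]]] = [[] for _ in targets]
    for s in strings:
        for column, target in zip(columns, targets):
            column.append(None if s is None else target in s)
    return columns


result = contains_multiple(["apple", "banana", None], ["an", "pp"])
print(result)
```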

[AUDIT][TASK] Explore reducing target size when coalescing partitions with exploding joins

**Description** This is the result of auditing [SPARK-47247](https://github.com/apache/spark/commit/e310e76e63f). SPARK-47247 changes the target partition size for AQE partition coalescing (`spark.sql.adaptive.advisoryPartitionSizeInBytes`) from the default of `64MB` to the value of `spark.sql.adaptive.coalescePartitions.minPartitionSize` (whose...

? - Needs Triage

audit_4.0.0

Fix ANSI mode failures in subquery_test.py [databricks]

Fixes #11029. Some tests in subquery_test.py fail when run with ANSI mode enabled, because certain array columns are accessed with invalid indices. These tests predate the availability of ANSI mode...

test

Spark 4.0+
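The failure mode described in the ANSI-mode entry above can be modelled in a few lines of plain Python. This is not Spark code: `get_array_item` is a hypothetical helper that only mimics the general behaviour that an out-of-range array index yields NULL with ANSI mode off and raises an error with ANSI mode on.

```python
from typing import Any, List, Optional


def get_array_item(arr: List[Any], index: int, ansi_enabled: bool) -> Optional[Any]:
    """Model a zero-based array access under ANSI and non-ANSI semantics."""
    if 0 <= index < len(arr):
        return arr[index]
    if ansi_enabled:
        # ANSI mode: an invalid index is an error rather than a NULL.
        raise IndexError(f"invalid index {index} for array of length {len(arr)}")
    return None  # non-ANSI mode: an invalid index yields NULL


print(get_array_item([1, 2, 3], 5, ansi_enabled=False))
```

A test that indexes past the end of an array therefore passes silently without ANSI mode and fails once it is enabled, which matches the kind of breakage the PR describes.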

WIP: Spark 4: Fix miscellaneous tests including logic, repart, hive_delimited.

(Partially) fixes #11031. This PR addresses tests that fail on Spark 4.0 in the following files:
1. `integration_tests/src/main/python/datasourcev2_read_test.py`
2. `integration_tests/src/main/python/expand_exec_test.py`
3. `integration_tests/src/main/python/get_json_test.py`
4. `integration_tests/src/main/python/hive_delimited_text_test.py`
5. `integration_tests/src/main/python/logic_test.py`
6. `integration_tests/src/main/python/repart_test.py`
7. `integration_tests/src/main/python/time_window_test.py`

[BUG] `get_json_test.py::test_get_json_object_quoted_question` fails on Spark 4 with mismatched output

`get_json_test.py::test_get_json_object_quoted_question` fails on Spark 4 with mismatched output: ``` ---------------------------- Captured stderr setup ----------------------------- 2024-07-02 23:57:30 INFO Running test 'src/main/python/get_json_test.py::test_get_json_object_quoted_question[DATAGEN_SEED=1719964646, TZ=UTC]' ------------------------------ Captured log setup ------------------------------ INFO __pytest_worker_logger__:spark_init_internal.py:256 Running test...

bug

? - Needs Triage