
Fix test failures in string_test.py

Open razajafri opened this issue 1 year ago • 4 comments

FAILED ../../../../integration_tests/src/main/python/string_test.py::test_endswith
FAILED ../../../../integration_tests/src/main/python/string_test.py::test_unsupported_fallback_substring_index

razajafri avatar Jun 08 '24 05:06 razajafri

test_unsupported_fallback_substring_index fails with a legitimate cause:

E               pyspark.errors.exceptions.captured.NumberFormatException: For input string: "rdd_value_2"

The other tests all pass with ANSI mode disabled.

mythrocks avatar Jun 12 '24 23:06 mythrocks

This is odd. I can't seem to repro this failure now.

mythrocks avatar Jun 25 '24 01:06 mythrocks

I have double-checked my work. These tests don't fail.

I'm closing this. We can reopen this if we see failures in the future.

mythrocks avatar Jun 25 '24 17:06 mythrocks

Yep, I think I spoke too soon. Reopening.

mythrocks avatar Jun 25 '24 20:06 mythrocks

The problem with .endswith is proving elusive. While it can be reproduced reliably in the test, it occurs only occasionally from the REPL. For a brief while, it could be reproduced simply by adding the plugin jar to the class path (i.e. without even enabling the plugin). It appeared to be some sort of shading error.

I'm still investigating, but this is proving a time sink.

mythrocks avatar Jul 19 '24 22:07 mythrocks

Yep, this is still baffling. Here is the exception:

py4j.protocol.Py4JJavaError: An error occurred while calling o206.endsWith.
: java.lang.NullPointerException: Cannot invoke "org.apache.spark.sql.Column.expr()" because "x$1" is null
      at org.apache.spark.sql.Column$.$anonfun$fn$2(Column.scala:77)
      at scala.collection.immutable.ArraySeq.map(ArraySeq.scala:75)
      at scala.collection.immutable.ArraySeq.map(ArraySeq.scala:35)
      at org.apache.spark.sql.Column$.$anonfun$fn$1(Column.scala:77)
      at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(origin.scala:84)
      at org.apache.spark.sql.package$.withOrigin(package.scala:111)
      at org.apache.spark.sql.Column$.fn(Column.scala:76)
      at org.apache.spark.sql.Column$.fn(Column.scala:64)
      at org.apache.spark.sql.Column.fn(Column.scala:169)
      at org.apache.spark.sql.Column.endsWith(Column.scala:1078)
      at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)

This points to new code in Spark 4.0:

Column {
  UnresolvedFunction(Seq(name), inputs.map(_.expr), isDistinct, ignoreNulls = ignoreNulls)
}

The complaint seems to be that .expr can't be called on the null passed into .endswith(). (Note that the code sees this as a null Column, not as a literal.)

I'm unable to repro this from the command line. Attaching a debugger allows this code to run through as well.

This is occasionally reproducible from the pyspark shell. The exception is thrown from Spark CPU, and should not need the plugin for repro.

I'm fairly confident that this is a bug in Spark 4, which passes None through as a null Column instead of converting it to a literal.
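To make that failure mode concrete, here is a toy sketch in plain Python (no Spark required; every class and function here is illustrative, not Spark's actual API) of what the Column.fn code above appears to do, and what a None-tolerant variant might look like:

```python
# Toy model of a Column.fn-style helper that maps .expr over its inputs.
# Illustrative only; not Spark's real classes.

class Column:
    def __init__(self, expr):
        self._expr = expr

    @property
    def expr(self):
        return self._expr

def lit(value):
    # Wrap a plain value (including None) as a literal expression.
    return Column(("literal", value))

def fn_strict(name, *inputs):
    # Mirrors UnresolvedFunction(..., inputs.map(_.expr), ...):
    # if a None slipped in where a Column was expected, taking .expr
    # blows up, analogous to the NullPointerException in the trace.
    return (name, [c.expr for c in inputs])

def fn_lenient(name, *inputs):
    # Hypothetical fix: coerce non-Column arguments (including None)
    # to literals before taking .expr.
    cols = [c if isinstance(c, Column) else lit(c) for c in inputs]
    return (name, [c.expr for c in cols])
```

Under `fn_lenient`, an endsWith(None) call would resolve to a null literal; under `fn_strict` it fails on the null input, matching the observed NPE.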

mythrocks avatar Jul 22 '24 21:07 mythrocks

As for the problem highlighted in test_unsupported_fallback_substring_index, I'm fairly certain this is a bug in code-gen in Spark 4.0. Here's the stack trace:

scala> sql("select SUBSTRING_INDEX('a', '_', num) from mytable ").show(false)
java.lang.NumberFormatException: For input string: "columnartorow_value_0"
  at java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:67)
  at java.base/java.lang.Integer.parseInt(Integer.java:668)
  at org.apache.spark.sql.catalyst.expressions.SubstringIndex.$anonfun$doGenCode$29(stringExpressions.scala:1449)
  at org.apache.spark.sql.catalyst.expressions.TernaryExpression.$anonfun$defineCodeGen$3(Expression.scala:869)
  at org.apache.spark.sql.catalyst.expressions.TernaryExpression.nullSafeCodeGen(Expression.scala:888)
  at org.apache.spark.sql.catalyst.expressions.TernaryExpression.defineCodeGen(Expression.scala:868)
  at org.apache.spark.sql.catalyst.expressions.SubstringIndex.doGenCode(stringExpressions.scala:1448)
  at org.apache.spark.sql.catalyst.expressions.Expression.$anonfun$genCode$3(Expression.scala:207)

Edit: I have filed https://issues.apache.org/jira/browse/SPARK-48989 against Spark 4.x, to track the WholeStageCodeGen/NFE problem. This is happening on the CPU, without the plugin's involvement.
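For intuition, here is a toy sketch in plain Python (illustrative names only, not Spark's actual code) of this class of codegen bug: the generator parses a codegen variable name as an integer at code-generation time, instead of splicing the variable name into the generated code so the value is read at runtime:

```python
# Illustrative sketch of the codegen bug class; not Spark's real code.

def gen_code_buggy(count_var: str) -> str:
    # Bug: treats the generated-code variable NAME as if it already
    # held the integer value. int("columnartorow_value_0") fails here,
    # just as Integer.parseInt does in the stack trace above.
    n = int(count_var)
    return f"substringIndex(str, delim, {n})"

def gen_code_fixed(count_var: str) -> str:
    # Correct: emit code that references the variable, so the count
    # is evaluated at runtime, when the column value is available.
    return f"substringIndex(str, delim, {count_var})"
```

This would also explain why a literal third argument survives codegen (parsing "3" succeeds) while a column-backed argument, which arrives as a variable name, does not.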

mythrocks avatar Jul 22 '24 22:07 mythrocks

New test failures with Spark-4.0 release jar:

================================================================ short test summary info =================================================================
FAILED ../../../../integration_tests/src/main/python/string_test.py::test_conv_with_more_valid_values[2-from_36][DATAGEN_SEED=1749852815, TZ=UTC] - pyspark.errors.exceptions.captured.ArithmeticException: [ARITHMETIC_OVERFLOW] Overflow in function conv(). If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error. SQLSTATE: 22003
== SQL (line 1, position 25) ==
select conv('', 3, 35), conv(str_col, 36, 2), conv(null, 36, 2), conv(str_co...
                        ^^^^^^^^^^^^^^^^^^^^
FAILED ../../../../integration_tests/src/main/python/string_test.py::test_conv_with_more_valid_values[21-from_36][DATAGEN_SEED=1749852815, TZ=UTC] - pyspark.errors.exceptions.captured.ArithmeticException: [ARITHMETIC_OVERFLOW] Overflow in function conv(). If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error. SQLSTATE: 22003
== SQL (line 1, position 25) ==
select conv('', 3, 35), conv(str_col, 36, 21), conv(null, 36, 21), conv(str_c...
                        ^^^^^^^^^^^^^^^^^^^^^
FAILED ../../../../integration_tests/src/main/python/string_test.py::test_conv_with_more_valid_values[16-from_36][DATAGEN_SEED=1749852815, TZ=UTC, INJECT_OOM] - pyspark.errors.exceptions.captured.ArithmeticException: [ARITHMETIC_OVERFLOW] Overflow in function conv(). If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error. SQLSTATE: 22003
== SQL (line 1, position 25) ==
select conv('', 3, 35), conv(str_col, 36, 16), conv(null, 36, 16), conv(str_c...
                        ^^^^^^^^^^^^^^^^^^^^^
FAILED ../../../../integration_tests/src/main/python/string_test.py::test_conv_with_more_valid_values[36-from_36][DATAGEN_SEED=1749852815, TZ=UTC] - pyspark.errors.exceptions.captured.ArithmeticException: [ARITHMETIC_OVERFLOW] Overflow in function conv(). If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error. SQLSTATE: 22003
== SQL (line 1, position 25) ==
select conv('', 3, 35), conv(str_col, 36, 36), conv(null, 36, 36), conv(str_c...
                        ^^^^^^^^^^^^^^^^^^^^^
FAILED ../../../../integration_tests/src/main/python/string_test.py::test_conv_with_more_valid_values[10-from_36][DATAGEN_SEED=1749852815, TZ=UTC] - pyspark.errors.exceptions.captured.ArithmeticException: [ARITHMETIC_OVERFLOW] Overflow in function conv(). If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error. SQLSTATE: 22003
== SQL (line 1, position 25) ==
select conv('', 3, 35), conv(str_col, 36, 10), conv(null, 36, 10), conv(str_c...
                        ^^^^^^^^^^^^^^^^^^^^^
FAILED ../../../../integration_tests/src/main/python/string_test.py::test_conv_with_more_valid_values[-2-from_36][DATAGEN_SEED=1749852815, TZ=UTC, INJECT_OOM] - pyspark.errors.exceptions.captured.ArithmeticException: [ARITHMETIC_OVERFLOW] Overflow in function conv(). If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error. SQLSTATE: 22003
== SQL (line 1, position 25) ==
select conv('', 3, 35), conv(str_col, 36, -2), conv(null, 36, -2), conv(str_c...
                        ^^^^^^^^^^^^^^^^^^^^^
FAILED ../../../../integration_tests/src/main/python/string_test.py::test_conv_with_more_valid_values[-10-from_36][DATAGEN_SEED=1749852815, TZ=UTC, INJECT_OOM] - pyspark.errors.exceptions.captured.ArithmeticException: [ARITHMETIC_OVERFLOW] Overflow in function conv(). If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error. SQLSTATE: 22003
== SQL (line 1, position 25) ==
select conv('', 3, 35), conv(str_col, 36, -10), conv(null, 36, -10), conv(str_...
                        ^^^^^^^^^^^^^^^^^^^^^^
FAILED ../../../../integration_tests/src/main/python/string_test.py::test_conv_with_more_valid_values[-16-from_36][DATAGEN_SEED=1749852815, TZ=UTC] - pyspark.errors.exceptions.captured.ArithmeticException: [ARITHMETIC_OVERFLOW] Overflow in function conv(). If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error. SQLSTATE: 22003
== SQL (line 1, position 25) ==
select conv('', 3, 35), conv(str_col, 36, -16), conv(null, 36, -16), conv(str_...
                        ^^^^^^^^^^^^^^^^^^^^^^
FAILED ../../../../integration_tests/src/main/python/string_test.py::test_conv_with_more_valid_values[-29-from_36][DATAGEN_SEED=1749852815, TZ=UTC] - pyspark.errors.exceptions.captured.ArithmeticException: [ARITHMETIC_OVERFLOW] Overflow in function conv(). If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error. SQLSTATE: 22003
== SQL (line 1, position 25) ==
select conv('', 3, 35), conv(str_col, 36, -29), conv(null, 36, -29), conv(str_...
                        ^^^^^^^^^^^^^^^^^^^^^^
FAILED ../../../../integration_tests/src/main/python/string_test.py::test_conv_with_more_valid_values[-33-from_36][DATAGEN_SEED=1749852815, TZ=UTC, INJECT_OOM] - pyspark.errors.exceptions.captured.ArithmeticException: [ARITHMETIC_OVERFLOW] Overflow in function conv(). If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error. SQLSTATE: 22003
== SQL (line 1, position 25) ==
select conv('', 3, 35), conv(str_col, 36, -33), conv(null, 36, -33), conv(str_...
                        ^^^^^^^^^^^^^^^^^^^^^^
FAILED ../../../../integration_tests/src/main/python/string_test.py::test_conv_with_more_invalid_values[DATAGEN_SEED=1749852815, TZ=UTC, INJECT_OOM] - pyspark.errors.exceptions.captured.ArithmeticException: [ARITHMETIC_OVERFLOW] Overflow in function conv(). If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error. SQLSTATE: 22003
== SQL (line 1, position 32) ==
select conv('1112222', 2, 10), conv(str_col, from_col, to_col), conv(str_col, 0, 36), conv(str...
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
========================================== 11 failed, 125 passed, 2 skipped, 1 xpassed, 702 warnings in 42.71s ===========================================
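For reference, a rough plain-Python sketch of when an ANSI-mode conv() would overflow, assuming the value is accumulated into an unsigned 64-bit integer (the helper below is illustrative, not Spark's code):

```python
# Illustrative overflow check for conv()-style base conversion,
# assuming an unsigned 64-bit accumulator (not Spark's actual code).

ULONG_MAX = (1 << 64) - 1

def would_overflow(digits: str, from_base: int) -> bool:
    """True if `digits` read in `from_base` exceeds unsigned 64 bits,
    roughly the condition under which ANSI-mode conv() would throw
    ARITHMETIC_OVERFLOW instead of silently wrapping."""
    value = 0
    for d in digits:
        value = value * from_base + int(d, 36)  # digit value, 0..35
        if value > ULONG_MAX:
            return True
    return False
```

Base-36 inputs overflow quickly: a 13-character base-36 string can already exceed 2^64, so randomly generated str_col values readily trip this under ANSI mode.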

nartal1 avatar Jun 13 '25 22:06 nartal1