datafusion-comet icon indicating copy to clipboard operation
datafusion-comet copied to clipboard

[ANSI] Include original SQL in error messages

Open andygrove opened this issue 4 months ago • 5 comments

What is the problem the feature request solves?

Spark includes the original SQL query in error messages with hints to which part of the query failed.

scala> spark.sql("select cast('a' as date)").show()
org.apache.spark.SparkDateTimeException: [CAST_INVALID_INPUT] The value 'a' of the type "STRING" cannot be cast to "DATE" because it is malformed. Correct the value as per the syntax, or change its target type. Use `try_cast` to tolerate malformed input and return NULL instead. SQLSTATE: 22018
== SQL (line 1, position 8) ==
select cast('a' as date)

Comet does not do this:

org.apache.comet.CometNativeException: [CAST_INVALID_INPUT] The value 'a' of the type "STRING" cannot be cast to "DATE" because it is malformed. Correct the value as per the syntax, or change its target type. Use `try_cast` to tolerate malformed input and return NULL instead. If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error.

Therefore we see test failures such as:

did not contain "select cast(s as date) from t"

Example test (in ANSI mode)

SPARK-39175: Query context of Cast should be serialized to executors when WSCG is off *** FAILED ***

Describe the potential solution

No response

Additional context

No response

andygrove avatar Aug 22 '25 14:08 andygrove

Hi @andygrove , Could I give it a try about this issue?

CuteChuanChuan avatar Aug 25 '25 13:08 CuteChuanChuan

We should revert the diffs in https://github.com/apache/datafusion-comet/pull/2211 once this gets resovled

kazuyukitanimura avatar Aug 26 '25 23:08 kazuyukitanimura

Hi @andygrove and @kazuyukitanimura,

After analyzing the error stack trace, I found that the error is thrown from nativeLib.executePlan.

Based on this analysis, I propose the following approach to address issue #2215:

  1. Create a private method to retrieve the current SQL text: I'm currently investigating whether we can use SQLExecution.EXECUTION_ID_KEY to fetch the actual SQL query text. However, I'm still exploring the best way to access the original SQL statement at this execution stage.
  2. Modify the error handling in spark-expr/src/error.rs: Extend the error structs (e.g., CastInvalidValue) to include an optional SQL context field that can store the original query text.
  3. Modify Java_org_apache_comet_Native_executePlan in jni_api.rs: Pass the SQL context from step 1 to the native execution context, so it's available when errors are generated.

Does this approach seem reasonable and feasible? I would greatly appreciate your guidance and feedback on this proposed solution, particularly regarding:

  • The feasibility of accessing SQL text at the execution phase
  • Whether modifying the JNI interface to pass SQL context is the right approach
  • Any potential issues or better alternatives you might suggest

Thank you for your time and guidance!

CuteChuanChuan avatar Aug 30 '25 15:08 CuteChuanChuan

I think a different approach is required here. The error in this example comes from QueryExecutionErrors. Afaik, when an error of this type is thrown, Spark will handle the proper reporting of the error and the query.

We should ideally have a mapping for Comet/Datafusion errors that matches the error to the relevant QueryExecutionError and in CometExecIterator we should catch the CometNativeException, convert it into a QueryExecutionError, and rethrow it. In CometExecIterator we already catch some CometNativeExceptions which are then converted into a SparkException. This currently is specific to a set of errors which are matched by checking the contents of the error message (which is not quite ideal). It would be good to change these too.

parthchandra avatar Sep 13 '25 00:09 parthchandra

@mbutrovich might have some ideas here

parthchandra avatar Sep 13 '25 00:09 parthchandra

@CuteChuanChuan , Wanted to know if you could let me know if this is still something you are working on ? Or else I would like to collaborate / take a stab at it :) > Re raising the right stackTrace , CometNativeExceptions -> SparkException route as @parthchandra mentioned has my vote 🥇

coderfender avatar Nov 25 '25 08:11 coderfender

@coderfender Feel free to take this on ❤️ - I'm not actively working on it at the moment.

CuteChuanChuan avatar Nov 25 '25 08:11 CuteChuanChuan

Thank you

coderfender avatar Nov 25 '25 08:11 coderfender