# [ANSI] Include original SQL in error messages
### What is the problem the feature request solves?
Spark includes the original SQL query in error messages, with a pointer to the part of the query that failed:

```
scala> spark.sql("select cast('a' as date)").show()
org.apache.spark.SparkDateTimeException: [CAST_INVALID_INPUT] The value 'a' of the type "STRING" cannot be cast to "DATE" because it is malformed. Correct the value as per the syntax, or change its target type. Use `try_cast` to tolerate malformed input and return NULL instead. SQLSTATE: 22018
== SQL (line 1, position 8) ==
select cast('a' as date)
```
Comet does not do this:

```
org.apache.comet.CometNativeException: [CAST_INVALID_INPUT] The value 'a' of the type "STRING" cannot be cast to "DATE" because it is malformed. Correct the value as per the syntax, or change its target type. Use `try_cast` to tolerate malformed input and return NULL instead. If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error.
```
Therefore we see test failures such as:

```
did not contain "select cast(s as date) from t"
```

Example test (in ANSI mode):

```
SPARK-39175: Query context of Cast should be serialized to executors when WSCG is off *** FAILED ***
```
### Describe the potential solution
No response
### Additional context
No response
Hi @andygrove, could I give this issue a try?
We should revert the diffs in https://github.com/apache/datafusion-comet/pull/2211 once this gets resolved.
Hi @andygrove and @kazuyukitanimura,
After analyzing the error stack trace, I found that the error is thrown from `nativeLib.executePlan`.
Based on this analysis, I propose the following approach to address issue #2215:
- Create a private method to retrieve the current SQL text: I'm currently investigating whether we can use `SQLExecution.EXECUTION_ID_KEY` to fetch the actual SQL query text. However, I'm still exploring the best way to access the original SQL statement at this execution stage (see the sketch after this list).
- Modify the error handling in `spark-expr/src/error.rs`: Extend the error structs (e.g., `CastInvalidValue`) to include an optional SQL context field that can store the original query text.
- Modify `Java_org_apache_comet_Native_executePlan` in `jni_api.rs`: Pass the SQL context from step 1 to the native execution context so that it is available when errors are generated.
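For the first step, here is a minimal sketch of what the lookup might look like, assuming Comet can reach Spark's internal `SQLExecution` registry; the helper name `currentSqlText` is hypothetical, and whether the original SQL string is actually recoverable at this point is exactly the open question above:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.sql.execution.SQLExecution

// Hypothetical helper (not existing Comet code): resolve the current
// execution ID from the thread-local property that Spark sets around each
// query, then look up the registered QueryExecution for it.
private def currentSqlText(sc: SparkContext): Option[String] =
  Option(sc.getLocalProperty(SQLExecution.EXECUTION_ID_KEY))
    .map(_.toLong)
    .flatMap(id => Option(SQLExecution.getQueryExecution(id)))
    // Placeholder: QueryExecution does not directly expose the original SQL
    // string, so recovering it here is the open investigation point.
    .map(_.logical.toString)
```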
Does this approach seem reasonable and feasible? I would greatly appreciate your guidance and feedback on this proposed solution, particularly regarding:
- The feasibility of accessing SQL text at the execution phase
- Whether modifying the JNI interface to pass SQL context is the right approach
- Any potential issues or better alternatives you might suggest
Thank you for your time and guidance!
I think a different approach is required here. The error in this example comes from `QueryExecutionErrors`. As far as I know, when an error of this type is thrown, Spark handles the proper reporting of both the error and the query.
We should ideally have a mapping for Comet/DataFusion errors that matches each error to the relevant `QueryExecutionError`; in `CometExecIterator` we should catch the `CometNativeException`, convert it into a `QueryExecutionError`, and rethrow it.
In `CometExecIterator` we already catch some `CometNativeException`s, which are then converted into a `SparkException`. This is currently specific to a set of errors that are matched by checking the contents of the error message (which is not ideal). It would be good to change these too.
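A minimal sketch of the catch-and-convert idea, assuming the native error message carries a bracketed error class such as `[CAST_INVALID_INPUT]`; the `NativeErrorMapping` object and its logic are hypothetical, not Comet's current code:

```scala
import org.apache.comet.CometNativeException
import org.apache.spark.SparkException

// Hypothetical helper (illustrative only): extract the structured error
// class from the native error message instead of matching on free-form text.
object NativeErrorMapping {
  private val ErrorClass = """\[([A-Z_]+)\]""".r

  def convert(e: CometNativeException): Throwable =
    ErrorClass.findFirstMatchIn(e.getMessage).map(_.group(1)) match {
      case Some(cls) =>
        // A real implementation would look `cls` up in a table mapping
        // Comet/DataFusion error classes to the corresponding
        // QueryExecutionErrors factory, so that Spark attaches the query
        // context; wrapping in a plain SparkException is a placeholder.
        new SparkException(e.getMessage, e)
      case None => e
    }
}

// Usage inside CometExecIterator (illustrative):
//   try { ... nativeLib.executePlan(...) ... }
//   catch { case e: CometNativeException => throw NativeErrorMapping.convert(e) }
```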
@mbutrovich might have some ideas here
@CuteChuanChuan, I wanted to check whether this is still something you are working on? Otherwise I would like to collaborate / take a stab at it :) Regarding raising the right stack trace, the `CometNativeException` -> `SparkException` route that @parthchandra mentioned has my vote 🥇
@coderfender Feel free to take this on ❤️ - I'm not actively working on it at the moment.
Thank you