spark
spark copied to clipboard
[SPARK-40473][SQL] Migrate parsing errors onto error classes
What changes were proposed in this pull request?
In the PR, I propose to migrate all parsing errors onto temporary error classes with the prefix _LEGACY_ERROR_TEMP_
. The error message will not include the error classes, so, in this way we will preserve the existing behaviour.
Why are the changes needed?
The migration on temporary error classes allows to gather statistics about errors and detect most popular error classes.
Does this PR introduce any user-facing change?
No. The error messages should be almost the same by default.
How was this patch tested?
By running the modified test suites:
$ build/sbt "core/testOnly *SparkThrowableSuite"
$ build/sbt "test:testOnly *ExpressionParserSuite"
cc @srielau @anchovYu Could you take a look at the PR, please.
You seem to assume < 1000 of these. But just this one PR consumes close to a hundred slots"
Some time ago, I have counted the total number of exceptions to be ported onto error classes around 800. I don't see any problems to start using 4 digits after 999.
How do we keep the numbering straight for the next n PRs?
We have 3 types of errors, actually: parsing, compilation and execution. We could migrate them 1-by-1 using sequential numbers. Otherwise the PR will conflict a lot.
I assume you used some tooling to whip this up so fast.
I think it will be faster to manually port the exceptions without any tooling.
Can we update core/src/main/resources/error/README.md
to mention this special error class naming prefix?
@dongjoon-hyun @viirya @HyukjinKwon The new error class name prefix _LEGACY_ERROR_TEMP_
proposed here kind of marks the error as developer-facing, not user-facing. Developers can still get the error class programmatically via the SparkThrowable
interface, so that they can build error infra with it. End users won't see the error class in the message. This allows us to do the error migration very quickly, and we can refine the error classes and mark them as user-facing later (naming them properly, adding tests, etc.). What do you think?
@gatorsmile @cloud-fan @HyukjinKwon @gengliangwang @dongjoon-hyun @viirya I would like to merge this if there are no objections from your side.
Merging to master. Thank you, @cloud-fan @itholic @entong @srielau for review.