spark icon indicating copy to clipboard operation
spark copied to clipboard

[SPARK-40473][SQL] Migrate parsing errors onto error classes

Open MaxGekk opened this issue 2 years ago • 5 comments

What changes were proposed in this pull request?

In the PR, I propose to migrate all parsing errors onto temporary error classes with the prefix _LEGACY_ERROR_TEMP_. The error message will not include the error classes, so, in this way we will preserve the existing behaviour.

Why are the changes needed?

The migration on temporary error classes allows to gather statistics about errors and detect most popular error classes.

Does this PR introduce any user-facing change?

No. The error messages should be almost the same by default.

How was this patch tested?

By running the modified test suites:

$ build/sbt "core/testOnly *SparkThrowableSuite"
$ build/sbt "test:testOnly *ExpressionParserSuite"

MaxGekk avatar Sep 16 '22 09:09 MaxGekk

cc @srielau @anchovYu Could you take a look at the PR, please.

MaxGekk avatar Sep 16 '22 14:09 MaxGekk

You seem to assume < 1000 of these. But just this one PR consumes close to a hundred slots"

Some time ago, I have counted the total number of exceptions to be ported onto error classes around 800. I don't see any problems to start using 4 digits after 999.

How do we keep the numbering straight for the next n PRs?

We have 3 types of errors, actually: parsing, compilation and execution. We could migrate them 1-by-1 using sequential numbers. Otherwise the PR will conflict a lot.

I assume you used some tooling to whip this up so fast.

I think it will be faster to manually port the exceptions without any tooling.

MaxGekk avatar Sep 16 '22 16:09 MaxGekk

Can we update core/src/main/resources/error/README.md to mention this special error class naming prefix?

cloud-fan avatar Sep 19 '22 12:09 cloud-fan

@dongjoon-hyun @viirya @HyukjinKwon The new error class name prefix _LEGACY_ERROR_TEMP_ proposed here kind of marks the error as developer-facing, not user-facing. Developers can still get the error class programmatically via the SparkThrowable interface, so that they can build error infra with it. End users won't see the error class in the message. This allows us to do the error migration very quickly, and we can refine the error classes and mark them as user-facing later (naming them properly, adding tests, etc.). What do you think?

cloud-fan avatar Sep 20 '22 06:09 cloud-fan

@gatorsmile @cloud-fan @HyukjinKwon @gengliangwang @dongjoon-hyun @viirya I would like to merge this if there are no objections from your side.

MaxGekk avatar Sep 21 '22 15:09 MaxGekk

Merging to master. Thank you, @cloud-fan @itholic @entong @srielau for review.

MaxGekk avatar Sep 22 '22 08:09 MaxGekk