Implement Spark-compatible CAST from String to Date
What is the problem the feature request solves?
We currently delegate to DataFusion when casting from string to date and there are some differences in behavior compared to Spark.
- Spark supports dates in the formats `YYYY` and `YYYY-MM` and DataFusion does not
- Spark supports a trailing `T` as in `2024-01-01T` and DataFusion does not
- DataFusion doesn't throw an exception for invalid inputs in ANSI mode
Edge cases from fuzz testing:
| Input | Spark | DataFusion |
|---|---|---|
| "-0973250" | 3251-01-01 | null |
| "-3638-5" | 3639-05-01 | null |
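
To make the differences listed above concrete, here is a minimal sketch of what a Spark-style string-to-date parse could look like. It is not Comet's implementation: `EvalMode`, `parse_date`, and `cast_string_to_date` are hypothetical names, the date is returned as a plain `(year, month, day)` tuple rather than an Arrow `Date32` value, and the sketch assumes exactly four-digit years with one- or two-digit months and days. It deliberately does not try to reproduce the negative-year edge cases found by fuzz testing.

```rust
/// Hypothetical evaluation mode; illustrative only, not Comet's actual API.
#[derive(Clone, Copy, Debug, PartialEq)]
enum EvalMode {
    Legacy,
    Ansi,
}

/// Number of days in a month of the proleptic Gregorian calendar (0 if the month is invalid).
fn days_in_month(year: i32, month: u32) -> u32 {
    match month {
        1 | 3 | 5 | 7 | 8 | 10 | 12 => 31,
        4 | 6 | 9 | 11 => 30,
        2 => {
            let leap = (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
            if leap { 29 } else { 28 }
        }
        _ => 0,
    }
}

/// Parses a month or day field made of one or two ASCII digits.
fn parse_field(s: &str) -> Option<u32> {
    if s.is_empty() || s.len() > 2 || !s.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    s.parse().ok()
}

/// Accepts `YYYY`, `YYYY-MM`, or `YYYY-MM-DD`, optionally followed by a single trailing `T`.
/// A missing month or day defaults to 1. Returns `None` for anything else.
fn parse_date(input: &str) -> Option<(i32, u32, u32)> {
    let trimmed = input.trim();
    // Drop one trailing 'T', as in "2024-01-01T".
    let date_part = trimmed.strip_suffix('T').unwrap_or(trimmed);

    let mut parts = date_part.split('-');
    let year_str = parts.next()?;
    // Assumption of this sketch: the year is exactly four digits and non-negative.
    if year_str.len() != 4 || !year_str.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    let year: i32 = year_str.parse().ok()?;

    let month = match parts.next() {
        Some(m) => parse_field(m)?,
        None => 1,
    };
    let day = match parts.next() {
        Some(d) => parse_field(d)?,
        None => 1,
    };
    // Reject extra segments such as "2024-01-01-01".
    if parts.next().is_some() {
        return None;
    }
    if month < 1 || month > 12 || day < 1 || day > days_in_month(year, month) {
        return None;
    }
    Some((year, month, day))
}

/// In ANSI mode an invalid input is an error; otherwise it becomes null (`None`).
fn cast_string_to_date(input: &str, mode: EvalMode) -> Result<Option<(i32, u32, u32)>, String> {
    match (parse_date(input), mode) {
        (Some(date), _) => Ok(Some(date)),
        (None, EvalMode::Ansi) => Err(format!("invalid input for cast to DATE: '{}'", input)),
        (None, EvalMode::Legacy) => Ok(None),
    }
}
```

With this sketch, `parse_date("2024")` and `parse_date("2024-01-01T")` both yield `Some((2024, 1, 1))`, while `cast_string_to_date("abc", EvalMode::Ansi)` returns an `Err` instead of a null. Keeping parsing separate from the mode handling means the ANSI behavior can be tested independently of the accepted formats.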
Describe the potential solution
No response
Additional context
No response
FWIW, Comet's cast from string to timestamp uses a format string that matches the one used by Spark. It still needs to be massaged for trailing zeroes, though. ANSI mode was never tried or tested.
Hi @andygrove, I would love to work on this, if no one else is working on it.
Thank you @vidyasankarv that would be awesome! Let me know if you have any questions.
Hi @andygrove, I am new to Rust and open source work in general, but eager to learn and contribute. I am looking at https://github.com/apache/datafusion-comet/pull/335 for timestamp casting, which looks similar to my issue. I will keep working on this and get back to you with any questions.
And thank you for your book on How Query Engines Work. That's what got me interested in Apache DataFusion.