
Implement Spark-compatible CAST from String to Date

Open andygrove opened this issue 1 year ago • 4 comments

What is the problem the feature request solves?

We currently delegate to DataFusion when casting from string to date and there are some differences in behavior compared to Spark.

  • Spark supports dates in the formats YYYY and YYYY-MM and DataFusion does not
  • Spark supports a trailing T as in 2024-01-01T and DataFusion does not
  • DataFusion doesn't throw an exception for invalid inputs in ANSI mode
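To make the three differences above concrete, here is a minimal Rust sketch of a lenient string-to-date parser. It is not the Spark or Comet implementation. The names `Ymd`, `parse_date_lenient`, and `cast_string_to_date` are hypothetical, the 4 to 7 digit year bound and the rule that a `T` suffix is only accepted after a full `YYYY-MM-DD` are assumptions, and it returns a plain (year, month, day) tuple rather than a real date type. How Spark renders negative years is not modeled.

```rust
/// Calendar date as (year, month, day); a stand-in for a real date type.
pub type Ymd = (i32, u32, u32);

fn is_leap(year: i32) -> bool {
    (year % 4 == 0 && year % 100 != 0) || year % 400 == 0
}

fn days_in_month(year: i32, month: u32) -> u32 {
    match month {
        1 | 3 | 5 | 7 | 8 | 10 | 12 => 31,
        4 | 6 | 9 | 11 => 30,
        2 => if is_leap(year) { 29 } else { 28 },
        _ => 0, // invalid month
    }
}

/// Accepts `YYYY`, `YYYY-MM`, `YYYY-MM-DD`, an optional leading sign,
/// and (assumption) any suffix starting with 'T' or ' ' after a full date.
pub fn parse_date_lenient(input: &str) -> Option<Ymd> {
    let s = input.trim();
    // Optional sign; slicing at byte 1 is safe because the sign is ASCII.
    let (sign, rest) = match s.as_bytes().first()? {
        b'-' => (-1, &s[1..]),
        b'+' => (1, &s[1..]),
        _ => (1, s),
    };
    // Cut the input at the first 'T' or space, if any.
    let (date_part, has_suffix) = match rest.find(|c: char| c == 'T' || c == ' ') {
        Some(pos) => (&rest[..pos], true),
        None => (rest, false),
    };
    let segments: Vec<&str> = date_part.split('-').collect();
    if segments.len() > 3 || (has_suffix && segments.len() < 3) {
        return None;
    }
    // Missing month and day default to 1.
    let mut values = [1u32; 3];
    for (i, seg) in segments.iter().enumerate() {
        let len_ok = if i == 0 {
            (4..=7).contains(&seg.len()) // assumed year width
        } else {
            (1..=2).contains(&seg.len())
        };
        if !len_ok || !seg.bytes().all(|b| b.is_ascii_digit()) {
            return None;
        }
        values[i] = seg.parse().ok()?;
    }
    let year = sign * values[0] as i32;
    let (month, day) = (values[1], values[2]);
    if !(1..=12).contains(&month) || day < 1 || day > days_in_month(year, month) {
        return None;
    }
    Some((year, month, day))
}

/// Non-ANSI mode maps invalid input to null (`Ok(None)`);
/// ANSI mode reports an error instead.
pub fn cast_string_to_date(input: &str, ansi_mode: bool) -> Result<Option<Ymd>, String> {
    match parse_date_lenient(input) {
        Some(date) => Ok(Some(date)),
        None if ansi_mode => Err(format!("cannot cast '{}' to DATE", input)),
        None => Ok(None),
    }
}
```

With this sketch, `cast_string_to_date("2024-01-01T", false)` yields a date, while `cast_string_to_date("abc", true)` returns an error rather than null. A real implementation would produce days since the epoch and match Spark's error message format.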

Edge cases from fuzz testing:

Input        Spark        DataFusion
"-0973250"   3251-01-01   null
"-3638-5"    3639-05-01   null

Describe the potential solution

No response

Additional context

No response

andygrove, Apr 25 '24 13:04

FWIW, Comet's cast from string to timestamp uses a format string that matches the one used by Spark. It still needs to be massaged for trailing zeroes, though. ANSI mode was never tried or tested.

parthchandra, Apr 25 '24 18:04

Hi @andygrove, I would love to work on this, if no one else is working on it.

vidyasankarv, Apr 26 '24 10:04

> Hi @andygrove, I would love to work on this, if no one else is working on it.

Thank you @vidyasankarv that would be awesome! Let me know if you have any questions.

andygrove, Apr 26 '24 15:04

Hi @andygrove, I am new to Rust and open source work in general, but eager to learn and contribute. I am looking at https://github.com/apache/datafusion-comet/pull/335 for timestamp casting, which looks similar to my issue. I will keep working on this and get back to you with any questions.

And thank you for your book How Query Engines Work. That's what got me interested in Apache DataFusion.

vidyasankarv, Apr 29 '24 17:04