datafusion-comet
[CometFuzz] Automate keeping function signatures up-to-date with Spark
What is the problem the feature request solves?
In https://github.com/apache/datafusion-comet/pull/2614 the function signatures were improved so that they specify which data types are accepted, rather than just specifying the number of parameters. This helped reduce the number of invalid queries that were generated.
However, this was a manual effort and it would be better if we could automate this somehow.
Some ideas mentioned by @wForget in https://github.com/apache/datafusion-comet/pull/2614#issuecomment-3431489253 are:
- Get input types from the ExpectsInputTypes trait
- Parse function signatures from the example SQL; see ExpressionsSchemaSuite.scala
- Parse the number of arguments from the function description, e.g. _FUNC_(str, charset) in the description of org.apache.spark.sql.catalyst.expressions.Encode
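As a rough illustration of the third idea, here is a minimal Python sketch that pulls argument names and counts out of a usage string shaped like the Encode example. The helper names (`parse_signatures`, `arg_counts`) and the sample strings are hypothetical and not part of Comet or Spark. It assumes the usage text contains one or more `_FUNC_(...)` calls with a flat, comma-separated argument list, and it does not handle nested parentheses or bracketed optional arguments.

```python
import re

# Matches a call such as "_FUNC_(str, charset)" (or "FUNC(str, charset)" if
# the underscores were lost in rendering). Assumption: no nested parentheses.
USAGE_CALL = re.compile(r"_?FUNC_?\s*\(([^()]*)\)")


def parse_signatures(usage: str) -> list[list[str]]:
    """Return the argument-name list of every call found in the usage text."""
    signatures = []
    for match in USAGE_CALL.finditer(usage):
        inner = match.group(1).strip()
        # An empty argument list means a zero-argument function.
        names = [name.strip() for name in inner.split(",")] if inner else []
        signatures.append(names)
    return signatures


def arg_counts(usage: str) -> list[int]:
    """Return the distinct argument counts seen in the usage text, sorted."""
    return sorted({len(sig) for sig in parse_signatures(usage)})


if __name__ == "__main__":
    # Hypothetical usage string in the style of the Encode description.
    usage = "_FUNC_(str, charset) - Encodes the first argument."
    print(parse_signatures(usage))
    print(arg_counts(usage))
```

Argument names alone do not give data types, so this would at best complement the ExpectsInputTypes approach rather than replace it.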
Describe the potential solution
No response
Additional context
No response
We should also automate checking that all expressions that are supported by Comet are exercised as part of fuzz testing.
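One way such a check could look is a set difference between the supported expressions and the functions the fuzzer knows about. The sketch below is only an assumption about the shape of the check: `untested_expressions` is a hypothetical helper, the sample names are placeholders rather than Comet's real lists, and matching names case-insensitively is a simplification that may not hold where an expression class name differs from its SQL function name.

```python
def untested_expressions(supported: set[str], fuzzed: set[str]) -> list[str]:
    """Return supported expression names that no fuzz signature covers."""
    # Assumption: names can be compared case-insensitively.
    fuzzed_lower = {name.lower() for name in fuzzed}
    return sorted(name for name in supported if name.lower() not in fuzzed_lower)


if __name__ == "__main__":
    # Placeholder data for illustration only, not Comet's actual lists.
    supported = {"Encode", "Upper", "Abs"}
    fuzzed = {"upper", "abs"}
    missing = untested_expressions(supported, fuzzed)
    print(missing)
```

A test could fail whenever the returned list is non-empty, so that newly supported expressions cannot be added without a matching fuzz signature.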