datafusion-comet
Will Comet support closed-source forks of Apache Spark (e.g. CSP versions)?
What is the problem the feature request solves?
We now have our first PR that works around an issue with running Comet on AWS Spark (https://github.com/apache/datafusion-comet/pull/412).
I think we need to carefully consider our stance on supporting closed-source forks of Spark from the cloud service providers.
Supporting closed-source Spark versions is challenging for many reasons:
- There are often custom operators and expressions that Comet is not aware of and therefore cannot map to native versions
- Software updates can be pushed out at any time, potentially breaking compatibility with Comet
- It is difficult to debug issues because the source is not available (and reverse engineering is prohibited)
- We would ideally need CI to test against all supported CSP versions
- It increases the maintenance burden for all contributors, and not all contributors will have access to the CSP versions for testing
If the community desires to maintain Comet versions that can work with CSP Spark versions, then I think we would need to find an approach that allows those contributors to extend the "core" Comet project and add CSP support without adding maintenance burden for the core project.
One idea, for example, would be to keep the core datafusion-comet project compatible with OSS Apache Spark, and then have specific downstream repositories, such as datafusion-comet-aws, that extend the project to support a specific CSP.
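To make the core/downstream split more concrete, here is a minimal, language-agnostic sketch of one way it could work. It is written in Python only for brevity and is not Comet's actual API. Every name in it (OperatorRegistry, core_registry, install_csp_extension, VendorScanExec, and the native_* strings) is hypothetical and chosen for illustration. The assumption is that the core project exposes a registry that maps Spark operator names to native equivalents, and that a downstream CSP repository adds its own mappings without any change to core.

```python
from typing import Callable, Dict, Optional

# Hypothetical: a converter maps a Spark operator name to a native operator name.
Converter = Callable[[str], str]


class OperatorRegistry:
    """Hypothetical extension point owned by the core project."""

    def __init__(self) -> None:
        self._converters: Dict[str, Converter] = {}

    def register(self, spark_op: str, converter: Converter) -> None:
        # Later registrations for the same operator override earlier ones.
        self._converters[spark_op] = converter

    def convert(self, spark_op: str) -> Optional[str]:
        # None means "no native mapping known; leave the operator to Spark".
        converter = self._converters.get(spark_op)
        return converter(spark_op) if converter is not None else None


def core_registry() -> OperatorRegistry:
    """Mappings the core project would ship, covering OSS Apache Spark only."""
    registry = OperatorRegistry()
    registry.register("ProjectExec", lambda op: "native_project")
    registry.register("FilterExec", lambda op: "native_filter")
    return registry


def install_csp_extension(registry: OperatorRegistry) -> None:
    """Would live in a downstream repository (assumed), not in core.

    VendorScanExec stands in for a custom CSP operator that core has
    never heard of; it is not a real operator name.
    """
    registry.register("VendorScanExec", lambda op: "native_vendor_scan")


if __name__ == "__main__":
    registry = core_registry()
    print(registry.convert("VendorScanExec"))  # core alone has no mapping
    install_csp_extension(registry)
    print(registry.convert("VendorScanExec"))  # downstream mapping now applies
```

With a design along these lines, the core project's CI would only need to cover the OSS mappings, and each downstream repository would carry the testing and maintenance cost of its own CSP-specific registrations.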
Describe the potential solution
No response
Additional context
No response