datafusion-comet
feat: Add experimental support for native Parquet writes
Which issue does this PR close?
Part of https://github.com/apache/datafusion-comet/issues/1625
Rationale for this change
We would eventually like to support native writes to Parquet. This PR adds a starting point for further development.
This is the result of vibe coding with Claude.
The goal is to add the minimum possible implementation and test. Plenty of things are not implemented or tested yet.
Example of new native plan:
ParquetWriterExec: path=file:/private/var/folders/vv/fmb1n2hx3yqdmxbrv7shzyvr0000gn/T/spark-79afc322-5315-4f47-85dc-c974eeb44d2c/output.parquet, compression=Snappy
ScanExec: source=[write_source], schema=[col_0: Int32, col_1: Utf8]
What changes are included in this PR?
- New native ParquetWriterExec
- New Scala CometNativeWriteExec
- Updates to CometExecRule
- One working test
How are these changes tested?
New suite added.
A good test would be to write with this feature enabled and then read it with and without Comet enabled.
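A minimal sketch of such a round-trip check, in Scala, might look like the following. It is not the test added in this PR. The object name `NativeWriteRoundTrip` and the helper names are hypothetical, and the config key `spark.comet.parquet.write.enabled` is an assumption that should be checked against `CometConf`. Whether the native writer is actually selected depends on the plan and on Comet being on the classpath; without Comet the extra settings are simply ignored and the code exercises the regular Spark writer.

```scala
import org.apache.spark.sql.SparkSession

object NativeWriteRoundTrip {
  // Config keys: the second one is an ASSUMPTION, verify it against CometConf.
  val CometEnabled = "spark.comet.enabled"
  val NativeWriteEnabled = "spark.comet.parquet.write.enabled"

  // Read the file back with Comet switched on or off and return sorted rows.
  private def readSorted(
      spark: SparkSession,
      path: String,
      comet: Boolean): Seq[(Int, String)] = {
    spark.conf.set(CometEnabled, comet.toString)
    spark.read
      .parquet(path)
      .collect()
      .map(r => (r.getInt(0), r.getString(1)))
      .toSeq
      .sorted
  }

  // Write with the (assumed) native write flag enabled, then compare both reads
  // against the data that was written.
  def roundTrip(spark: SparkSession, path: String): Boolean = {
    import spark.implicits._
    val expected = Seq((1, "a"), (2, "b"), (3, "c"))

    spark.conf.set(CometEnabled, "true")
    spark.conf.set(NativeWriteEnabled, "true")
    expected.toDF("col_0", "col_1").write.mode("overwrite").parquet(path)

    val withComet = readSorted(spark, path, comet = true)
    val withoutComet = readSorted(spark, path, comet = false)
    withComet == expected && withoutComet == expected
  }
}
```

Comparing both reads against the written rows, rather than only against each other, catches the case where the writer produces a file that both readers decode identically but incorrectly.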
Codecov Report
:x: Patch coverage is 67.47967% with 40 lines in your changes missing coverage. Please review.
:white_check_mark: Project coverage is 59.16%. Comparing base (f09f8af) to head (8d9b41a).
:warning: Report is 727 commits behind head on main.
Additional details and impacted files
@@ Coverage Diff @@
## main #2812 +/- ##
============================================
+ Coverage 56.12% 59.16% +3.03%
- Complexity 976 1477 +501
============================================
Files 119 167 +48
Lines 11743 15188 +3445
Branches 2251 2523 +272
============================================
+ Hits 6591 8986 +2395
- Misses 4012 4917 +905
- Partials 1140 1285 +145
> A good test would be to write with this feature enabled and then read it with and without Comet enabled.
Added. Thanks.
Thanks for the reviews @comphead and @wForget. I'm going to go ahead and merge this and will have a draft PR up today for file commit protocol.