feat: Add experimental support for native Parquet writes

Open andygrove opened this issue 2 weeks ago • 3 comments

Which issue does this PR close?

Part of https://github.com/apache/datafusion-comet/issues/1625

Rationale for this change

We would eventually like to support native writes to Parquet. This PR adds a starting point for further development.

This is the result of vibe coding with Claude.

The goal is to add the minimum possible implementation and test. There are plenty of things that are not implemented or tested yet.

Example of new native plan:

ParquetWriterExec: path=file:/private/var/folders/vv/fmb1n2hx3yqdmxbrv7shzyvr0000gn/T/spark-79afc322-5315-4f47-85dc-c974eeb44d2c/output.parquet, compression=Snappy
  ScanExec: source=[write_source], schema=[col_0: Int32, col_1: Utf8]

What changes are included in this PR?

  • New native ParquetWriterExec
  • New Scala CometNativeWriteExec
  • Updates to CometExecRule
  • One working test

How are these changes tested?

New suite added.

andygrove avatar Nov 21 '25 18:11 andygrove

A good test would be to write with this feature enabled and then read it with and without Comet enabled.
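
A minimal sketch of such a round-trip check, not the test that was added to the PR. It assumes Comet is on the classpath and that the native writer is switched on by a config key; the key name `spark.comet.parquet.write.enabled` is an assumption here and should be checked against the Comet configuration docs. The object name, column names, and off-heap settings are also illustrative.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

object NativeWriteRoundTrip {
  // ASSUMPTION: hypothetical key for the experimental native Parquet writer.
  val NativeWriteKey = "spark.comet.parquet.write.enabled"

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("native-write-round-trip")
      .config("spark.plugins", "org.apache.spark.CometPlugin")
      // Illustrative memory settings; a real setup may need different values.
      .config("spark.memory.offHeap.enabled", "true")
      .config("spark.memory.offHeap.size", "1g")
      .getOrCreate()
    import spark.implicits._

    try {
      // Fresh temp directory, so the target path does not exist yet.
      val path = Files.createTempDirectory("comet-write")
        .resolve("output.parquet").toString
      val input = Seq((1, "a"), (2, "b"), (3, "c")).toDF("col_0", "col_1")
      val expected = input.orderBy("col_0").collect().toSeq

      // Write once with Comet and the (assumed) native writer enabled.
      spark.conf.set("spark.comet.enabled", "true")
      spark.conf.set(NativeWriteKey, "true")
      input.write.parquet(path)

      // Read the same file back with Comet on, then with Comet off.
      for (cometOn <- Seq(true, false)) {
        spark.conf.set("spark.comet.enabled", cometOn.toString)
        val actual = spark.read.parquet(path).orderBy("col_0").collect().toSeq
        assert(actual == expected, s"mismatch reading with Comet enabled=$cometOn")
      }
    } finally spark.stop()
  }
}
```

Reading with Comet disabled checks that the file is valid Parquet for vanilla Spark, not just for Comet's own reader. This sketch does not verify that the write actually ran through ParquetWriterExec; a fuller test would also inspect the executed plan.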

parthchandra avatar Nov 21 '25 19:11 parthchandra

Codecov Report

:x: Patch coverage is 67.47967% with 40 lines in your changes missing coverage. Please review.
:white_check_mark: Project coverage is 59.16%. Comparing base (f09f8af) to head (8d9b41a).
:warning: Report is 727 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ...comet/serde/operator/CometDataWritingCommand.scala | 60.86% | 17 Missing and 10 partials :warning: |
| .../apache/spark/sql/comet/CometNativeWriteExec.scala | 66.66% | 10 Missing and 2 partials :warning: |
| ...n/scala/org/apache/comet/rules/CometExecRule.scala | 92.30% | 0 Missing and 1 partial :warning: |

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #2812      +/-   ##
============================================
+ Coverage     56.12%   59.16%   +3.03%     
- Complexity      976     1477     +501     
============================================
  Files           119      167      +48     
  Lines         11743    15188    +3445     
  Branches       2251     2523     +272     
============================================
+ Hits           6591     8986    +2395     
- Misses         4012     4917     +905     
- Partials       1140     1285     +145     

codecov-commenter avatar Nov 21 '25 19:11 codecov-commenter

> A good test would be to write with this feature enabled and then read it with and without Comet enabled.

Added. Thanks.

andygrove avatar Nov 21 '25 22:11 andygrove

Thanks for the reviews @comphead and @wForget. I'm going to go ahead and merge this and will have a draft PR up today for file commit protocol.

andygrove avatar Nov 26 '25 13:11 andygrove