datafusion-comet icon indicating copy to clipboard operation
datafusion-comet copied to clipboard

feat: Implement to_json for subset of types

Open andygrove opened this issue 1 year ago • 7 comments

Which issue does this PR close?

Closes https://github.com/apache/datafusion-comet/issues/631

Rationale for this change

This is part of our effort to support more operations on complex types.

Performance seems ok.

AMD Ryzen 9 7950X3D 16-Core Processor
TPCDS Micro Benchmarks:                   Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
to_json                                             147            152           2          1.4         721.3       1.0X
to_json: Comet (Scan)                               141            145           3          1.4         692.0       1.0X
to_json: Comet (Scan, Exec)                          80             88           5          2.6         391.9       1.8X

What changes are included in this PR?

This PR adds an implementation of to_json that works for a subset of types (structs, primitives, strings). There is no support for date or timestamp yet.

How are these changes tested?

New Rust tests and Spark tests.

andygrove avatar Aug 10 '24 22:08 andygrove

Codecov Report

Attention: Patch coverage is 50.00000% with 16 lines in your changes missing coverage. Please review.

Project coverage is 33.82%. Comparing base (4fe43ad) to head (5c2f551).

Files Patch % Lines
.../scala/org/apache/comet/serde/QueryPlanSerde.scala 50.00% 9 Missing and 7 partials :warning:
Additional details and impacted files
@@             Coverage Diff              @@
##               main     #805      +/-   ##
============================================
- Coverage     33.94%   33.82%   -0.12%     
+ Complexity      874      871       -3     
============================================
  Files           112      112              
  Lines         42916    42887      -29     
  Branches       9464     9456       -8     
============================================
- Hits          14567    14508      -59     
- Misses        25379    25395      +16     
- Partials       2970     2984      +14     

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov-commenter avatar Aug 11 '24 05:08 codecov-commenter

@parthchandra could you review?

andygrove avatar Aug 12 '24 13:08 andygrove

i would also love to review this. will plan it for tomorrow

dharanad avatar Aug 13 '24 18:08 dharanad

@eejbyfeldt @Kimahriman you may also be interested in reviewing this one

andygrove avatar Aug 13 '24 18:08 andygrove

LGTM

Kimahriman avatar Aug 13 '24 18:08 Kimahriman

Thanks for the review @eejbyfeldt! It is really appreciated. Some very good feedback there. I will address the feedback over the next day or two.

andygrove avatar Aug 13 '24 23:08 andygrove

Sorry @andygrove for this late review. I don't know if one can improve on @eejbyfeldt's review. To address some of the handling of escape characters, should we look at using something like serde_json ?

parthchandra avatar Aug 16 '24 18:08 parthchandra

@parthchandra @eejbyfeldt This is ready for another review

andygrove avatar Aug 26 '24 15:08 andygrove