Parth Chandra
This was taken from Spark, which has since corrected it.
> * The Parquet scan of lineitem seems to take ~10% longer than Spark and 60%+ of the time is spent in native decoding, so perhaps we should add criterion...
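As a stopgap while criterion benchmarks on the native decoder land, a crude JVM-side timing harness can at least bound the end-to-end gap. This is only a sketch, not the benchmark suggested above: the file path, warm-up count, and use of `foreach` to force a full scan are all assumptions, and it presumes a Spark shell where `spark` is in scope.

```scala
// Sketch only: rough end-to-end timing of the lineitem scan from the
// Spark shell. A criterion benchmark on the native decoder would be
// the proper measurement; the file path here is an assumption.
val path = "/data/tpch/lineitem.parquet"

def timeMs[T](body: => T): Long = {
  val t0 = System.nanoTime()
  body
  (System.nanoTime() - t0) / 1000000
}

// Warm up the JIT and page cache before measuring.
(1 to 3).foreach(_ => spark.read.parquet(path).foreach(_ => ()))

println(s"lineitem scan: ${timeMs(spark.read.parquet(path).foreach(_ => ()))} ms")
```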
Seems like a perennial issue. This signature changes in every release, it appears (it is private, after all). https://github.com/apache/datafusion-comet/issues/1576
There are a couple of considerations here: 1) What version of Spark are users likely to be on (and therefore likely to want to use Comet with)? 2) What...
Spark produces the worst possible query plan for q72, which amplifies the difference in performance. The columnar-to-row (C2R) conversion overhead for Comet is amplified because the conversion happens on a dataset that...
I'll look into this, @comphead.
Update on this: the Spark vectorized reader also throws the same error. Users have to turn off vectorized reading to read such files. It is also pretty near impossible to...
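For anyone hitting this, the switch referred to above is Spark's standard config for the vectorized Parquet reader; a minimal session-level example follows (the file path is a placeholder, and the setting can also go in spark-defaults.conf):

```scala
// Fall back to the row-based parquet-mr reader, which can read files
// that the vectorized path rejects.
spark.conf.set("spark.sql.parquet.enableVectorizedReader", "false")
val df = spark.read.parquet("/path/to/affected/files") // placeholder path
```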
Yes, let's close this. We can revisit this if more people report it.
IIRC, there were differences in output between Spark 3.2 and Spark 3.4 for the timestamp_ntz type. Taking a closer look, the definition of timestamp_ntz (in Spark) essentially means that the...
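To illustrate the distinction (a sketch assuming Spark 3.4+, where the type is available): TIMESTAMP_NTZ is a fixed wall-clock datetime, while plain TIMESTAMP (LTZ) is an instant that gets re-rendered in the session time zone.

```scala
// LTZ is an instant, displayed in the session time zone;
// NTZ is a wall-clock datetime with no zone attached.
spark.conf.set("spark.sql.session.timeZone", "UTC")
val df = spark.sql(
  "SELECT TIMESTAMP '2020-01-01 00:00:00' AS ltz, " +
  "TIMESTAMP_NTZ '2020-01-01 00:00:00' AS ntz")
df.show() // both columns render as 2020-01-01 00:00:00

spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles")
df.show() // ltz shifts to 2019-12-31 16:00:00; ntz does not change
```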
IIRC, the vectorized versions of these encodings in Spark did not improve performance much over the row-based implementation in the parquet library.