Parth Chandra comments

86 comments



Comet sort order different to Spark for 0.0 and -0.0

> We should also test with NaN in sorting

Also `+infinity` and `-infinity` while we are at it.
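A test comparator for this could look like the Python sketch below, which is purely illustrative. Spark's documentation says NaN values go last in ascending order and are larger than any other numeric value. Treating -0.0 and 0.0 as equal is an assumption here, and the helper names `spark_compare_doubles` and `spark_sort` are hypothetical. A test could sort a mix of NaN, `+infinity`, `-infinity`, 0.0 and -0.0 with both engines and compare the results against this ordering.

```python
import functools
import math


def spark_compare_doubles(x: float, y: float) -> int:
    """Compare two doubles the way Spark is assumed to order them.

    Assumptions (not taken from the Comet or Spark sources):
    * NaN equals NaN and is larger than every other value, including +inf;
    * -0.0 and 0.0 compare as equal.
    """
    x_nan, y_nan = math.isnan(x), math.isnan(y)
    if x_nan or y_nan:
        # NaN sorts last in ascending order; two NaNs compare equal.
        return int(x_nan) - int(y_nan)
    if x == y:  # true for -0.0 == 0.0 as well as for equal infinities
        return 0
    return -1 if x < y else 1


def spark_sort(values):
    """Stable ascending sort using the assumed Spark ordering."""
    return sorted(values, key=functools.cmp_to_key(spark_compare_doubles))
```

Because the sort is stable and -0.0 compares equal to 0.0, the two zeros keep their input order, so a test should not assert a particular sign for them unless Spark is known to normalize one way.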

Making Comet Common Module Engine Independent

Seems to me it would be a step in the right direction. The idea that comet-common should be independent of any engine is sound. It would be a necessary first...

Making Comet Common Module Engine Independent

@advancedxy Good suggestions. I believe this issue is meant to address point 3 above, while points 1 and 2 are in progress.

Making Comet Common Module Engine Independent

> For the Parquet part, we may need to define something like `CometDataType` which gets converted from the Parquet schema, and from which we can derive Spark catalyst data type...
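A rough sketch of what such a `CometDataType` might look like, written in Python purely for illustration. The variants, the Parquet mapping table, and the helpers `from_parquet` and `to_spark` are all hypothetical and cover only a handful of primitive types. The point is the shape: comet-common would only need the Parquet-to-neutral mapping, while the Spark adapter could live in an engine-specific module.

```python
from enum import Enum
from typing import Optional


class CometDataType(Enum):
    # Hypothetical engine-neutral types; only a few primitives are shown.
    BOOLEAN = "boolean"
    INT32 = "int32"
    INT64 = "int64"
    FLOAT = "float"
    DOUBLE = "double"
    BINARY = "binary"
    STRING = "string"


# (Parquet physical type, logical type annotation) -> neutral type.
_FROM_PARQUET = {
    ("BOOLEAN", None): CometDataType.BOOLEAN,
    ("INT32", None): CometDataType.INT32,
    ("INT64", None): CometDataType.INT64,
    ("FLOAT", None): CometDataType.FLOAT,
    ("DOUBLE", None): CometDataType.DOUBLE,
    ("BYTE_ARRAY", None): CometDataType.BINARY,
    ("BYTE_ARRAY", "STRING"): CometDataType.STRING,
}

# Engine-specific adapter: neutral type -> Spark Catalyst type name.
_TO_SPARK = {
    CometDataType.BOOLEAN: "BooleanType",
    CometDataType.INT32: "IntegerType",
    CometDataType.INT64: "LongType",
    CometDataType.FLOAT: "FloatType",
    CometDataType.DOUBLE: "DoubleType",
    CometDataType.BINARY: "BinaryType",
    CometDataType.STRING: "StringType",
}


def from_parquet(physical: str, logical: Optional[str] = None) -> CometDataType:
    """Map a Parquet column type to the neutral type, or raise ValueError."""
    try:
        return _FROM_PARQUET[(physical, logical)]
    except KeyError:
        raise ValueError(f"unsupported Parquet type: {physical}/{logical}") from None


def to_spark(t: CometDataType) -> str:
    """Derive the Spark Catalyst type name from the neutral type."""
    return _TO_SPARK[t]
```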

Implement Spark-compatible CAST from String to Date

FWIW, Comet's cast from string to timestamp uses a [format string](https://github.com/apache/datafusion-comet/blob/ef94c554a2907b25ba99f23dbbfb0990cdf2d16c/core/src/execution/datafusion/expressions/cast.rs#L41) that matches the one used by Spark. It still needs to be massaged to handle trailing zeroes, though. ANSI mode was...
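Assuming the trailing-zero massaging refers to fractional seconds (as I understand it, Spark prints `2024-01-02 03:04:05.1` rather than `2024-01-02 03:04:05.100000`), a minimal sketch of the trimming could be as follows. The helper name `format_spark_like` is hypothetical, and the sketch works on a naive `datetime`, ignoring session time zones.

```python
from datetime import datetime


def format_spark_like(ts: datetime) -> str:
    """Format as 'yyyy-MM-dd HH:mm:ss[.fraction]', dropping trailing zeroes
    from the fractional seconds and omitting the fraction when it is zero."""
    base = (f"{ts.year:04d}-{ts.month:02d}-{ts.day:02d} "
            f"{ts.hour:02d}:{ts.minute:02d}:{ts.second:02d}")
    # Microseconds padded to 6 digits, then trailing zeroes stripped.
    fraction = f"{ts.microsecond:06d}".rstrip("0")
    return f"{base}.{fraction}" if fraction else base
```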

Different semantics of casting from int64 to timestamp between Comet and Spark

Also see `CometExpressionSuite.test("cast timestamp and timestamp_ntz")`. It reads timestamps as longs from a Parquet file, which may store the values as either milliseconds or microseconds.
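For reference, turning the stored longs into microseconds depends on the unit of the Parquet `TIMESTAMP` type. A minimal sketch, assuming only the `MILLIS` and `MICROS` units mentioned above (Parquet also defines `NANOS`, which is left out here), with a hypothetical helper `to_micros`:

```python
def to_micros(value: int, unit: str) -> int:
    """Normalize a Parquet INT64 timestamp to microseconds since the epoch.

    `unit` is assumed to be "MILLIS" or "MICROS". Python integers are
    unbounded, so this sketch does not model 64-bit overflow.
    """
    if unit == "MILLIS":
        return value * 1_000
    if unit == "MICROS":
        return value
    raise ValueError(f"unsupported timestamp unit: {unit}")
```

The same long therefore means different instants depending on the unit, which is why a test that reads raw longs has to know how the file was written.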

Different semantics of casting from int64 to timestamp between Comet and Spark

> the min value here is actually causing an overflow in Spark

Probably because Spark is converting the value from millis to micros?
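The millis-to-micros theory is easy to sanity-check: the 64-bit minimum multiplied by 1000 cannot be represented in 64 bits. Below is a sketch of a checked conversion that behaves like Java's `Math.multiplyExact` (raise instead of wrap). Whether Spark takes this exact path is not confirmed here, and `millis_to_micros_checked` is a hypothetical name.

```python
LONG_MIN = -(2**63)
LONG_MAX = 2**63 - 1


def millis_to_micros_checked(millis: int) -> int:
    """Multiply by 1000 as signed 64-bit arithmetic would, raising
    OverflowError instead of silently wrapping around."""
    micros = millis * 1_000
    if not LONG_MIN <= micros <= LONG_MAX:
        raise OverflowError(f"{millis} ms does not fit in a 64-bit microsecond value")
    return micros
```

With this check, the smallest millisecond value that still converts is -9223372036854775, and anything below it (including the 64-bit minimum) overflows.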

WIP: Fix performance regression with `stddev` being enabled by default

> I'm not sure this is the best approach but it was previously not possible to have a default value of `false` for `spark.comet.exec.OPNAME.enabled` because it would be enabled anyway...
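One way to make a per-operator default of `false` stick is to resolve each key against that operator's own default instead of a single shared default. This is a hypothetical sketch, not Comet's actual config code: `OPNAME` is replaced by `stddev` only for illustration, and both the `OPERATOR_DEFAULTS` table and the fallback of `true` for other operators are assumptions.

```python
from typing import Mapping

# Hypothetical per-operator defaults; operators not listed default to enabled.
OPERATOR_DEFAULTS = {"stddev": False}


def is_enabled(conf: Mapping[str, str], op: str) -> bool:
    """Resolve spark.comet.exec.<op>.enabled, falling back to the
    operator's own default only when the user has not set the key."""
    key = f"spark.comet.exec.{op}.enabled"
    raw = conf.get(key)
    if raw is None:
        return OPERATOR_DEFAULTS.get(op, True)
    return raw.strip().lower() == "true"
```

The design point is distinguishing "not set" from "set to false", so an explicit user setting always wins and an unset key still honors a default of `false`.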

Implement native version of ColumnarToRow

Posting a reply in case it helps associate the issue somehow. Anyhow, confirming that I am indeed working on this. In that context, I am initially planning to only add...
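At its core, ColumnarToRow transposes a batch of equal-length columns into rows while carrying nulls across. Below is a minimal sketch of that loop using a hypothetical helper `columnar_to_rows`. Spark's real row output is `UnsafeRow`, whose binary layout is not modelled here.

```python
from typing import Any, List, Sequence


def columnar_to_rows(columns: Sequence[Sequence[Any]],
                     validity: Sequence[Sequence[bool]]) -> List[tuple]:
    """Transpose columns into row tuples.

    validity[c][r] is False where column c is null at row r; nulls are
    emitted as None regardless of the value stored in the column.
    """
    if len(columns) != len(validity):
        raise ValueError("need one validity vector per column")
    num_rows = len(columns[0]) if columns else 0
    for col, valid in zip(columns, validity):
        if len(col) != num_rows or len(valid) != num_rows:
            raise ValueError("all columns must have the same length")
    return [
        tuple(col[r] if valid[r] else None for col, valid in zip(columns, validity))
        for r in range(num_rows)
    ]
```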

feat: Implement to_json for subset of types

Sorry @andygrove for this late review. I don't know if one can improve on @eejbyfeldt's review. To address some of the handling of escape characters, should we look at using...
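On escape characters: RFC 8259 requires escaping `"`, `\`, and the control characters U+0000 to U+001F inside JSON strings. Below is a minimal sketch of that rule, using a hypothetical helper `escape_json_string`. Whether it matches Spark's Jackson-based output byte for byte (for example, the hex case in `\u` escapes) is not verified.

```python
def escape_json_string(s: str) -> str:
    """Quote and escape s as a JSON string literal (RFC 8259 minimum):
    quote, backslash and control characters are escaped; all other
    characters, including non-ASCII, pass through unchanged."""
    short = {'"': '\\"', '\\': '\\\\', '\b': '\\b', '\f': '\\f',
             '\n': '\\n', '\r': '\\r', '\t': '\\t'}
    out = ['"']
    for ch in s:
        if ch in short:
            out.append(short[ch])
        elif ord(ch) < 0x20:
            # Remaining control characters use the \uXXXX form.
            out.append(f"\\u{ord(ch):04x}")
        else:
            out.append(ch)
    out.append('"')
    return "".join(out)
```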