Parth Chandra

86 comments by Parth Chandra

> We should also test with NaN in sorting

Also `+infinity` and `-infinity` while we are at it.
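As a rough illustration of the kind of test data this comment has in mind, the sketch below sorts a small list containing NaN and both infinities. It assumes the expected ordering is Spark's documented one, where NaN sorts after every other value in ascending order. The names `nan_last_cmp` and `sort_nan_last` are hypothetical helpers for this example and are not part of Comet.

```rust
use std::cmp::Ordering;

/// Hypothetical comparator: treats NaN as larger than every other value
/// (including +infinity), and all NaNs as equal to each other.
fn nan_last_cmp(a: &f64, b: &f64) -> Ordering {
    match (a.is_nan(), b.is_nan()) {
        (true, true) => Ordering::Equal,
        (true, false) => Ordering::Greater,
        (false, true) => Ordering::Less,
        // Neither value is NaN, so partial_cmp always returns Some.
        (false, false) => a.partial_cmp(b).unwrap(),
    }
}

/// Hypothetical helper: returns the values sorted ascending with NaN last.
fn sort_nan_last(mut values: Vec<f64>) -> Vec<f64> {
    values.sort_by(nan_last_cmp);
    values
}

fn main() {
    let input = vec![1.0, f64::NAN, f64::INFINITY, f64::NEG_INFINITY, -2.5];
    let sorted = sort_nan_last(input);
    // Expected order: -inf, -2.5, 1.0, +inf, NaN
    println!("{:?}", sorted);
}
```

A test suite could compare an engine's sorted output against a reference ordering like this one for each of the special values.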

Seems to me it would be a step in the right direction. The idea that comet-common should be independent of any engine is sound. It would be a necessary first...

@advancedxy Good suggestions. I believe this issue is meant to address point 3 above while points 1 and 2 are in progress.

> For the Parquet part, we may need to define something like `CometDataType` which gets converted from the Parquet schema, and from which we can derive Spark catalyst data type...

FWIW, Comet's cast from string to timestamp uses a [format string](https://github.com/apache/datafusion-comet/blob/ef94c554a2907b25ba99f23dbbfb0990cdf2d16c/core/src/execution/datafusion/expressions/cast.rs#L41) that matches the one used by Spark. It still needs to be massaged for trailing zeroes though. ANSI mode was...

Also check `CometExpressionSuite.test("cast timestamp and timestamp_ntz")`. This reads timestamps as longs from a parquet file (which may store the values as either millis or micros).

> the min value here is actually causing an overflow in Spark

Probably because Spark is converting the value from millis to micros?
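To make the suspected overflow concrete, here is a minimal sketch of the arithmetic, assuming the conversion is a plain multiplication of a 64-bit millisecond value by 1,000. It does not reproduce Spark's or Comet's actual conversion code, and `millis_to_micros` is a hypothetical name used only for this example.

```rust
/// Hypothetical helper: converts epoch milliseconds to epoch microseconds,
/// returning None when the result does not fit in an i64.
fn millis_to_micros(millis: i64) -> Option<i64> {
    millis.checked_mul(1_000)
}

fn main() {
    // An ordinary timestamp converts without trouble.
    println!("{:?}", millis_to_micros(1_700_000_000_000));

    // The minimum 64-bit value cannot be scaled by 1,000 without overflowing.
    println!("{:?}", millis_to_micros(i64::MIN));
}
```

Any millisecond value whose magnitude exceeds roughly `i64::MAX / 1000` hits the same limit, so a test that uses the minimum long value as a millisecond timestamp would be expected to overflow under this assumption.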

> I'm not sure this is the best approach but it was previously not possible to have a default value of `false` for `spark.comet.exec.OPNAME.enabled` because it would be enabled anyway...

Posting a reply in case it helps associate the issue somehow. Anyhow, confirming that I am indeed working on this. In that context, I am initially planning to only add...

Sorry @andygrove for this late review. I don't know if one can improve on @eejbyfeldt's review. To address some of the handling of escape characters, should we look at using...