Andy Grove comments

Results: 657 comments by Andy Grove

[EPIC] Complete `datafusion-spark` Spark Compatible Functions

I created an epic in Comet to track donating the remaining Spark expressions: https://github.com/apache/datafusion-comet/issues/2084

[BUG] GPU JSON reader fails to read the JSON string of an empty body

I just re-tested this, and it is still an issue even after switching to the new engine. ``` scala> spark.read.json("no-body.json").show 24/01/17 00:02:09 WARN GpuOverrides: !Exec cannot run on GPU because...

[BUG] GPU JSON reader fails to read the JSON string of an empty body

This only seems to be an issue for a JSON file that only contains empty entries. If there is at least one non-empty row, then we match Spark. ``` $...

chore: Reserve memory for native shuffle writer per partition

I'm testing this PR out now, in conjunction with some other PRs, because I currently have a reproducible deadlock that, as far as I can tell, is caused by memory pool issues.

Handle tz, datetime & int96 rebase configurations in native_iceberg_compat

Can we fall back to Spark (or another reader) for now if any of these configs are set to values that we do not yet honor?

Is it possible to support PyArrow backed UDFs in Comet natively?

Thanks for writing this up @SemyonSinchenko. Your reasoning seems sound to me, and I agree that it would be quite a unique and powerful feature for Comet. I am not...

Is it possible to support PyArrow backed UDFs in Comet natively?

> It seems to me that re-using how spark handle python UDFs would be easier than implementing it from scratch using datafusion. But I'm not 100% sure. Yes, that is...

Is it possible to support PyArrow backed UDFs in Comet natively?

@SemyonSinchenko I started hacking on a solution for this in https://github.com/andygrove/datafusion-comet/tree/pyarrow-wip by adding a CometArrowPythonRunner. I plan on working on this as a low-priority task when I have time, but...

Fix job hangs when partition count of plan is zero

Thanks @lewiszlw. Could you rebase/upmerge (to fix the CI failure)? Then I can review.

`native_datafusion/native_iceberg_compat` scans case sensitive

These Spark SQL test failures with `native_iceberg_compat` are possibly related to this issue: - Spark native readers should respect spark.sql.caseSensitive - parquet *** FAILED *** (440 milliseconds) - SPARK-31116: Select...