Andy Grove

Results: 657 comments by Andy Grove

> JFYI, I'm a maintainer of the (unofficial) library for generating datasets for H2O benchmark. You can use it instead of the R-scripts from the official repository to generate datasets...

Through working on this, I discovered that some of the Rust tests actually do interact with the JVM, so we probably do still need to test with different Java versions....

One more for the list: https://github.com/apache/datafusion-comet/issues/1488

We now have more detailed issues tracking failures for each scan type, so I think we can close this issue now.

- https://github.com/apache/datafusion-comet/issues/1542
- https://github.com/apache/datafusion-comet/issues/1545

> When enabling off-heap memory, we will use unified memory manager, does that mean the amount of memory will not be doubled? Our current implementation automatically allocates extra memory when...

My opinion is that auto-tuning Spark configuration (with or without Comet) requires running the jobs with different configs and learning from the impact. I don't think that we can simply...

I do see a small performance improvement with these changes:

```
char time: [9.9009 µs 9.9636 µs 10.036 µs]
change: [-3.3878% -2.8089% -2.2840%] (p = 0.00 < 0.05)
Performance has...
```

I tested with TPC-H q5 and see that we are now running the bloom filter agg natively

May have been triggered by this:

```
ballista-executor_1 | 2022-10-21T00:40:12.765327Z DEBUG task_runner ThreadId(85) ballista_executor::execution_loop: Statistics: Err(DataFusionError(ArrowError(InvalidArgumentError("46412468576820000 is too large to store in a Decimal128 of precision 15. Max is 999999999999999"))))...
```

I also see a correctness issue in another test related to windowed aggregates:

```
2025-06-06T14:15:31.2495550Z [info] - postgreSQL/window_part1.sql *** FAILED *** (4 seconds, 628 milliseconds)
2025-06-06T14:15:31.2496326Z [info] postgreSQL/window_part1.sql
2025-06-06T14:15:31.2496774Z [info]...
```