Andy Grove comments

Results: 657 comments of Andy Grove

Add Comet to H2O.ai benchmark

> JFYI, I'm a maintainer of the (unofficial) library for generating datasets for H2O benchmark. You can use it instead of the R-scripts from the official repository to generate datasets...

Stop running Rust tests in CI for all Java and Spark versions

Through working on this, I discovered that some of the Rust tests actually do interact with the JVM, so we probably do still need to test with different Java versions....

Experimental native scan test failures

One more for the list: https://github.com/apache/datafusion-comet/issues/1488

Experimental native scan test failures

We now have more detailed issues tracking failures for each scan type, so I think we can close this issue now.

- https://github.com/apache/datafusion-comet/issues/1542
- https://github.com/apache/datafusion-comet/issues/1545

How can Comet be enabled by default without needing to configure memory?

> When enabling off-heap memory, we will use unified memory manager, does that mean the amount of memory will not be doubled? Our current implementation automatically allocates extra memory when...

How can Comet be enabled by default without needing to configure memory?

My opinion is that auto-tuning Spark configuration (with or without Comet) requires running the jobs with different configs and learning from the impact. I don't think that we can simply...

Optimize char expression

I do see a small performance improvement with these changes:

```
char time: [9.9009 µs 9.9636 µs 10.036 µs]
change: [-3.3878% -2.8089% -2.2840%] (p = 0.00 < 0.05)
Performance has...
```

feat: Implement bloom_filter_agg

I tested with TPC-H q5 and see that we are now running the bloom filter agg natively.

Could not deserialize ballista_core::serde::generated::ballista::JobStatus (invalid wire type)

May have been triggered by this:

```
ballista-executor_1 | 2022-10-21T00:40:12.765327Z DEBUG task_runner ThreadId(85) ballista_executor::execution_loop: Statistics: Err(DataFusionError(ArrowError(InvalidArgumentError("46412468576820000 is too large to store in a Decimal128 of precision 15. Max is 999999999999999"))))...
```
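The limit in that error follows from the decimal precision: a decimal of precision `p` can hold at most `p` digits, so the largest unscaled magnitude is 10^p - 1 (999999999999999 for precision 15). The value in the log has 17 digits, so it cannot fit. Below is a minimal sketch of that rule, assuming the reported number is the unscaled integer. The helpers `max_unscaled_for_precision` and `fits_precision` are hypothetical illustrations and are not Arrow's or Ballista's actual validation code.

```rust
/// Hypothetical helper: largest unscaled magnitude a decimal with
/// `precision` digits can hold, i.e. 10^precision - 1.
/// Valid for precision 1..=38 (the Decimal128 range), since 10^38 fits in i128.
fn max_unscaled_for_precision(precision: u32) -> i128 {
    10i128.pow(precision) - 1
}

/// Hypothetical helper: true if the unscaled `value` fits within `precision` digits.
fn fits_precision(value: i128, precision: u32) -> bool {
    value.abs() <= max_unscaled_for_precision(precision)
}

fn main() {
    // Limit quoted in the error message for precision 15.
    let max = max_unscaled_for_precision(15);
    println!("max for precision 15: {}", max);

    // Value from the log: 17 digits, which exceeds the 15-digit limit.
    let value: i128 = 46_412_468_576_820_000;
    println!("fits in precision 15: {}", fits_precision(value, 15));
}
```

Running this prints the same maximum as the error message and reports that the logged value does not fit.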

Panic in `datafusion_expr::window_state::WindowAggState::update`

I also see a correctness issue in another test related to windowed aggregates:

```
2025-06-06T14:15:31.2495550Z [info] - postgreSQL/window_part1.sql *** FAILED *** (4 seconds, 628 milliseconds)
2025-06-06T14:15:31.2496326Z [info] postgreSQL/window_part1.sql
2025-06-06T14:15:31.2496774Z [info]...
```