Andrew Lamb

Results: 2001 comments by Andrew Lamb

> Do some changes in arrow-rs clickbench benchmark:

Do I understand that the changes you report are due to simply rewriting the parquet files to have a page index and...

I posted about this on twitter too in case anyone is interested: https://x.com/andrewlamb1111/status/1929852296323547273

Here is a related blog post about doing this on ClickHouse:
- https://altinity.com/blog/the-future-has-arrived-parquet-on-iceberg-finally-outperforms-mergetree

> I am also curious: **Why would uncompressed Parquet be considered an optimization over Snappy-compressed Parquet?** Is the decompression overhead of Snappy significant enough to slow down read performance?

Yes,...

Using pyspark to generate expected input/output that gets checked in sounds like a great idea to me. BTW, I hope to devote some time next week to helping organize this...

I just had a chat with @shehabgamin. The current status is that we have not smoothed out the process to the point where contributors with minimal context can pick up...

> We add a CI workflow that is triggered when Spark functions SLT files are changed, to make sure they are generated without unintended manual modification.

I am not quite...

An update here is that we are waiting for one or two more good example PRs, and then we'll turn the community on porting. If anyone wants to take a...

> There are tests in DataFusion ready to go as well!

I filed two tickets to track this suggestion and marked them as good first issues:
- https://github.com/apache/datafusion/issues/16774
- https://github.com/apache/datafusion/issues/16775

> Hi [@alamb](https://github.com/alamb), I just wanted to clarify: if a Spark function appears in the sqllogictest tests, are we expected to implement it in DataFusion?

@Standing-Man that is the intention....