Andrew Lamb comments

Results: 2001 comments of Andrew Lamb

Blog post about parquet vs custom file formats

> Do some changes in arrow-rs clickbench benchmark:

Do I understand that the changes you report are due to simply rewriting the parquet files to have a page index and...

Blog post about parquet vs custom file formats

I posted about this on twitter too in case anyone is interested: https://x.com/andrewlamb1111/status/1929852296323547273

Blog post about parquet vs custom file formats

Here is a related blog about doing this on ClickHouse: https://altinity.com/blog/the-future-has-arrived-parquet-on-iceberg-finally-outperforms-mergetree

Blog post about parquet vs custom file formats

> I am also curious: **Why would uncompressed Parquet be considered an optimization over Snappy-compressed Parquet?** Is the decompression overhead of Snappy significant enough to slow down read performance?

Yes,...

[EPIC] Complete `datafusion-spark` Spark Compatible Functions

Using PySpark to generate expected input/output that gets checked in sounds like a great idea to me. BTW, I hope to devote some time next week to helping organize this...

[EPIC] Complete `datafusion-spark` Spark Compatible Functions

I just had a chat with @shehabgamin. The current status is that we have not smoothed out the process to the point where contributors with minimal context can pick up...

[EPIC] Complete `datafusion-spark` Spark Compatible Functions

> We add a CI workflow that is triggered when Spark functions SLT files are changed, to make sure they are generated without unintended manual modification.

I am not quite...

[EPIC] Complete `datafusion-spark` Spark Compatible Functions

An update here is that we are waiting for one or two more good example PRs, and then we'll turn the community on porting. If anyone wants to take a...

[EPIC] Complete `datafusion-spark` Spark Compatible Functions

> There are tests in DataFusion ready to go as well!

I filed two tickets to track this suggestion and marked them as good first issues:

- https://github.com/apache/datafusion/issues/16774
- https://github.com/apache/datafusion/issues/16775

[EPIC] Complete `datafusion-spark` Spark Compatible Functions

> Hi [@alamb](https://github.com/alamb), I just wanted to clarify: if a Spark function appears in the sqllogictest tests, are we expected to implement it in DataFusion?

@Standing-Man that is the intention....