Parth Chandra

78 comments by Parth Chandra

> @danielcweeks that's a good point about pluggability.
> I don't know if that would be useful for iceberg https://github.com/apache/hadoop-api-shim

Iceberg can use the base Parquet File reader out of...

@mukund-thakur @steveloughran this is a great PR! Some numbers from an independent benchmark. I used Spark to parallelize the reading of all rowgroups (just the reading of the raw data)...
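(A minimal sketch of that style of benchmark, assuming a standalone driver program: one Spark task per row group, each opening its own `ParquetFileReader` and reading just that group's raw pages. The class name and the per-task skip/read loop are illustrative, not the actual benchmark code.)

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.hadoop.ParquetFileReader;
import org.apache.parquet.hadoop.util.HadoopInputFile;
import org.apache.spark.api.java.JavaSparkContext;

public class RowGroupReadBenchmark {
  public static void main(String[] args) throws Exception {
    String file = args[0];
    Configuration conf = new Configuration();

    // Read the footer once on the driver to find the number of row groups.
    int numRowGroups;
    try (ParquetFileReader footerReader =
             ParquetFileReader.open(HadoopInputFile.fromPath(new Path(file), conf))) {
      numRowGroups = footerReader.getRowGroups().size();
    }

    try (JavaSparkContext sc = new JavaSparkContext("local[*]", "rowgroup-read-bench")) {
      List<Integer> groups =
          IntStream.range(0, numRowGroups).boxed().collect(Collectors.toList());

      // One task per row group: open the file, skip to the assigned group,
      // and read just that group's raw pages.
      long totalRows = sc.parallelize(groups, numRowGroups)
          .map(i -> {
            Configuration taskConf = new Configuration();
            try (ParquetFileReader r =
                     ParquetFileReader.open(HadoopInputFile.fromPath(new Path(file), taskConf))) {
              for (int s = 0; s < i; s++) {
                r.skipNextRowGroup();
              }
              return r.readNextRowGroup().getRowCount();
            }
          })
          .reduce(Long::sum);

      System.out.println("rows read: " + totalRows);
    }
  }
}
```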

@ahmarsuhail No, these numbers are not with Iceberg and S3FileIO. I used a modified (lots of stuff removed) version of the ParquetFileReader and a custom benchmark program that reads all...

> * for parquet, we do the same validation so behaviour is consistent across all impls

I think that is the correct behaviour.

> * If there is an overlap,...
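(The validation being discussed is essentially "sort the requested ranges and reject any overlap". A hedged sketch of that kind of check; the `ReadRange` type and method names below are hypothetical, not the Hadoop or Parquet API.)

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public final class RangeValidation {
  // A read range: offset into the file plus number of bytes to read.
  record ReadRange(long offset, int length) {}

  // Sort the ranges by offset and reject any pair that overlaps.
  static List<ReadRange> sortAndValidate(List<ReadRange> ranges) {
    List<ReadRange> sorted = new ArrayList<>(ranges);
    sorted.sort(Comparator.comparingLong(ReadRange::offset));
    for (int i = 1; i < sorted.size(); i++) {
      ReadRange prev = sorted.get(i - 1);
      ReadRange cur = sorted.get(i);
      if (prev.offset() + prev.length() > cur.offset()) {
        throw new IllegalArgumentException(
            "overlapping ranges: " + prev + " and " + cur);
      }
    }
    return sorted;
  }
}
```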

> Per the [thread](https://lists.apache.org/thread/kttwbl5l7opz6nwb5bck2gghc2y3td0o), it'd be good to have this patch in 1.14.0 :) Otherwise, can take a very long time till the next one..

Note that this PR is...

> Now, what do people think about fallbacks?

We've been using this and to date parquet hasn't ever issued an overlapping request, but there's still the future to think about....
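(One possible fallback shape, sketched under the assumption that Hadoop's `readVectored` rejects overlapping ranges with an `IllegalArgumentException`: try the vectored read first, and drop back to plain positioned reads if it refuses. Illustrative only, not behaviour either project has committed to.)

```java
import java.nio.ByteBuffer;
import java.util.List;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileRange;

public final class VectoredReadWithFallback {
  static void readAll(FSDataInputStream in, List<FileRange> ranges) throws Exception {
    try {
      // Preferred path: hand all ranges to the stream in one call and let it
      // coalesce / parallelize as it sees fit.
      in.readVectored(ranges, ByteBuffer::allocate);
      for (FileRange r : ranges) {
        r.getData().get();   // block until each buffer is filled
      }
    } catch (IllegalArgumentException rejected) {
      // Fallback path (e.g. an overlapping request was rejected):
      // issue one plain positioned read per range instead.
      for (FileRange r : ranges) {
        byte[] buf = new byte[r.getLength()];
        in.readFully(r.getOffset(), buf);
      }
    }
  }
}
```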

@steveloughran @mukund-thakur do you guys have any information on how much (if any) this impacts the peak memory utilization in the parquet file reader? The total memory allocated while reading...
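(For what it's worth, one way to measure "total memory allocated while reading" on HotSpot is the per-thread allocation counter; peak utilization would need a different probe, e.g. sampling heap usage. A hedged sketch, with an illustrative helper name.)

```java
import java.lang.management.ManagementFactory;
import com.sun.management.ThreadMXBean;

public final class AllocationProbe {
  // Approximate total bytes allocated by the current thread while running
  // readTask. Uses the HotSpot-specific com.sun.management.ThreadMXBean,
  // so this only measures allocation, not peak heap utilization.
  public static long allocatedBytes(Runnable readTask) {
    ThreadMXBean bean = (ThreadMXBean) ManagementFactory.getThreadMXBean();
    long id = Thread.currentThread().getId();
    long before = bean.getThreadAllocatedBytes(id);
    readTask.run();
    return bean.getThreadAllocatedBytes(id) - before;
  }
}
```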

@steveloughran are you planning to incorporate the read metrics added in https://github.com/apache/parquet-mr/commit/2e0cd1925546d2560f7658086251851e6fa68559 ? I can add them after this is merged so as not to hold up this PR.

> * "-0973250", "-3638-5" fuzz tests in Legacy mode should return values as mentioned [Implement Spark-compatible CAST from String to Date #327](https://github.com/apache/datafusion-comet/issues/327) - currently legacy mode returns null. You're pretty...

Is this just a mismatch between error messages, or is the cast actually not doing the right thing with Spark 3.2?
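(One quick way to separate the two possibilities is to run the casts from the fuzz tests directly against the Spark version in question and compare with what Comet returns. A throwaway check, not part of either codebase; the inputs are the ones quoted above.)

```java
import org.apache.spark.sql.SparkSession;

public class CastCheck {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .master("local[1]")
        .appName("cast-check")
        .getOrCreate();

    // See what the Spark build under test returns (a value or null)
    // for the inputs from the fuzz tests.
    spark.sql("SELECT CAST('-0973250' AS DATE), CAST('-3638-5' AS DATE)").show(false);

    // The same cast with ANSI mode on, to see whether it errors instead.
    spark.conf().set("spark.sql.ansi.enabled", "true");
    try {
      spark.sql("SELECT CAST('-0973250' AS DATE)").show(false);
    } catch (Exception e) {
      System.out.println("ANSI cast failed: " + e.getMessage());
    }

    spark.stop();
  }
}
```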