Parth Chandra

Results 28 comments of Parth Chandra

Anyone know why the CI checks are failing with a SocketTimeout exception, and what to do to address this?

I have some numbers from an internal benchmark using Spark. I didn't see any benchmarks in the Parquet codebase that I could reuse. Here are the numbers from my own...

> Great effort! WIll have a look after the build succeed. @shangxinli I have no idea how to get the failed CI to pass. These failures appear to be in...

> @parthchandra Would you mind having a look at my I/O performance optimization plan for ParquetMR? I think we should coordinate, since we have some ideas that might overlap what...

> @parthchandra One thing that confuses me a bit is that these buffers have only ByteBuffer inside them. There's no actual I/O, so it's not possible to block. Do you...

@theosib-amazon I applied my PR on top of your PR, ran thru some tests using Spark, and hit no issues. (All unit tests passed as well).

@steveloughran thank you very much for taking the time to review and provide feedback! > 1. whose s3 client was used for testing here -if the s3a one, which hadoop...

> thanks., that means you are current with all shipping improvments. the main one extra is to use openFile(), passing in length and requesting randomio. this guarantees ranged GET requests...

> Is byte (and arrays and buffers of bytes) the only datatype you support? My PR is optimizing code paths that pull ints, longs, and other sizes out of the...

> Latency is the killer; in an HTTP request you want read enough but not discard data or break an http connection if the client suddenly does a seek() or...