Parth Chandra
Anyone know why the CI checks are failing with a SocketTimeout exception, and what to do to address this?
I have some numbers from an internal benchmark using Spark. I didn't see any benchmarks in the Parquet codebase that I could reuse. Here are the numbers from my own...
> Great effort! Will have a look after the build succeeds. @shangxinli I have no idea how to get the failed CI to pass. These failures appear to be in...
> @parthchandra Would you mind having a look at my I/O performance optimization plan for ParquetMR? I think we should coordinate, since we have some ideas that might overlap what...
> @parthchandra One thing that confuses me a bit is that these buffers have only ByteBuffer inside them. There's no actual I/O, so it's not possible to block. Do you...
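To illustrate the point being made here, a ByteBuffer-backed read path amounts to something like the sketch below (class and method names are hypothetical, not the actual ParquetMR classes): since the data is already in memory, a read is just a copy and has no I/O to block on.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: a reader over an in-memory ByteBuffer.
// Every read is a memory copy, so there is no I/O to block on.
class ByteBufferReader {
    private final ByteBuffer buf;

    ByteBufferReader(ByteBuffer buf) {
        this.buf = buf.duplicate(); // independent position/limit
    }

    // Returns bytes copied, or -1 at end of buffer; never blocks.
    int read(byte[] dst, int off, int len) {
        if (!buf.hasRemaining()) {
            return -1;
        }
        int n = Math.min(len, buf.remaining());
        buf.get(dst, off, n);
        return n;
    }
}

public class Demo {
    public static void main(String[] args) {
        ByteBufferReader r = new ByteBufferReader(
            ByteBuffer.wrap(new byte[] {1, 2, 3, 4, 5}));
        byte[] dst = new byte[3];
        System.out.println(r.read(dst, 0, 3)); // 3
        System.out.println(r.read(dst, 0, 3)); // 2
        System.out.println(r.read(dst, 0, 3)); // -1
    }
}
```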
@theosib-amazon I applied my PR on top of your PR, ran through some tests using Spark, and hit no issues. (All unit tests passed as well.)
@steveloughran thank you very much for taking the time to review and provide feedback! > 1. whose s3 client was used for testing here -if the s3a one, which hadoop...
> thanks, that means you are current with all shipping improvements. the main extra one is to use openFile(), passing in the length and requesting random IO. this guarantees ranged GET requests...
> Is byte (and arrays and buffers of bytes) the only datatype you support? My PR is optimizing code paths that pull ints, longs, and other sizes out of the...
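For reference, the multi-byte paths in question boil down to something like this standalone sketch (not the actual PR code): Parquet's plain encoding stores fixed-width values little-endian, so pulling ints and longs out of a byte buffer is a matter of setting the byte order and using the typed getters.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class TypedReads {
    public static void main(String[] args) {
        // Parquet plain encoding stores fixed-width values little-endian.
        ByteBuffer buf = ByteBuffer.allocate(12).order(ByteOrder.LITTLE_ENDIAN);
        buf.putInt(42);
        buf.putLong(1234567890123L);
        buf.flip(); // switch from writing to reading

        // Typed getters decode multi-byte values directly from the buffer.
        System.out.println(buf.getInt());  // 42
        System.out.println(buf.getLong()); // 1234567890123
    }
}
```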
> Latency is the killer; in an HTTP request you want to read enough but not discard data or break an HTTP connection if the client suddenly does a seek() or...
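The drain-or-abort trade-off described above can be sketched as a simple policy (the threshold and names below are illustrative, not from any actual client): if only a few bytes remain in the current ranged GET when the client seeks away, read and discard them so the connection can be reused; if many remain, aborting the connection is cheaper than draining it.

```java
// Hypothetical policy sketch for a seek() arriving mid-way through a ranged GET:
// drain small remainders to keep the connection reusable, abort large ones.
public class SeekPolicy {
    // Illustrative threshold; real clients tune this against observed latency.
    static final long DRAIN_LIMIT = 64 * 1024;

    enum Action { DRAIN, ABORT }

    static Action onSeek(long bytesRemainingInRange) {
        return bytesRemainingInRange <= DRAIN_LIMIT ? Action.DRAIN : Action.ABORT;
    }

    public static void main(String[] args) {
        System.out.println(onSeek(1024));            // small leftover: drain it
        System.out.println(onSeek(8 * 1024 * 1024)); // large leftover: abort
    }
}
```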