Steve Loughran

Results 315 comments of Steve Loughran

added comments, especially on a builder api. one option to be able to set would be whether the footer was required or not. as gcs/abfs will both fetch and cache...

> For instance Iceberg has this [S3InputFile](https://github.com/apache/iceberg/blob/master/aws/src/main/java/org/apache/iceberg/aws/s3/S3InputFile.java) which knows nothing about file status or read policy.

the builder api we use has opt(key, val) being something implementations can ignore, must(key,...
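To make the opt/must distinction concrete, here is a minimal self-contained sketch. It is not the Hadoop builder itself: the class `OpenOptionsSketch`, its `build(supportedKeys)` method and the `example.*` keys are all hypothetical, and the only behaviour assumed is that optional keys an implementation does not recognise are dropped, while unrecognised mandatory keys are rejected.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for a builder with optional and mandatory options. */
public class OpenOptionsSketch {
    private final Map<String, String> options = new HashMap<>();
    private final Set<String> mandatory = new HashSet<>();

    /** Optional: an implementation which does not know the key may ignore it. */
    public OpenOptionsSketch opt(String key, String value) {
        options.put(key, value);
        mandatory.remove(key);
        return this;
    }

    /** Mandatory: an implementation which does not know the key must fail. */
    public OpenOptionsSketch must(String key, String value) {
        options.put(key, value);
        mandatory.add(key);
        return this;
    }

    /**
     * Resolve the options against the keys an implementation supports.
     * Unknown optional keys are dropped; unknown mandatory keys are rejected.
     */
    public Map<String, String> build(Set<String> supportedKeys) {
        Map<String, String> accepted = new HashMap<>();
        for (Map.Entry<String, String> e : options.entrySet()) {
            if (supportedKeys.contains(e.getKey())) {
                accepted.put(e.getKey(), e.getValue());
            } else if (mandatory.contains(e.getKey())) {
                throw new IllegalArgumentException(
                    "unsupported mandatory option: " + e.getKey());
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        // hypothetical keys, for illustration only
        Map<String, String> resolved = new OpenOptionsSketch()
            .opt("example.footer.required", "true")
            .must("example.read.policy", "random")
            .build(Set.of("example.read.policy"));
        System.out.println(resolved);
    }
}
```

The point of the split is that a caller can pass hints freely through opt() without breaking against stores which know nothing about them, and reserve must() for settings where silently ignoring the value would be wrong.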

> what API is missing from hadoop-azure, and what you would plan to change in hadoop-azure.

problem is that @Tom-Newton wants to be able to do a list from a...

> from the hadoop-azure side.

more "from the entire hadoop list api". Which doesn't mean it's intractable, just makes it harder. FWIW, I do think the idea "set a starting...

yeah, you could; look at HDFS-13616 as the last time anyone went near listing and HADOOP-16898 as my attempt to keep the HDFS-first features under control. the newer apis (openFile(),...

> For S3 we use S3ListRequest which is also not public

going to break your code with a move to 3.4 and the v2 AWS SDK I'm afraid. how about...

Could we actually have a specific subclass of SdkClientException for these retryable signing/hashing problems? The Hadoop S3A client already splits failures into those which may be recoverable (no response, throttle...

FYI as HADOOP-19221 shows, v2 SDK actually makes things worse in terms of s3 upload recoverability.

has anyone set up a nightly jenkins with stable spark and its tests set to run off a nightly build of parquet? would seem a good way to catch regressions...

those http system property settings are picked up by the java.net httpclient; aws sdk uses apache httpclient which has *never* picked them up.
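A minimal sketch of the java.net half of that statement, using only the JDK: the default `ProxySelector` consults the `http.proxyHost` and `http.proxyPort` system properties when asked which proxy to use. The class name `ProxyPropertySketch` and the proxy host `proxy.example.invalid` are made up for illustration. It says nothing about the AWS SDK or Apache HttpClient, which would need their own proxy configuration.

```java
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.ProxySelector;
import java.net.URI;
import java.util.List;

public class ProxyPropertySketch {

    /**
     * Set the JVM-wide http proxy properties, then ask the default
     * java.net ProxySelector which proxy it would use for the URL.
     */
    public static Proxy proxyFor(String proxyHost, int proxyPort, String url) {
        System.setProperty("http.proxyHost", proxyHost);
        System.setProperty("http.proxyPort", Integer.toString(proxyPort));
        List<Proxy> proxies = ProxySelector.getDefault().select(URI.create(url));
        return proxies.get(0);
    }

    public static void main(String[] args) {
        // hypothetical proxy host; no connection is made, so it need not resolve
        Proxy proxy = proxyFor("proxy.example.invalid", 8080, "http://example.org/");
        InetSocketAddress address = (InetSocketAddress) proxy.address();
        System.out.println(proxy.type() + " via "
            + address.getHostString() + ":" + address.getPort());
    }
}
```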