JCR-4369: Avoid S3 Incomplete Read Warning by Aborting Gracefully
The AWS S3 SDK recommends aborting the S3ObjectInputStream if the caller does not intend to consume the data: the connection-pool behavior of the underlying HttpClient (which drains the remaining data on close so the connection can be reused) can cause a serious performance problem when the object is large. From the AWS S3 SDK's perspective, it is better to simply abort the underlying HttpRequestBase and evict the connection from the pool.
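For illustration, this is roughly what the recommended pattern looks like with the AWS SDK for Java v1 (a minimal sketch; the class, method, and parameter names are placeholders for illustration, not Jackrabbit code):

```java
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.S3Object;
import com.amazonaws.services.s3.model.S3ObjectInputStream;

public class AbortExample {

    // 's3', 'bucket' and 'key' are placeholders for illustration.
    static void readMetadataOnly(AmazonS3 s3, String bucket, String key) {
        S3Object object = s3.getObject(bucket, key);
        S3ObjectInputStream in = object.getObjectContent();
        try {
            // ... inspect object.getObjectMetadata(), decide the body
            // is not needed after all ...
        } finally {
            // abort() discards the underlying HttpRequestBase and evicts
            // the connection from the pool; close() would instead drain
            // all remaining bytes so the connection can be reused, which
            // is expensive for large objects.
            in.abort();
        }
    }
}
```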
In a multi-threaded environment (due to concurrent requests and/or the proactiveCaching mode of CachingDataStore), the read-and-store sequence in o.a.j.core.data.CachingDataStore.getStream(DataIdentifier) can fall into the else block of o.a.j.core.data.LocalCache.store(String, InputStream), because a file with that name may already exist by the time the else block executes. In that case, the S3ObjectInputStream is never read and never aborted. As a result, com.amazonaws.services.s3.internal.S3AbortableInputStream#close() ends up complaining about an input stream that was neither aborted nor read fully.
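Schematically, the problematic path looks like the following (a simplified sketch only; the file layout and return contract here are assumptions, not the actual LocalCache code):

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

// Simplified sketch of the race, not the actual LocalCache implementation.
class LocalCacheSketch {
    private final File cacheDir = new File("cache");

    InputStream store(String fileName, InputStream in) throws IOException {
        File file = new File(cacheDir, fileName);
        if (!file.exists()) {
            // The backend stream is fully consumed while copying it
            // into the local cache.
            Files.copy(in, file.toPath(), StandardCopyOption.REPLACE_EXISTING);
            return null;
        }
        // Another thread cached the file first: 'in' is never read, and
        // closing it later triggers the S3AbortableInputStream warning
        // about a stream that was neither read fully nor aborted.
        return in;
    }
}
```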
Therefore, my fix includes the following:
- LocalCache checks whether the backend resource input stream is abortable; if so, it aborts the backend resource stream. For this purpose, a BackendResourceAbortable interface is introduced in jackrabbit-data.
- S3Backend wraps the S3ObjectInputStream to implement BackendResourceAbortable by leveraging commons-io's ProxyInputStream (see the sketch after this list).
- Some unit tests.
- Just FYI, I also tested this locally against an S3-compatible system (ref: https://github.com/woonsanko/hippo-davstore-demo/tree/feature/vfs-file-system#option-4-using-the-aws-s3-datastore-instead-of-vfs-datastore)
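A minimal sketch of the approach described above, assuming the interface shape and the wrapper class name (details may differ from the committed patch):

```java
import java.io.InputStream;

import org.apache.commons.io.input.ProxyInputStream;

import com.amazonaws.services.s3.model.S3ObjectInputStream;

// Interface introduced in jackrabbit-data: a backend resource stream
// that can be aborted instead of being fully read before close.
interface BackendResourceAbortable {
    void abort();
}

// Hypothetical wrapper illustrating how S3Backend could expose the
// stream: ProxyInputStream delegates all reads to the underlying
// S3ObjectInputStream, while abort() forwards to its abort().
class S3BackendResourceStream extends ProxyInputStream
        implements BackendResourceAbortable {

    private final S3ObjectInputStream s3in;

    S3BackendResourceStream(S3ObjectInputStream s3in) {
        super(s3in);
        this.s3in = s3in;
    }

    @Override
    public void abort() {
        s3in.abort();
    }
}
```

On the LocalCache side, when the cached file already exists, the backend stream can then be aborted instead of being left unread, along the lines of: `if (in instanceof BackendResourceAbortable) { ((BackendResourceAbortable) in).abort(); }`.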