[feature] Add caching mechanism

Open jszendre opened this issue 2 years ago • 1 comments

At Intuit we are considering adding independent s3 proxy servers for various AWS roles. One of the use cases would be for fielding requests from Spark's S3A connector and returning a more consistent view of the data after create / rename / delete / update operations.

Could we add a pattern for users to specify a caching mechanism? A user could provide a struct that implements an interface for interacting with the cache. One example would be a cache backed by MinIO.

Thanks

Aug 01 '23 17:08 jszendre

Hi @jszendre,

I'm not sure I understood your request.

Here is how I understand it:

This repository is a proxy that lets apps interact with S3 without needing the AWS SDK or any credentials.

Here's a diagram showcasing how this works in the Mirakl (my company) context:

sequenceDiagram
	participant app as your app
	participant lib as lib-xfiles
	participant xfiles as s3proxy
	participant s3 as S3 Bucket

	app ->> lib: XFilesService.fetchBlob(bucket, key)
	lib ->> xfiles: GET /api/v1/presigned/url/<bucket>/<key>
	xfiles -->> lib: 200 OK {url: "https://google/...."}
	lib ->> s3: GET <url>
	s3 -->> lib: OK <data>
	lib -->> app: OK <data>

As you can see, this is a custom flow we "invented", and we also have an in-house library lib-xfiles to handle this flow.

However, your use case is different: Spark's S3A connector uses the AWS SDK directly to talk to the S3 API, so it cannot transparently use s3proxy.

But maybe I misunderstood your request entirely, in which case, do not hesitate to explain it in more detail.

Thanks.

Aug 02 '23 09:08 jawher