physionet-build

S3: restricted access bucket/policy structure

Open bemoody opened this issue 10 months ago • 10 comments

We want to provide mirrors of our published projects on Amazon S3 (similar to our existing Google Cloud mirrors) to give users faster and more convenient access.

For restricted-access projects, we ideally want to allow each authorized individual to access the files via the S3 API (so they can retrieve the data from Amazon directly, rather than going through our server to authorize each request).

Amazon has a very low limit on the size of the encoded policy document (a JSON file) that can be attached to a single bucket: small enough that there will very likely be more MIMIC users than there is space to encode their Amazon account IDs.

https://stackoverflow.com/a/75936718

After contacting AWS support on this it turned out that:

  1. The 20 kB policy size limit is still valid and enforced.

  2. AWS is doing some policy normalizations which reduce the total size of the policy, so the total size of the policy in characters may be bigger than what JSON.stringify(policy) may report.

  3. To my understanding there are no means to calculate the size of the normalized policy besides contacting AWS S3 support. If JSON.stringify(policy).length < 20kB then you are safe, else consider a workaround soon.

  4. Using multiple S3 Access Points is one way to work around the policy size limit.
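To get a feel for how quickly the limit is reached, here is a minimal Python sketch of the check described in point 3, plus a greedy split of users into groups that each fit under the limit (for example, one group per Access Point). Everything specific in it is an assumption for illustration: the bucket name `example-bucket`, the use of account root ARNs as principals, a single `s3:GetObject` statement, and the helper names themselves. Because of the normalization mentioned in point 2, the size AWS actually measures may differ from what this computes.

```python
import json

# Limit quoted in the answer above; assumed to apply per policy document.
POLICY_LIMIT_BYTES = 20 * 1024


def build_policy(resource, account_ids):
    """Build a read-only policy dict for the given AWS account IDs.

    Hypothetical shape: one Allow statement listing account root ARNs.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {
                "AWS": [f"arn:aws:iam::{a}:root" for a in account_ids],
            },
            "Action": "s3:GetObject",
            "Resource": resource,
        }],
    }


def policy_size(policy):
    """Size in bytes of the compact JSON encoding of the policy.

    Comparable to JSON.stringify(policy).length for ASCII-only policies.
    """
    return len(json.dumps(policy, separators=(",", ":")).encode("utf-8"))


def split_accounts(account_ids, resource, limit=POLICY_LIMIT_BYTES):
    """Greedily group account IDs so each group's policy fits the limit."""
    groups, current = [], []
    for account_id in account_ids:
        candidate = current + [account_id]
        if current and policy_size(build_policy(resource, candidate)) > limit:
            # Adding this account would exceed the limit: start a new group.
            groups.append(current)
            current = [account_id]
        else:
            current = candidate
    if current:
        groups.append(current)
    return groups


if __name__ == "__main__":
    resource = "arn:aws:s3:::example-bucket/*"  # hypothetical bucket
    ids = [f"{i:012d}" for i in range(2000)]    # made-up 12-digit account IDs
    groups = split_accounts(ids, resource)
    print(len(groups), "policies needed for", len(ids), "accounts")
    print("largest policy:",
          max(policy_size(build_policy(resource, g)) for g in groups), "bytes")
```

Under these assumptions each principal costs about 33 bytes, so a single policy holds roughly 600 accounts before reaching 20 kB, which is why the limit matters for a project with many authorized users.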

What are "S3 Access Points"? How do we create them? Is there a difference from a client's perspective between loading data from a shared bucket, versus loading data from a shared Access Point?

bemoody, Sep 27 '23 17:09