go-ds-s3 icon indicating copy to clipboard operation
go-ds-s3 copied to clipboard

feat: optimize keys for S3 performance

Open mikeal opened this issue 4 years ago • 1 comments

When I was doing the Dumbo Drop project I hit most of the performance bottlenecks you can find in S3 and Dynamo. One thing I stumbled upon was a much better pattern for storing IPLD blocks in S3.

From the aws documentation.

your application can achieve at least 3,500 PUT/COPY/POST/DELETE or 5,500 GET/HEAD requests per second per prefix in a bucket

This statement isn’t 100% honest. Most of the time you will not truly see this performance against every prefix, but it’s a good window into how S3 is architected and what the performance constraints are.

We’re in a very lucky situation, we can really optimize for this because every block already has a randomized key you can use as a prefix. I’ve recently built two block storage backends for IPLD and in both cases used the CID as a prefix rather than the final key, so something like {cid.toString()}/data and the performance I was able to get was tremendous.

If you really hammer a bucket with writes this way, you’ll see moments in which it’s re-balancing in order to get more throughput. Once I had a few billion blocks in a single bucket I aimed 2000+ lambda functions at the same bucket writing 1MB blocks and Lambda started having issues before I could saturate the bucket which was reliably doing about 40GB/s write throughput.

This library, and any other IPFS/IPLD storage backends for S3, should probably take the same approach.

mikeal avatar Jun 29 '20 20:06 mikeal

Thank you for submitting your first issue to this repository! A maintainer will be here shortly to triage and review. In the meantime, please double-check that you have provided all the necessary information to make this process easy! Any information that can help save additional round trips is useful! We currently aim to give initial feedback within two business days. If this does not happen, feel free to leave a comment. Please keep an eye on how this issue will be labeled, as labels give an overview of priorities, assignments and additional actions requested by the maintainers:

  • "Priority" labels will show how urgent this is for the team.
  • "Status" labels will show if this is ready to be worked on, blocked, or in progress.
  • "Need" labels will indicate if additional input or analysis is required.

Finally, remember to use https://discuss.ipfs.io if you just need general support.

welcome[bot] avatar Jun 29 '20 20:06 welcome[bot]