cloud-volume
Estimating the cost of data upload to AWS S3
Hi there,
AWS S3 comes with a cost for uploading data to its storage. If I'm not mistaken, CloudVolume supports direct upload to S3. When estimating the cost, should I be calculating based on the number of files being uploaded or the number of POST/PUT requests made to S3?
Precomputed volumes are typically stored on AWS S3, Google Storage, or locally. CloudVolume can read and write to these object storage providers given a service account token with appropriate permissions. However, these volumes can be stored on any service, including an ordinary webserver or local filesystem, that supports key-value access.
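For context, here is a minimal sketch of what a direct S3 write with CloudVolume might look like. The bucket name, dataset path, and volume dimensions below are hypothetical, and it assumes AWS credentials are already configured for CloudVolume to use:

```python
import numpy as np
from cloudvolume import CloudVolume

# Hypothetical destination; replace with your own bucket/dataset path.
cloudpath = 's3://my-bucket/my-dataset/image'

# Describe a small uint8 image volume in precomputed format.
info = CloudVolume.create_new_info(
    num_channels=1,
    layer_type='image',
    data_type='uint8',
    encoding='raw',
    resolution=[4, 4, 40],      # nm per voxel
    voxel_offset=[0, 0, 0],
    chunk_size=[128, 128, 64],  # each chunk becomes one file / one PUT request
    volume_size=[1024, 1024, 128],
)

vol = CloudVolume(cloudpath, info=info)
vol.commit_info()  # writes the 'info' metadata file to the bucket

# Writing this array issues one PUT per 128x128x64 chunk it covers.
data = np.zeros((1024, 1024, 128), dtype=np.uint8)
vol[:, :, :] = data
```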
This is what AWS provides for pricing estimation: https://aws.amazon.com/s3/pricing/
Thank you, -m
Hi m,
I believe it is the number of PUT requests (which should be ~1:1 with the number of files except for failed uploads; I'm not sure if they charge for error codes). Remember to account for storage cost, number of writes, and any inter-region egress and egress-to-internet charges later on. Usually these cloud providers have free ingress. No warranty on this advice though! It's always possible I've missed something.
Here are some example calculations for running a process on Google. The AWS calculations are pretty similar. https://github.com/seung-lab/kimimaro/wiki/The-Economics:-Skeletons-for-the-People/c2d4e28645e96d3e963f7338a46d15dc3890c553
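To make that concrete, here is a rough sketch of an estimator along those lines. The prices below are placeholders rather than current S3 rates (check the pricing page above), and ingress is assumed to be free:

```python
def estimate_upload_cost(
    num_files,
    total_bytes,
    put_price_per_million=5.0,         # placeholder: $ per 1M PUT requests
    storage_price_per_gb_month=0.023,  # placeholder: $ per GB-month
    months_stored=1,
):
    """Rough S3 upload + storage estimate; ingress assumed free."""
    put_cost = (num_files / 1e6) * put_price_per_million
    storage_cost = (total_bytes / 1e9) * storage_price_per_gb_month * months_stored
    return put_cost + storage_cost

# Example: ~14.9 million chunk files totalling ~15.6 TB of uncompressed uint8 voxels.
print(estimate_upload_cost(num_files=14.9e6, total_bytes=15.6e12))
```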
Thank you for the inputs @william-silversmith !! Wow, the data really does need careful processing to produce production-ready chunks for upload. What if the scenario is to compute on a local machine (not writing directly to the cloud)? Should I be looking at the following costs for the upload-ready files? Let's say the cloud provider provides a CLI to upload the dataset.
Assuming image chunks stored as 128x128x64:

15.6 TVx / (128x128x64 voxels/file) = 14.9 million files
14.9e6 files * ($4 per ten million files) = $5.95

Assume segmentation labels are fractured into about 1.5 billion fragments after chunking:

2.3B PUT requests * ($5 per million PUTs) = $11,500
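For what it's worth, the arithmetic above can be reproduced as follows (using the per-request prices quoted above, which may not be the current S3 rates):

```python
# Image chunks: one file (and one PUT request) per 128x128x64 chunk.
total_voxels = 15.6e12                  # 15.6 TVx at mip 3
voxels_per_chunk = 128 * 128 * 64
num_image_files = total_voxels / voxels_per_chunk    # ~14.9 million files
image_put_cost = (num_image_files / 1e7) * 4.0       # $4 per 10M requests -> ~$5.95

# Segmentation fragments: one PUT request per fragment.
num_fragment_puts = 2.3e9
fragment_put_cost = (num_fragment_puts / 1e6) * 5.0  # $5 per 1M PUTs -> $11,500

print(round(image_put_cost, 2), round(fragment_put_cost))
```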
neurodata.io seems to use s3 cloud storage. Do you know if they use CloudVolume to upload? https://neurodata.io/project/ocp/
> neurodata.io seems to use s3 cloud storage. Do you know if they use CloudVolume to upload?
To my knowledge they've been using s3, though at one point they were considering Azure. I believe they use CV to upload, but I can't be sure. They recommend using CloudVolume to download on their site.
> What if the scenario is to compute on a local machine (not writing directly to the cloud)? Should I be looking at the following costs for the upload-ready files? Let's say the cloud provider provides a CLI to upload the dataset.
You should check whether those are the current prices for s3 yourself, but the first line refers to reading the entire segmentation and the second line refers to writing all the skeleton fragments. If you're just reading and writing images using a reasonable chunk size, you should be okay. What kind of job are you running and what size is it (approximately, if need be)? It would be helpful for giving better insight.
Hi @william-silversmith ,
Yes, I definitely need to revisit the current prices for s3. To better understand the illustrated example: what was the original size of the raw data? Has the file been downsampled?
1 Petavoxel = 200,000 x 200,000 x 25,000 voxels at 4x4x40 nm resolution
In the example given, the volume was downsampled to mip 3 (hence 15.6 TVx). If you're concerned about the generation of meshes/skeletons, there's some good news. Skeletons have gotten a lot better (that's an old article) thanks to the sharded format. Meshes are still under development.
Here's the updated article: https://github.com/seung-lab/kimimaro/wiki/The-Economics:-Skeletons-for-the-People
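As a sanity check on the mip arithmetic, assuming the standard 2x2x1 downsampling per mip level used by precomputed volumes:

```python
mip0_voxels = 200_000 * 200_000 * 25_000  # 1 petavoxel at 4x4x40 nm

# Each 2x2x1 mip level reduces the voxel count by a factor of 4.
mip3_voxels = mip0_voxels / 4**3          # ~1.56e13 voxels, i.e. ~15.6 TVx
print(mip3_voxels)
```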