Is it possible to access an alignment file from an S3 bucket?

Open james-guevara opened this issue 2 years ago • 5 comments

Hi all, I would like to read an alignment file from an S3 bucket like so:

delly call -x /work/home/human.hg38.excl.tsv -o delly.bcf -g /work/home/GRCh38_full_analysis_set_plus_decoy_hla.fa s3://<path>

Is this feasible? My understanding is that samtools can read directly from an S3 bucket. If Delly uses it under the hood, how easy would it be to support this?

Sincerely, James Guevara

james-guevara, Apr 26 '22 01:04

Hi James,

Delly doesn't directly support S3 buckets, but you can mount an S3 bucket as a file system using goofys. I assume the performance is similar to reading directly from S3 (as samtools does, for instance), but I haven't done any benchmarking.

Best, Tobias

tobiasrausch, Apr 26 '22 07:04
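The goofys suggestion above could look roughly like the following. This is a minimal sketch, not a tested recipe: the bucket name, mount point, and sample file name are hypothetical placeholders, and it assumes goofys can find AWS credentials in its usual places (for example `~/.aws/credentials` or the `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY` environment variables). By default the script only prints the commands; set `DRY_RUN=0` to actually run them.

```shell
#!/usr/bin/env bash
# Sketch only: BUCKET, MOUNT and SAMPLE are hypothetical placeholders.
set -eu

BUCKET="${BUCKET:-my-bucket}"
MOUNT="${MOUNT:-/mnt/s3}"
SAMPLE="${SAMPLE:-sample.cram}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands instead of running them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

run mkdir -p "$MOUNT"
# goofys takes the bucket name and the mount point.
run goofys "$BUCKET" "$MOUNT"
# Same delly invocation as in the question, with the S3 URL replaced
# by a path under the mount point.
run delly call -x /work/home/human.hg38.excl.tsv -o delly.bcf \
  -g /work/home/GRCh38_full_analysis_set_plus_decoy_hla.fa "$MOUNT/$SAMPLE"
# Unmount when finished (Linux FUSE).
run fusermount -u "$MOUNT"
```

The alignment index is assumed to sit next to the alignment file in the bucket, so that it is visible under the same mount.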

Hi Tobias, thanks for the response! I have a few questions pertaining to the suggestion that we mount the S3 bucket as a file system (using goofys in this case):

  1. Can I mount an S3 bucket that belongs to another group? I have the AWS credentials to access that S3 bucket.
  2. What are the costs associated with mounting an S3 bucket? For example, if I were to mount a bucket holding ~1000 WGS CRAM files averaging 30 GB each, is there a rough estimate of the cost of keeping it mounted?
  3. If I manage to mount an S3 bucket as a file system, would I be able to easily use the folders inside that file system as volumes inside a Docker container (I recall reading this was a problem, but I'm not sure)?

For question 2, here is what I read but perhaps I'm misinterpreting (and they were using s3fs, not goofys): https://serverfault.com/questions/780247/aws-symlink-directory-in-ec2-to-s3-bucket

WARNING: You are billed in S3 based on the number of requests you make. I did not do any further research, but my S3 usage spiked when I did this. I believe that the connections maintained by the mounted S3 bucket were counted as GET/PUT/LIST requests.

Thanks again for the response, and apologies if these other questions are outside the bounds of the initial issue. If so, feel free to close the issue.

james-guevara, Apr 26 '22 18:04
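For question 2, the request-based part of the bill can be estimated with simple arithmetic. The sketch below counts the GET requests needed to read every file once, using the ~1000 files of 30 GB each from the question. Two values are assumptions that do not come from this thread: `CHUNK_MB` (how much data one GET request fetches, which depends on the mounting tool and access pattern) and `PRICE_PER_1000_GET` (a placeholder to replace with the current S3 price for your region and storage class). It ignores LIST requests, data transfer charges, and storage costs.

```shell
#!/usr/bin/env bash
# Rough estimate of GET request volume for one full read pass.
set -eu

FILES=1000                 # from the question: ~1000 CRAM files
FILE_GB=30                 # from the question: ~30 GB each
CHUNK_MB="${CHUNK_MB:-8}"  # ASSUMPTION: data fetched per GET request
PRICE_PER_1000_GET="${PRICE_PER_1000_GET:-0.0004}"  # ASSUMPTION: placeholder price in USD

# Integer division: rounds down if the file size is not a multiple of the chunk size.
REQUESTS_PER_FILE=$(( FILE_GB * 1024 / CHUNK_MB ))
TOTAL_REQUESTS=$(( FILES * REQUESTS_PER_FILE ))

# awk handles the floating-point multiplication.
COST=$(awk -v n="$TOTAL_REQUESTS" -v p="$PRICE_PER_1000_GET" \
  'BEGIN { printf "%.2f", n / 1000 * p }')

echo "GET requests for one full pass: $TOTAL_REQUESTS"
echo "Estimated request cost (USD): $COST"
```

Smaller reads or random access, as a variant caller may do, would raise the request count, so treat the result as an order-of-magnitude figure at best.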

I haven't benchmarked on AWS, so I can't comment on the billing aspects. For your questions: (1) Should be fine. (2) I don't know. (3) Instead of mounting the S3 bucket on your host system and then passing it into the container as a volume, you could also run goofys inside the container, i.e., mount the S3 bucket directly inside the container.

tobiasrausch, Apr 28 '22 09:04
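Mounting inside the container, as suggested in (3), might be sketched as follows. The image name, bucket name, and all paths are hypothetical, and the image is assumed to contain both goofys and delly. FUSE mounts inside Docker generally need access to `/dev/fuse` and the `SYS_ADMIN` capability; some hosts may require further security settings. As a precaution, the script prints the `docker run` command by default and only executes it with `DRY_RUN=0`.

```shell
#!/usr/bin/env bash
# Sketch only: IMAGE and BUCKET are hypothetical; the image is assumed
# to contain both goofys and delly.
set -eu

IMAGE="${IMAGE:-my-delly-goofys-image}"
BUCKET="${BUCKET:-my-bucket}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the command instead of running it

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Commands executed inside the container: mount the bucket, then call delly.
INNER="mkdir -p /mnt/s3 && goofys $BUCKET /mnt/s3 && delly call -o /out/delly.bcf -g /ref/genome.fa /mnt/s3/sample.cram"

# --device and --cap-add give the container what FUSE needs;
# -e VAR (without a value) passes the host's AWS credentials through.
run docker run --rm \
  --device /dev/fuse --cap-add SYS_ADMIN \
  -e AWS_ACCESS_KEY_ID -e AWS_SECRET_ACCESS_KEY \
  -v "$PWD/ref:/ref:ro" -v "$PWD/out:/out" \
  "$IMAGE" sh -c "$INNER"
```

The reference genome and output directory are still bind-mounted from the host here; only the alignment data comes from the bucket.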

Great, thanks a lot for the information! I suppose I'll look into that option more closely.

james-guevara, Apr 28 '22 22:04

Thanks, it would be great if you could post your experience here afterwards.

tobiasrausch, Apr 29 '22 07:04