delly
Is it possible to access an alignment file from an S3 bucket?
Hi all, I would like to read an alignment file from an S3 bucket like so:
delly call -x /work/home/human.hg38.excl.tsv -o delly.bcf -g /work/home/GRCh38_full_analysis_set_plus_decoy_hla.fa s3://<path>
Is this feasible? My understanding is that samtools can read directly from an S3 bucket. If samtools is being used under the hood, how easy would it be to support this?
Sincerely, James Guevara
Hi James,
Delly doesn't directly support S3 buckets, but you can mount an S3 bucket as a file system using goofys. I assume the performance is similar, but I haven't benchmarked it against samtools, for instance.
Best, Tobias
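For anyone following along, the goofys suggestion could look roughly like the sketch below. The bucket name, mount point and object key are hypothetical placeholders, and the sketch assumes goofys finds AWS credentials in the usual places (`~/.aws/credentials` or the `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY` environment variables). The `run` helper is only there so the commands are printed rather than executed unless `DRY_RUN=0` is set.

```shell
# Print the command when DRY_RUN=1 (the default here); execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Usage: mount_and_call <bucket> <mount_point> <object_key>
# All three arguments are placeholders for your own values.
mount_and_call() {
  run mkdir -p "$2"
  # Mount the bucket as a FUSE file system.
  run goofys "$1" "$2"
  # Same Delly call as in the question, pointed at the mounted path.
  run delly call -x /work/home/human.hg38.excl.tsv -o delly.bcf \
    -g /work/home/GRCh38_full_analysis_set_plus_decoy_hla.fa "$2/$3"
  # Unmount the FUSE file system when done.
  run fusermount -u "$2"
}

mount_and_call my-bucket /tmp/s3 sample.bam
```

Running it as-is only prints the four commands; set `DRY_RUN=0` once the placeholders have been replaced.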
Hi Tobias, thanks for the response! I have a few questions pertaining to the suggestion that we mount the S3 bucket as a file system (using goofys in this case):
1. Can I mount an S3 bucket that belongs to another group? I have the AWS credentials to access that S3 bucket.
2. What are the costs associated with mounting an S3 bucket? For example, if I were to mount a filesystem with ~1000 WGS CRAM files, which are on average 30GB each, is there a rough estimate of the cost of keeping this filesystem mounted?
3. If I manage to mount an S3 bucket as a file system, would I be able to easily use the folders inside that file system as volumes inside a Docker container? I recall reading this was a problem, but I'm not sure.
For question 2, here is what I read but perhaps I'm misinterpreting (and they were using s3fs, not goofys): https://serverfault.com/questions/780247/aws-symlink-directory-in-ec2-to-s3-bucket
WARNING You are billed in S3 based on the number of request you make. I did not do any further research, but my S3 usage spiked when I did this. I believe that the connections maintained by the mounted S3 bucket was counted as a GET/PUT/LIST request.
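Since the quote ties billing to the number of requests, a rough way to reason about it is to count how many GET requests a full read of the data would take. The sketch below is only that arithmetic: the 8 MiB per request is a made-up assumption (the real value depends on the client's read-ahead behaviour and on how much of each file is actually read), and it says nothing about LIST requests or data transfer charges. The result would still need to be multiplied by the current per-request price from the AWS pricing page.

```shell
# Usage: estimate_get_requests <n_files> <gib_per_file> <mib_per_request>
# Estimates GET requests for reading every file once, start to end.
estimate_get_requests() {
  # Requests per file, rounded up: ceil(GiB * 1024 / MiB per request).
  per_file=$(( ($2 * 1024 + $3 - 1) / $3 ))
  echo $(( $1 * per_file ))
}

# 1000 files of 30 GiB each, with a hypothetical 8 MiB fetched per request.
estimate_get_requests 1000 30 8   # prints 3840000
```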
Thanks again for the response, and apologies if these other questions are outside the bounds of the initial issue. If so, feel free to close the issue.
I haven't benchmarked on AWS so I can't comment on the billing aspects. For your questions: (1) Should be fine. (2) I don't know. (3) Instead of mounting the S3 bucket on the host system and then mounting it into the container, you could also run goofys inside the container and mount the S3 bucket there.
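A minimal sketch of option (3), mounting from inside the container, might look like this. It assumes a hypothetical image that ships both goofys and delly, and hypothetical paths (`/mnt/s3`, `/ref/genome.fa`, `/out/delly.bcf`) inside it. FUSE inside Docker generally needs the `/dev/fuse` device and the `SYS_ADMIN` capability, and `-e VAR` without a value passes the host's variable through. The sketch also assumes goofys returns once the mount is ready, so the Delly call can follow it directly.

```shell
# Print the command when DRY_RUN=1 (the default here); execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Usage: call_in_container <image> <bucket> <object_key>
# The image name, bucket and key are placeholders for your own values.
call_in_container() {
  run docker run --rm \
    --device /dev/fuse --cap-add SYS_ADMIN \
    -e AWS_ACCESS_KEY_ID -e AWS_SECRET_ACCESS_KEY \
    "$1" \
    sh -c "mkdir -p /mnt/s3 && goofys $2 /mnt/s3 && delly call -o /out/delly.bcf -g /ref/genome.fa /mnt/s3/$3"
}

call_in_container my-delly-goofys-image my-bucket sample.cram
```

As written this only prints the `docker run` command; set `DRY_RUN=0` to execute it.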
Great, thanks a lot for the information! I suppose I'll look into that option more closely.
Thanks, would be great if you can post your experiences here afterwards.