AWS-iGenomes
Variation VCF missing from GRCh38
As the title says, it looks like NCBI GRCh38 does not contain a VCF. Is that intentional?
Hi @johanneskoester,
Thanks for letting me know - it's not intentional, no. I'll take a look...
Phil
Also, it seems GRCh38 is not available from Ensembl at all, only from NCBI. Is that unintentional as well?
That's because the original iGenomes resource only provides GRCh38 from NCBI: https://support.illumina.com/sequencing/sequencing_software/igenome.html
...and yes, you're hitting the main problem with this repo. I didn't really intend to make myself warden of all of this data, more just to mirror the existing iGenomes resource.
I see. Thanks!
Hi @johanneskoester,
Apologies if that was a little curt - not my intention! I was on holiday and replying on my phone at the time. But yes - iGenomes is mostly just a mirror of the Illumina resource currently. Is there a specific VCF that you think would be most appropriate to add? I'd be happy to put it up if so.
I'll try to find a moment to put together the Ensembl build at some point, though I have a bit of a backlog of work at the moment, so it could take me some time to get there. If you'd like it done more quickly, then I can easily sync iGenomes with another S3 / FTP source somewhere...
Cheers,
Phil
All fine! Ideally, it would be nice to have all of these, for human and mouse at least: http://www.ensembl.org/info/data/ftp/index.html
Agreed! It's a shame that there can't be a direct S3 mirror of all Ensembl reference data. I wonder how much data it would be for all 153 genomes. 🤔
@MaxUlysse and @alneberg are going to start looking at adding some Ensembl GRCh38 references to AWS-iGenomes.