AWS-iGenomes

Variation vcf missing from GRCh38

Open johanneskoester opened this issue 7 years ago • 9 comments

As the title says, it looks like NCBI GRCh38 does not contain a VCF. Is that intentional?

johanneskoester avatar Aug 28 '18 20:08 johanneskoester

Hi @johanneskoester,

Thanks for letting me know - it's not intentional, no. I'll take a look...

Phil

ewels avatar Aug 31 '18 16:08 ewels

Also, GRCh38 is not available at all from Ensembl it seems. Only NCBI. Is that not intentional as well?

johanneskoester avatar Aug 31 '18 18:08 johanneskoester

> Also, GRCh38 is not available at all from Ensembl it seems. Only NCBI. Is that not intentional as well?

That's because the original iGenomes resource only provides GRCh38 from NCBI: https://support.illumina.com/sequencing/sequencing_software/igenome.html

ewels avatar Oct 08 '18 18:10 ewels

...and yes, you're hitting the main problem with this repo. I didn't really intend to make myself warden of all of this data, more just to mirror the existing iGenomes resource.

ewels avatar Oct 08 '18 18:10 ewels

I see. Thanks!

johanneskoester avatar Oct 09 '18 08:10 johanneskoester

Hi @johanneskoester,

Apologies if that was a little curt - not my intention! I was on holiday and replying on my phone at the time. But yes - iGenomes is currently mostly just a mirror of the Illumina resource. Is there a specific VCF that you think would be most appropriate to add? I'd be happy to put it up if so.

I'll try to find a moment to put together the Ensembl build at some point, though I have a bit of a backlog of work at the moment, so it could take me some time to get there. If you'd like it done more quickly, I can easily sync iGenomes with another S3 / FTP source somewhere...

Cheers,

Phil

ewels avatar Nov 17 '18 21:11 ewels

All fine! Ideally, it would be nice to have all of these, for human and mouse at least: http://www.ensembl.org/info/data/ftp/index.html

johanneskoester avatar Nov 27 '18 16:11 johanneskoester

Agreed! In fact, it's a shame that there can't be a direct S3 mirror of all Ensembl reference data. I wonder how much data it would be for all 153 genomes. 🤔

ewels avatar Nov 27 '18 16:11 ewels

@MaxUlysse and @alneberg are going to start taking a look at adding some Ensembl GRCh38 references to AWS-iGenomes.

ewels avatar Jan 11 '19 12:01 ewels