RefGenie - add assets currently used from AWS-iGenomes
Split from #592, related directly to #1084
We want to transition away from using AWS-iGenomes to using the central RefGenie server(s) to host reference genome assets for the nf-core pipelines. To do this, we need to make sure that everything we currently have is available on an AWS refgenie server somewhere. Once all assets are mirrored, we can do a clean swap in the config.
Going from igenomes.config, here is a checklist:
Summary of genomes
- [ ] GRCh37
- [ ] GRCh38
- [ ] GRCm38
- [ ] TAIR10
- [ ] EB2
- [ ] UMD3.1
- [ ] WBcel235
- [ ] CanFam3.1
- [ ] GRCz10
- [ ] BDGP6
- [ ] EquCab2
- [ ] EB1
- [ ] Galgal4
- [ ] Gm01
- [ ] Mmul_1
- [ ] IRGSP-1.0
- [ ] CHIMP2.1.4
- [ ] Rnor_6.0
- [ ] R64-1-1
- [ ] EF2
- [ ] Sbi1
- [ ] Sscrofa10.2
- [ ] AGPv3
- [ ] hg38
- [ ] hg19
- [ ] mm10
- [ ] bosTau8
- [ ] ce10
- [ ] canFam3
- [ ] danRer10
- [ ] dm6
- [ ] equCab2
- [ ] galGal4
- [ ] panTro4
- [ ] rn6
- [ ] sacCer3
- [ ] susScr3
Detailed version with asset types
- [ ] GRCh37
  - [ ] fasta
  - [ ] bwa
  - [ ] bowtie2
  - [ ] star
  - [ ] bismark
  - [ ] gtf
  - [ ] bed12
  - [ ] readme
  - [ ] mito_name
  - [ ] macs_gsize
  - [ ] blacklist
- [ ] GRCh38
  - [ ] fasta
  - [ ] bwa
  - [ ] bowtie2
  - [ ] star
  - [ ] bismark
  - [ ] gtf
  - [ ] bed12
  - [ ] mito_name
  - [ ] macs_gsize
  - [ ] blacklist
- [ ] GRCm38
  - [ ] fasta
  - [ ] bwa
  - [ ] bowtie2
  - [ ] star
  - [ ] bismark
  - [ ] gtf
  - [ ] bed12
  - [ ] readme
  - [ ] mito_name
  - [ ] macs_gsize
  - [ ] blacklist
- [ ] TAIR10
  - [ ] fasta
  - [ ] bwa
  - [ ] bowtie2
  - [ ] star
  - [ ] bismark
  - [ ] gtf
  - [ ] bed12
  - [ ] readme
  - [ ] mito_name
- [ ] EB2
  - [ ] fasta
  - [ ] bwa
  - [ ] bowtie2
  - [ ] star
  - [ ] bismark
  - [ ] gtf
  - [ ] bed12
  - [ ] readme
- [ ] UMD3.1
  - [ ] fasta
  - [ ] bwa
  - [ ] bowtie2
  - [ ] star
  - [ ] bismark
  - [ ] gtf
  - [ ] bed12
  - [ ] readme
  - [ ] mito_name
- [ ] WBcel235
  - [ ] fasta
  - [ ] bwa
  - [ ] bowtie2
  - [ ] star
  - [ ] bismark
  - [ ] gtf
  - [ ] bed12
  - [ ] mito_name
  - [ ] macs_gsize
- [ ] CanFam3.1
  - [ ] fasta
  - [ ] bwa
  - [ ] bowtie2
  - [ ] star
  - [ ] bismark
  - [ ] gtf
  - [ ] bed12
  - [ ] readme
  - [ ] mito_name
- [ ] GRCz10
  - [ ] fasta
  - [ ] bwa
  - [ ] bowtie2
  - [ ] star
  - [ ] bismark
  - [ ] gtf
  - [ ] bed12
  - [ ] mito_name
- [ ] BDGP6
  - [ ] fasta
  - [ ] bwa
  - [ ] bowtie2
  - [ ] star
  - [ ] bismark
  - [ ] gtf
  - [ ] bed12
  - [ ] mito_name
  - [ ] macs_gsize
- [ ] EquCab2
  - [ ] fasta
  - [ ] bwa
  - [ ] bowtie2
  - [ ] star
  - [ ] bismark
  - [ ] gtf
  - [ ] bed12
  - [ ] readme
  - [ ] mito_name
- [ ] EB1
  - [ ] fasta
  - [ ] bwa
  - [ ] bowtie2
  - [ ] star
  - [ ] bismark
  - [ ] gtf
  - [ ] bed12
  - [ ] readme
- [ ] Galgal4
  - [ ] fasta
  - [ ] bwa
  - [ ] bowtie2
  - [ ] star
  - [ ] bismark
  - [ ] gtf
  - [ ] bed12
  - [ ] mito_name
- [ ] Gm01
  - [ ] fasta
  - [ ] bwa
  - [ ] bowtie2
  - [ ] star
  - [ ] bismark
  - [ ] gtf
  - [ ] bed12
  - [ ] readme
- [ ] Mmul_1
  - [ ] fasta
  - [ ] bwa
  - [ ] bowtie2
  - [ ] star
  - [ ] bismark
  - [ ] gtf
  - [ ] bed12
  - [ ] readme
  - [ ] mito_name
- [ ] IRGSP-1.0
  - [ ] fasta
  - [ ] bwa
  - [ ] bowtie2
  - [ ] star
  - [ ] bismark
  - [ ] gtf
  - [ ] bed12
  - [ ] mito_name
- [ ] CHIMP2.1.4
  - [ ] fasta
  - [ ] bwa
  - [ ] bowtie2
  - [ ] star
  - [ ] bismark
  - [ ] gtf
  - [ ] bed12
  - [ ] readme
  - [ ] mito_name
- [ ] Rnor_6.0
  - [ ] fasta
  - [ ] bwa
  - [ ] bowtie2
  - [ ] star
  - [ ] bismark
  - [ ] gtf
  - [ ] bed12
  - [ ] mito_name
- [ ] R64-1-1
  - [ ] fasta
  - [ ] bwa
  - [ ] bowtie2
  - [ ] star
  - [ ] bismark
  - [ ] gtf
  - [ ] bed12
  - [ ] mito_name
  - [ ] macs_gsize
- [ ] EF2
  - [ ] fasta
  - [ ] bwa
  - [ ] bowtie2
  - [ ] star
  - [ ] bismark
  - [ ] gtf
  - [ ] bed12
  - [ ] readme
  - [ ] mito_name
  - [ ] macs_gsize
- [ ] Sbi1
  - [ ] fasta
  - [ ] bwa
  - [ ] bowtie2
  - [ ] star
  - [ ] bismark
  - [ ] gtf
  - [ ] bed12
  - [ ] readme
- [ ] Sscrofa10.2
  - [ ] fasta
  - [ ] bwa
  - [ ] bowtie2
  - [ ] star
  - [ ] bismark
  - [ ] gtf
  - [ ] bed12
  - [ ] readme
  - [ ] mito_name
- [ ] AGPv3
  - [ ] fasta
  - [ ] bwa
  - [ ] bowtie2
  - [ ] star
  - [ ] bismark
  - [ ] gtf
  - [ ] bed12
  - [ ] mito_name
- [ ] hg38
  - [ ] fasta
  - [ ] bwa
  - [ ] bowtie2
  - [ ] star
  - [ ] bismark
  - [ ] gtf
  - [ ] bed12
  - [ ] mito_name
  - [ ] macs_gsize
  - [ ] blacklist
- [ ] hg19
  - [ ] fasta
  - [ ] bwa
  - [ ] bowtie2
  - [ ] star
  - [ ] bismark
  - [ ] gtf
  - [ ] bed12
  - [ ] readme
  - [ ] mito_name
  - [ ] macs_gsize
  - [ ] blacklist
- [ ] mm10
  - [ ] fasta
  - [ ] bwa
  - [ ] bowtie2
  - [ ] star
  - [ ] bismark
  - [ ] gtf
  - [ ] bed12
  - [ ] readme
  - [ ] mito_name
  - [ ] macs_gsize
  - [ ] blacklist
- [ ] bosTau8
  - [ ] fasta
  - [ ] bwa
  - [ ] bowtie2
  - [ ] star
  - [ ] bismark
  - [ ] gtf
  - [ ] bed12
  - [ ] mito_name
- [ ] ce10
  - [ ] fasta
  - [ ] bwa
  - [ ] bowtie2
  - [ ] star
  - [ ] bismark
  - [ ] gtf
  - [ ] bed12
  - [ ] readme
  - [ ] mito_name
  - [ ] macs_gsize
- [ ] canFam3
  - [ ] fasta
  - [ ] bwa
  - [ ] bowtie2
  - [ ] star
  - [ ] bismark
  - [ ] gtf
  - [ ] bed12
  - [ ] readme
  - [ ] mito_name
- [ ] danRer10
  - [ ] fasta
  - [ ] bwa
  - [ ] bowtie2
  - [ ] star
  - [ ] bismark
  - [ ] gtf
  - [ ] bed12
  - [ ] mito_name
  - [ ] macs_gsize
- [ ] dm6
  - [ ] fasta
  - [ ] bwa
  - [ ] bowtie2
  - [ ] star
  - [ ] bismark
  - [ ] gtf
  - [ ] bed12
  - [ ] mito_name
  - [ ] macs_gsize
- [ ] equCab2
  - [ ] fasta
  - [ ] bwa
  - [ ] bowtie2
  - [ ] star
  - [ ] bismark
  - [ ] gtf
  - [ ] bed12
  - [ ] readme
  - [ ] mito_name
- [ ] galGal4
  - [ ] fasta
  - [ ] bwa
  - [ ] bowtie2
  - [ ] star
  - [ ] bismark
  - [ ] gtf
  - [ ] bed12
  - [ ] readme
  - [ ] mito_name
- [ ] panTro4
  - [ ] fasta
  - [ ] bwa
  - [ ] bowtie2
  - [ ] star
  - [ ] bismark
  - [ ] gtf
  - [ ] bed12
  - [ ] readme
  - [ ] mito_name
- [ ] rn6
  - [ ] fasta
  - [ ] bwa
  - [ ] bowtie2
  - [ ] star
  - [ ] bismark
  - [ ] gtf
  - [ ] bed12
  - [ ] mito_name
- [ ] sacCer3
  - [ ] fasta
  - [ ] bwa
  - [ ] bowtie2
  - [ ] star
  - [ ] bismark
  - [ ] readme
  - [ ] mito_name
  - [ ] macs_gsize
- [ ] susScr3
  - [ ] fasta
  - [ ] bwa
  - [ ] bowtie2
  - [ ] star
  - [ ] bismark
  - [ ] gtf
  - [ ] bed12
  - [ ] readme
  - [ ] mito_name
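A checklist like the one above could be regenerated from igenomes.config instead of being maintained by hand. The following is a minimal sketch, assuming the config keeps one quoted genome key per block and one `name = value` line per asset; the sample text and its paths are placeholders, not the real file contents, and the function names are made up for illustration.

```python
import re

# Placeholder excerpt in the assumed style of igenomes.config.
# The paths are illustrative only and do not match the real file.
SAMPLE_CONFIG = """
params {
  genomes {
    'GRCh37' {
      fasta     = "/placeholder/GRCh37/genome.fa"
      bwa       = "/placeholder/GRCh37/BWAIndex/"
      mito_name = "MT"
    }
    'EB2' {
      fasta     = "/placeholder/EB2/genome.fa"
      readme    = "/placeholder/EB2/README.txt"
    }
  }
}
"""

GENOME_RE = re.compile(r"^\s*'([^']+)'\s*\{\s*$")  # e.g. 'GRCh37' {
ASSET_RE = re.compile(r"^\s*(\w+)\s*=")            # e.g. fasta = "..."


def parse_genomes(config_text):
    """Map each genome key to the asset names listed in its block."""
    genomes = {}
    current = None
    for line in config_text.splitlines():
        genome_match = GENOME_RE.match(line)
        if genome_match:
            current = genome_match.group(1)
            genomes[current] = []
            continue
        asset_match = ASSET_RE.match(line)
        if asset_match and current is not None:
            genomes[current].append(asset_match.group(1))
        elif line.strip() == "}":
            # A closing brace ends the current genome block (if any).
            current = None
    return genomes


def to_checklist(genomes):
    """Render the mapping as a nested Markdown task list."""
    lines = []
    for genome, assets in genomes.items():
        lines.append(f"- [ ] {genome}")
        lines.extend(f"  - [ ] {asset}" for asset in assets)
    return "\n".join(lines)


genomes = parse_genomes(SAMPLE_CONFIG)
print(to_checklist(genomes))
```

The line-based regular expressions only cover the simple layout assumed here; a config with nested maps or multi-line values would need a real parser.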
From @nsheff:
For updating: We drive the server instances off a git repository, here: https://github.com/refgenie/refgenomes.databio.org
Right now, it's semi-automated. Eventually we want it to be that you just update the repo with a PR, and when merged it will deploy automatically. For now, though, we have to build the things manually, but it's all scripted from that repository. The assets are all just annotated with a PEP. So, to add a genome you'd add it to this CSV file: https://github.com/refgenie/refgenomes.databio.org/blob/master/asset_pep/genome_descriptions.csv
Then you'd add whatever inputs are required to build the assets to this file: https://github.com/refgenie/refgenomes.databio.org/blob/master/asset_pep/recipe_inputs.csv
See also:
- https://github.com/refgenie/refgenomes.databio.org#readme
- http://refgenie.databio.org/en/latest/build/
- http://refgenie.databio.org/en/latest/available_assets/
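The CSV step described by @nsheff (adding a genome row to genome_descriptions.csv) could be scripted so that a new row is checked against the existing header before a PR is opened. This is only a sketch: the column names `genome` and `genome_description` and the example values are assumptions for illustration, so check the real file's header before using anything like it.

```python
import csv
import io


def append_row(csv_text, row):
    """Return csv_text with `row` appended, validated against the header."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = reader.fieldnames
    if fieldnames is None:
        raise ValueError("CSV has no header row")
    unknown = set(row) - set(fieldnames)
    missing = set(fieldnames) - set(row)
    if unknown or missing:
        raise ValueError(
            f"unknown columns {sorted(unknown)}, missing columns {sorted(missing)}"
        )
    existing = list(reader)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(existing + [row])
    return out.getvalue()


# Hypothetical header and rows, not copied from the real repository.
descriptions = "genome,genome_description\nhg38,Human reference\n"
updated = append_row(
    descriptions,
    {"genome": "TAIR10", "genome_description": "Arabidopsis thaliana reference"},
)
print(updated)
```

The same helper would apply to recipe_inputs.csv, since it takes the header from the file itself rather than hard-coding columns.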
Checking which asset types currently available in iGenomes have corresponding build recipes in refgenie (see https://github.com/refgenie/refgenie/blob/master/refgenie/asset_build_packages.py):
- [x] fasta
- [x] bwa
- [x] bowtie2
- [x] star
- [x] bismark
- [x] gtf
- [ ] bed12
- [ ] readme
- [ ] mito_name
- [ ] macs_gsize
- [x] blacklist
readme, mito_name and macs_gsize are all specific assets that would have to be added manually, as long as that is allowed by refgenie. For bed12 we should be able to write a build recipe and add it to the asset_build_packages.py
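The comparison behind the checkboxes above amounts to a set difference. A small sketch, assuming the available recipes are exactly the ticked boxes in this comment; the names are the iGenomes asset types as used in this issue, and the actual keys in asset_build_packages.py may be spelled differently.

```python
# Asset types used across igenomes.config, in the order listed in this issue.
IGENOMES_ASSET_TYPES = [
    "fasta", "bwa", "bowtie2", "star", "bismark", "gtf",
    "bed12", "readme", "mito_name", "macs_gsize", "blacklist",
]

# Assumption: taken from the ticked boxes above, not read from refgenie itself.
REFGENIE_RECIPES = {"fasta", "bwa", "bowtie2", "star", "bismark", "gtf", "blacklist"}


def missing_recipes(needed, available):
    """Return the needed asset types with no matching recipe, keeping order."""
    return [asset for asset in needed if asset not in available]


missing = missing_recipes(IGENOMES_ASSET_TYPES, REFGENIE_RECIPES)
print(missing)  # ['bed12', 'readme', 'mito_name', 'macs_gsize']
```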
@KevinMenden, I can help put together the refgenie asset recipes for the missing asset types. Where can I find commands used to create these?
Needs more data than that for Sarek (cf: https://github.com/nf-core/sarek/blob/05194ed421d7fef6b7cef05fe267a95d8ceb4d6c/conf/igenomes.config#L36-L57)
TODO:
Still to add:
- [ ] bed12 - https://github.com/refgenie/recipes/pull/1
These can be added as attributes to e.g. the FASTA file rather than dedicated uploads:
- [ ] mito_name
- [ ] macs_gsize https://github.com/refgenie/recipes/pull/2
Readme can probably be skipped, as RefGenie hopefully already has enough provenance for assets.