Time to include some extra species
Species (6th Feb 2020) | No. SRX |
---|---|
Arabidopsis thaliana | 30890 |
Zea mays | 19753 |
Oryza sativa | 9737 |
Triticum aestivum | 6924 |
Solanum lycopersicum | 6444 |
Sorghum bicolor | 4646 |
Glycine max | 3889 |
Populus trichocarpa | 3485 |
Vitis vinifera | 3258 |
Panicum virgatum | 2338 |
Hordeum vulgare | 2284 |
Solanum tuberosum | 1851 |
Brachypodium distachyon | 1814 |
Species (18th Feb 2020) | No. SRX |
---|---|
Schizosaccharomyces pombe | 4718 |
Plasmodium falciparum | 4298 |
Macaca mulatta, Bos taurus, Sus scrofa, Gallus gallus, Ovis aries
Thank you for providing this DEE2 database. I have been using it for quite a while. Some short questions:
- Is the plan to add species still ongoing?
- If it is ongoing, have any specific genome assembly/annotation versions been used?
My colleagues would be interested in rice and maize. I just tested the Singularity solution and it seems that we can run the pipeline ourselves. If DEE2 has adopted particular genome assembly/annotation versions for rice and maize, we would like to follow them and perhaps share the computation results.
Hi @wdlingit, we have been seeking funding, so far unsuccessfully, to support the expansion of DEE2, in particular to deal with the backlog of mouse and human studies and the possibility of updating to the latest reference genome build. That said, I think we can work together to get rice and maize included. I will do the necessary work to modify the pipeline to include rice and maize data and then update the web server side of things. If you could do the data processing at your institution, it would help expedite things. I'm not sure about an exact timeline, but I might have things ready to start data processing by the end of August.
Thank you for the reply. We collected SRR accessions with NCBI Taxonomy ID 39947, plus some minor restrictions. Our current list to be processed is about 7K SRRs, which is smaller than what you listed a few years ago. I think this is reasonable because Tax ID 39947 is for Oryza sativa Japonica Group, a subspecies(?) of rice. Oryza sativa Japonica Group is also available in Ensembl Plants ( https://plants.ensembl.org/Oryza_sativa/Info/Index ).

We just started (2 hours ago) a test run of 1000 SRRs and things seem OK to me. To make sure things are coordinated, here is the info we applied in the volunteer_pipeline.sh script:
# Rice (Oryza sativa Japonica Group): Ensembl Plants release 59, assembly IRGSP-1.0
elif [ "$ORG" == "osativa" ] ; then
  GTFURL="ftp://ftp.ensemblgenomes.ebi.ac.uk/pub/plants/release-59/gtf/oryza_sativa/Oryza_sativa.IRGSP-1.0.59.gtf.gz"
  GDNAURL="ftp://ftp.ensemblgenomes.ebi.ac.uk/pub/plants/release-59/fasta/oryza_sativa/dna/Oryza_sativa.IRGSP-1.0.dna_sm.toplevel.fa.gz"
  CDNAURL="ftp://ftp.ensemblgenomes.ebi.ac.uk/pub/plants/release-59/fasta/oryza_sativa/cdna/Oryza_sativa.IRGSP-1.0.cdna.all.fa.gz"
  BT2_MD5="05eb69ae1d8b8b0d2cc06e890bf55dc6"
  KAL_MD5="6f618eda89e9b057c99d4d7580c5858d"
  STAR_MD5="b374bef1756a1ea105c968d68c71127e"
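
To keep both sites on the same release, the three URLs above could be generated from a release number, species name, and assembly name rather than typed by hand. The following is only a sketch: `ensembl_plants_urls` is a hypothetical helper, not part of volunteer_pipeline.sh, and it assumes other Ensembl Plants entries follow the same path layout as the rice URLs listed here, which should be checked against the FTP listing before use.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of volunteer_pipeline.sh): print the GTF,
# genomic DNA, and cDNA URLs for one Ensembl Plants species, following the
# path layout of the osativa entry quoted in this thread.
ensembl_plants_urls() {
  local release="$1"   # Ensembl Plants release number, e.g. 59
  local species="$2"   # species name as used in file names, e.g. Oryza_sativa
  local assembly="$3"  # assembly name as used in file names, e.g. IRGSP-1.0
  local dir base
  # Directory names are the lower-cased species name (assumption drawn from the rice URLs).
  dir=$(printf '%s' "$species" | tr '[:upper:]' '[:lower:]')
  base="ftp://ftp.ensemblgenomes.ebi.ac.uk/pub/plants/release-${release}"
  echo "GTFURL=${base}/gtf/${dir}/${species}.${assembly}.${release}.gtf.gz"
  echo "GDNAURL=${base}/fasta/${dir}/dna/${species}.${assembly}.dna_sm.toplevel.fa.gz"
  echo "CDNAURL=${base}/fasta/${dir}/cdna/${species}.${assembly}.cdna.all.fa.gz"
}

# Reproduce the three rice URLs used in the test run.
ensembl_plants_urls 59 Oryza_sativa IRGSP-1.0
```

Calling it with `59 Oryza_sativa IRGSP-1.0` should print the same three URLs as in the `osativa` block, so any difference between sites would show up as a changed argument rather than a subtly different path.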