Time to include some extra species

Open markziemann opened this issue 5 years ago • 6 comments

| 6th FEB 2020 | no. SRX |
| --- | --- |
| Arabidopsis thaliana | 30890 |
| Zea mays | 19753 |
| Oryza sativa | 9737 |
| Triticum aestivum | 6924 |
| Solanum lycopersicum | 6444 |
| Sorghum bicolor | 4646 |
| Glycine max | 3889 |
| Populus trichocarpa | 3485 |
| Vitis vinifera | 3258 |
| Panicum virgatum | 2338 |
| Hordeum vulgare | 2284 |
| Solanum tuberosum | 1851 |
| Brachypodium distachyon | 1814 |

| 18th FEB 2020 | no. SRX |
| --- | --- |
| Schizosaccharomyces pombe | 4718 |
| Plasmodium falciparum | 4298 |

markziemann avatar Feb 23 '20 11:02 markziemann

Macaca mulatta, Bos taurus, Sus scrofa, Gallus gallus, Ovis aries

markziemann avatar Nov 30 '20 01:11 markziemann

Thank you for providing this DEE2 database. I have been using it for quite a while. Some short questions:

  1. Is the plan to add species still ongoing?
  2. If so, which genome assembly/annotation versions are being used?

My colleagues are interested in rice and maize. We just tested the Singularity solution and it seems that we can run the pipeline ourselves. If DEE2 has already adopted particular genome assembly/annotation versions for rice and maize, we would like to follow them and perhaps share the computation results.

wdlingit avatar Jun 28 '24 02:06 wdlingit

Hi @wdlingit, we have been seeking funding, so far unsuccessfully, to support the expansion of DEE2, in particular to clear the backlog of mouse and human studies and possibly to update to the latest reference genome build. That said, I think we can work together to get rice and maize included. I will modify the pipeline to include rice and maize data and then update the web server side of things. If you could do the data processing at your institution, it would help expedite things. I'm not sure about an exact timeline, but I might have things ready to start data processing by the end of August.

markziemann avatar Jul 04 '24 03:07 markziemann

Thank you for the reply. We collected SRR accessions under NCBI Taxonomy ID 39947, with some minor restrictions. Our current list to be processed contains about 7K SRRs, which is smaller than what you listed a few years ago. I think this is reasonable because Tax ID 39947 is Oryza sativa Japonica Group, a subspecies(?) of rice. Oryza sativa Japonica Group is also available in Ensembl Plants ( https://plants.ensembl.org/Oryza_sativa/Info/Index ). We started a test run of 1000 SRRs two hours ago and things seem OK so far. To make sure things are coordinated, here is the info we applied in the volunteer_pipeline.sh script:

elif [ $ORG == "osativa" ] ; then
  GTFURL="ftp://ftp.ensemblgenomes.ebi.ac.uk/pub/plants/release-59/gtf/oryza_sativa/Oryza_sativa.IRGSP-1.0.59.gtf.gz"
  GDNAURL="ftp://ftp.ensemblgenomes.ebi.ac.uk/pub/plants/release-59/fasta/oryza_sativa/dna/Oryza_sativa.IRGSP-1.0.dna_sm.toplevel.fa.gz"
  CDNAURL="ftp://ftp.ensemblgenomes.ebi.ac.uk/pub/plants/release-59/fasta/oryza_sativa/cdna/Oryza_sativa.IRGSP-1.0.cdna.all.fa.gz"
  BT2_MD5="05eb69ae1d8b8b0d2cc06e890bf55dc6"
  KAL_MD5="6f618eda89e9b057c99d4d7580c5858d"
  STAR_MD5="b374bef1756a1ea105c968d68c71127e"
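
The three URLs above share one naming pattern, so they can be derived from the release number, the species directory, and the assembly name. The following is only a sketch of that pattern: `ensembl_plants_urls` is a hypothetical helper and is not part of volunteer_pipeline.sh. It reproduces the rice entries listed above; whether other species (maize, for example) follow the same file naming is an assumption that should be checked against the Ensembl Plants FTP listing before use.

```shell
#!/bin/sh
# Hypothetical helper, not part of volunteer_pipeline.sh.
# Sets GTFURL, GDNAURL and CDNAURL from:
#   $1 = Ensembl Plants release number (e.g. 59)
#   $2 = species directory name      (e.g. oryza_sativa)
#   $3 = assembly name                (e.g. IRGSP-1.0)
ensembl_plants_urls() {
  _release="$1"
  _species="$2"
  _assembly="$3"
  _base="ftp://ftp.ensemblgenomes.ebi.ac.uk/pub/plants/release-${_release}"

  # File names start with the species name with its first letter capitalised.
  _first=$(printf '%s' "$_species" | cut -c1 | tr '[:lower:]' '[:upper:]')
  _rest=$(printf '%s' "$_species" | cut -c2-)
  _prefix="${_first}${_rest}.${_assembly}"

  # Only the GTF file name carries the release number.
  GTFURL="${_base}/gtf/${_species}/${_prefix}.${_release}.gtf.gz"
  GDNAURL="${_base}/fasta/${_species}/dna/${_prefix}.dna_sm.toplevel.fa.gz"
  CDNAURL="${_base}/fasta/${_species}/cdna/${_prefix}.cdna.all.fa.gz"
}

# Rebuild the rice URLs used in this comment.
ensembl_plants_urls 59 oryza_sativa IRGSP-1.0
printf '%s\n' "$GTFURL" "$GDNAURL" "$CDNAURL"
```

Keeping the release number in one place makes it easier for both sides to confirm they are on the same annotation version. The BT2_MD5, KAL_MD5 and STAR_MD5 values are not derived by this sketch and would still need to be agreed on separately.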

wdlingit avatar Jul 04 '24 07:07 wdlingit