
Add PLAZA (plant) genomes to test, main, and cvmfs

Open • jennaj opened this issue 6 years ago • 1 comment

As discussed at prior GCCs and elsewhere, getting plant genomes into Galaxy in an organized and consistent way across the usegalaxy.* servers will address requests from users.

A how-to for making the matching annotation data available is part of this. A data library? GUI cues indicating that the data is available? Formatting custom genomes and annotations is difficult for users, especially matching up the chromosome identifiers between the two.
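To illustrate the identifier-matching problem, here is a minimal Python sketch that lists GFF seqids (column 1) with no matching FASTA sequence name. It assumes plain-text FASTA and GFF3 input and takes the first whitespace-delimited token of each `>` header as the sequence name. It is not taken from Galaxy or from either repository mentioned in this issue, and the function names are hypothetical.

```python
import io
from typing import Iterable, Set


def fasta_ids(lines: Iterable[str]) -> Set[str]:
    """Collect sequence names: the first whitespace-delimited token after '>'."""
    ids = set()
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:  # skip malformed empty headers
                ids.add(tokens[0])
    return ids


def gff_seqids(lines: Iterable[str]) -> Set[str]:
    """Collect column 1 (seqid) from GFF3 feature lines, skipping comments/directives."""
    ids = set()
    for line in lines:
        if line.startswith("##FASTA"):
            break  # embedded sequences follow; no more feature lines
        if not line.strip() or line.startswith("#"):
            continue
        ids.add(line.split("\t", 1)[0])
    return ids


def unmatched_seqids(fasta_lines: Iterable[str], gff_lines: Iterable[str]) -> Set[str]:
    """Return GFF seqids that have no sequence of the same name in the FASTA."""
    return gff_seqids(gff_lines) - fasta_ids(fasta_lines)


if __name__ == "__main__":
    # Hypothetical inputs: the GFF uses "chr1" while the FASTA uses "Chr1".
    fasta = io.StringIO(">Chr1 description\nACGT\n>Chr2\nGGCC\n")
    gff = io.StringIO(
        "##gff-version 3\n"
        "chr1\t.\tgene\t1\t4\t.\t+\t.\tID=g1\n"
        "Chr2\t.\tgene\t1\t4\t.\t+\t.\tID=g2\n"
    )
    print(sorted(unmatched_seqids(fasta, gff)))  # ['chr1']
```

A case-only mismatch like `chr1` versus `Chr1` is a typical example of the kind of difference that can leave annotation silently unmatched in downstream tools.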

This comes up quite often for plant genomes and other genomes (for example, bacteria) sourced from NCBI that are not natively indexed and are therefore used as custom genomes. In addition, natively indexed genomes currently have only the genome/tool indexes available, with no matching annotation, except for the very few tools that cover a limited number of genomes.

Much data is now ready per @frederikcoppens:

We made some progress on plant genomes.

In short, we have 2 repos:

- https://github.com/frederikcoppens/galaxy_data_management
- https://github.com/ieguinoa/genomes_to_galaxy

We have made all genomes in PLAZA (https://bioinformatics.psb.ugent.be/plaza/) available, as well as the matching GFF files. The first repo contains the scripts to get these installed in a Galaxy instance with data managers and ephemeris. We also use data managers for the GFF files. This has been used to install the genomes on usegalaxy.be.

This covers most high-quality genomes, but to cover all the rest as well we have the second repo. It uses the same procedure for Galaxy, but includes some additional preprocessing to get all the files we need into the right format. This is a work in progress.
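The preprocessing in the second repo is not described in detail here. One step that this kind of pipeline commonly needs is rewriting GFF seqids so that they match the FASTA sequence names. The following is a minimal Python sketch of that single step, assuming a GFF3 input and a caller-supplied mapping from old to new identifiers. The function name and the `ACC_0001` identifier are hypothetical, and this is not the repo's actual code.

```python
from typing import Dict, Iterable, Iterator


def rename_gff_seqids(lines: Iterable[str], mapping: Dict[str, str]) -> Iterator[str]:
    """Yield GFF3 lines with column 1 (seqid) rewritten via `mapping`.

    Comment/directive lines and blank lines pass through unchanged, and so
    does everything after a ##FASTA directive. Seqids missing from the
    mapping are kept as-is. Note: ##sequence-region directives also name
    seqids but are NOT rewritten by this sketch.
    """
    in_fasta = False
    for line in lines:
        if line.startswith("##FASTA"):
            in_fasta = True
        if in_fasta or line.startswith("#") or not line.strip():
            yield line
            continue
        seqid, sep, rest = line.partition("\t")
        yield mapping.get(seqid, seqid) + sep + rest


if __name__ == "__main__":
    mapping = {"ACC_0001": "Chr1"}  # hypothetical accession-to-name mapping
    gff = [
        "##gff-version 3\n",
        "ACC_0001\t.\tgene\t1\t100\t.\t+\t.\tID=g1\n",
    ]
    print("".join(rename_gff_seqids(gff, mapping)), end="")
```

Working line by line as a generator keeps memory use flat even for large plant annotation files.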

cc @bgruening re coordinating getting this data into CVMFS

jennaj • Jan 21 '19 21:01