usegalaxy-playbook
Ephemeris run-data-managers for Main/Test
Ephemeris: specifically, the functions for batch data fetch/index with run-data-managers
- https://github.com/galaxyproject/ephemeris
- https://ephemeris.readthedocs.io/en/latest/commands/run-data-managers.html
@nate This is how we should install genomes/indexes going forward, agree?
@jmchilton @bgruening Is this ready for production server use yet, or should we wait a bit?
This will require some tuning for main/test because of the mix of legacy and DM-created data, so we wouldn't want to tune main-specific methods around it until it is mostly stable. We can chat offline about this if not here.
I'd like to see methods for the functions below. They could be implemented already (I didn't find them), be on a todo list (where?), or belong in a "for usegalaxy.org" branch. Please correct me or point me to the functions if they exist. I will create tickets in the primary repo and link back. Just saw the latest updates (dups avoided - great!!).
- Sanity check existing loc content/format: each file individually and against each other [ticket link]
- Sanity check locs versus actual data files on disc [ticket link]
- Make targeted corrections [ticket link]
- Query "available" genomes from sources (UCSC main, NCBI, Ensembl) [ticket link]
- Summarize indexes: a flat file with dbkey, long/short labels, source (server), source (datafile URL), size on disc, runtime/timestamp per DM job (when available), server [ticket link]
- Sync/push/pull indexes to/from public data server (e.g. datacache.g2.bx.psu.edu) [ticket link]
Some of the above is covered in older linked tickets listed in the master genome ticket under /galaxy here: https://github.com/galaxyproject/galaxy/issues/1470
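To make the first two wishlist items (loc format sanity check, loc versus data on disc) more concrete, here is a rough Python sketch of what such a check might look like. It is not part of Ephemeris or Galaxy; `parse_loc` and `check_loc` are hypothetical names. It assumes a tab-separated loc file with `#` comment lines, a unique ID in the first column, and the data path in the last column, which will not hold for every data table.

```python
import glob
import os
from typing import Iterator, List, NamedTuple


class LocEntry(NamedTuple):
    line_no: int
    fields: List[str]


def parse_loc(text: str) -> Iterator[LocEntry]:
    """Yield the non-blank, non-comment rows of a tab-separated loc file."""
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip() or line.startswith("#"):
            continue
        yield LocEntry(line_no, line.split("\t"))


def check_loc(text: str, expected_columns: int, path_column: int = -1) -> List[str]:
    """Return human-readable problems found in one loc file.

    Assumptions (hypothetical, adjust per data table): the first column
    is a unique ID and `path_column` holds a file path or index prefix.
    """
    problems: List[str] = []
    seen = {}  # first-column value -> line number where it first appeared
    for entry in parse_loc(text):
        if len(entry.fields) != expected_columns:
            problems.append(
                f"line {entry.line_no}: expected {expected_columns} columns, "
                f"found {len(entry.fields)}"
            )
            continue
        key = entry.fields[0]
        if key in seen:
            problems.append(
                f"line {entry.line_no}: duplicate value '{key}' "
                f"(first seen on line {seen[key]})"
            )
        else:
            seen[key] = entry.line_no
        path = entry.fields[path_column]
        # Index entries are often a path prefix rather than a single file,
        # so accept either an exact match or anything starting with the prefix.
        if not os.path.exists(path) and not glob.glob(glob.escape(path) + "*"):
            problems.append(f"line {entry.line_no}: nothing on disk at '{path}'")
    return problems
```

Something like `check_loc(open("all_fasta.loc").read(), expected_columns=4)` would return a list of problem strings, empty if the file looks consistent. Comparing loc files against each other (for example, dbkeys present in one table but missing from another) could be layered on top of `parse_loc`.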