
Preprocessing Layer for Run Data Managers

Open jmchilton opened this issue 3 years ago • 1 comments

Simon started work on higher-level genome processing outside of Ephemeris with...

  • This script: https://github.com/galaxyproject/idc/blob/769e8bd1423a32b5b19973fda981686552eeb240/scripts/make_fetch.py
  • Instantiated here: https://github.com/galaxyproject/idc/blob/769e8bd1423a32b5b19973fda981686552eeb240/run_builder.sh#L69

This is a great idea. We should formalize it and make it more robust and broadly useful by moving this functionality into Ephemeris, directly into the run-data-managers endpoint.

MVP:

  • Establish Pydantic models (or maybe pykwalify but probably not?) for a low-level run-data-managers layer - that is, the current inputs to run-data-managers.
  • Write Pydantic models for syntactic sugar that covers:
    • If a genomes key is present, read its entries and convert them to invocations of the data_manager_fetch_genome_dbkeys_all_fasta tool as covered by make_fetch.py - assume the latest version of data_manager_fetch_genome_dbkeys_all_fasta.
    • Prepend those invocations to the list of managers to run.
    • Write all of those back to the lower-level YAML description and validate it.
  • In run-data-managers, run the preprocessor before executing the data managers.
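
The MVP steps above could be sketched roughly as follows. This is only an illustration under several assumptions: it uses stdlib dataclasses instead of Pydantic to stay dependency-free, the `Genome` fields and the `preprocess` function are hypothetical names, the parameter keys are illustrative and not checked against the tool, and `FETCH_TOOL_ID` is a placeholder rather than the real Tool Shed id.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List

# Placeholder only: a real config would use the full Tool Shed tool id.
FETCH_TOOL_ID = "data_manager_fetch_genome_dbkeys_all_fasta"


@dataclass
class Genome:
    """High-level ("sugar") entry. Field names are hypothetical."""
    id: str
    description: str
    source: str = "ucsc"


@dataclass
class DataManager:
    """Low-level entry, loosely mirroring a run-data-managers item."""
    id: str
    params: List[Dict[str, Any]] = field(default_factory=list)
    items: List[Dict[str, Any]] = field(default_factory=list)
    data_table_reload: List[str] = field(default_factory=list)


def genome_to_fetch(genome: Genome) -> DataManager:
    """Expand one genome into a fetch invocation (illustrative params)."""
    return DataManager(
        id=FETCH_TOOL_ID,
        params=[
            {"dbkey_source|dbkey": genome.id},
            {"reference_source|reference_source_selector": genome.source},
        ],
    )


def preprocess(config: Dict[str, Any]) -> Dict[str, Any]:
    """Return a low-level config with genome fetches prepended."""
    # Constructing the dataclasses rejects unknown keys (TypeError),
    # which stands in for the validation step.
    genomes = [Genome(**g) for g in config.get("genomes", [])]
    existing = [DataManager(**dm) for dm in config.get("data_managers", [])]
    fetches = [genome_to_fetch(g) for g in genomes]

    low_level = {k: v for k, v in config.items() if k != "genomes"}
    low_level["data_managers"] = [asdict(dm) for dm in fetches + existing]
    return low_level
```

With Pydantic, `Genome` and `DataManager` would become `BaseModel` subclasses and gain type coercion and clearer validation errors; the prepend-and-write-back flow would stay the same.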

Follow-up Enhancements:

  • After #188 is implemented, run the preprocessor before looking for tool ids.
  • Pick an important data manager that doesn't start from genomes/dbkeys (Kraken I suppose - or is gemini another thing?) and generalize the initial sources like this.
  • Pick an important data manager that indexes genomes (further along in the "workflow") and define some syntactic sugar to make invoking it cleaner from XML (TODO come up with example or drop this bullet point if it doesn't make sense)

jmchilton, Dec 02 '22 17:12

Maybe worth mentioning this here: galaxyproject/galaxy#15188

The issue is in Galaxy, and I'm not sure how involved a Galaxy-side fix would be, but Ephemeris could work around it fairly easily by querying data tables first.
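
One possible reading of "querying data tables first" is sketched below: look at what a data table already contains and skip genomes that are already there. Everything here is an assumption for illustration. The helper names are hypothetical, the table is assumed to be a dict with `columns` and `fields` keys (the shape a data table lookup is assumed to return), and this may not match the workaround the linked Galaxy issue actually needs.

```python
from typing import Any, Dict, List, Set


def existing_values(table: Dict[str, Any], column: str = "value") -> Set[str]:
    """Collect the entries of one column from a data table dict.

    Assumes `table["columns"]` lists column names and `table["fields"]`
    holds one list per row, in the same column order.
    """
    idx = table["columns"].index(column)
    return {row[idx] for row in table["fields"]}


def filter_new_genomes(
    genomes: List[Dict[str, Any]], table: Dict[str, Any]
) -> List[Dict[str, Any]]:
    """Keep only genomes whose id is not yet present in the table."""
    present = existing_values(table)
    return [g for g in genomes if g["id"] not in present]
```

In a preprocessing layer, a filter like this would run before the genome entries are expanded into data manager invocations.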

natefoo, Dec 13 '22 22:12