meta issue for high-level error checking
see https://github.com/dib-lab/eelpond/issues/103#issuecomment-471185817 for initial motivation --
I think we need a modular way to do high-level correctness checking.
e.g.,
- if quantify is run, either some assembly thingy needs to be specified OR a reference transcriptome needs to be provided
- if gene_trans_map is true, the gene trans map file should exist
- if the reads aren't gzipped, we should flag that somewhere (ref https://github.com/dib-lab/eelpond/issues/30)
I don't think run_eelpond should have this error checking in it directly, tho! Maybe we could put in something that when a particular rule file is included, it has some high level checks that it runs, or maybe that should be connected in some way to the higher level workflows mentioned in pipeline_defaults.yaml?
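One way the "checks travel with the rule file" idea could look, as a rough sketch only: each rule registers its own check functions, and the runner just calls whatever is registered for the rules that were included. Everything here is hypothetical (`CHECKS`, `register_check`, `run_checks`, and the `assembly` / `reference` config keys are made up for illustration, not existing eelpond code).

```python
# Hypothetical registry mapping a rule name to its high-level check functions.
CHECKS = {}


def register_check(rule_name):
    """Decorator that attaches a check function to a rule name."""
    def decorator(func):
        CHECKS.setdefault(rule_name, []).append(func)
        return func
    return decorator


@register_check("quantify")
def quantify_needs_transcriptome(config):
    # Assumed config keys: "assembly" (an assembly step was requested)
    # and "reference" (path to a user-provided transcriptome).
    if config.get("assembly") or config.get("reference"):
        return []
    return ["quantify needs either an assembly step or a reference transcriptome"]


def run_checks(included_rules, config):
    """Run every registered check for the included rules; return error messages."""
    errors = []
    for rule in included_rules:
        for check in CHECKS.get(rule, []):
            errors.extend(check(config))
    return errors


if __name__ == "__main__":
    for message in run_checks(["quantify"], {}):
        print("ERROR:", message)
```

With this shape, run_eelpond would only need to call `run_checks` once, and the checks themselves would live next to the rules they describe.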
I think in the eelpond_params section of the params.yml file, we can add a require parameter that describes the required rules. This should include utility rules like get_data, etc. We need to have an "or" option in place though, for situations where either assemblyinput or assembly is required.
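A minimal sketch of how the "or" option might be checked, assuming (this is my assumption, not a decided format) that each entry in require is either a single rule name or a list of alternatives where any one is enough. `missing_requirements` is a hypothetical helper name.

```python
def missing_requirements(require, selected_rules):
    """Return the requirement entries that no selected rule satisfies.

    Each entry in `require` is either a rule name (str) or a list of
    alternative rule names, of which at least one must be selected.
    """
    selected = set(selected_rules)
    missing = []
    for entry in require:
        alternatives = [entry] if isinstance(entry, str) else list(entry)
        if not selected.intersection(alternatives):
            missing.append(alternatives)
    return missing


# Assumed shape of a require list: get_data AND (assemblyinput OR assembly).
require = ["get_data", ["assemblyinput", "assembly"]]

print(missing_requirements(require, ["get_data", "assembly"]))  # []
print(missing_requirements(require, ["get_data"]))  # [['assemblyinput', 'assembly']]
```

In YAML this would just be a nested list under require, so no new syntax would be needed for the "or" case.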
Not sure yet how to check if something has already been run (e.g. trimmomatic). Maybe don't check, but add a help section to the eelpond_params that has a brief description of the workflow & its required components. It would be helpful if running elvers examples/nema.yml assembly -h printed this help description to stdout.
One drawback: the require idea outlined above would involve keeping the requirements updated with the exact rules that exist (e.g. right now salmon requires either get_reference or trinity, but in the future, other assemblers may work).
To get around this, maybe we instead create input/output categories that go in each params.yml file. When running a workflow, we check that all inputs are satisfied, and if not, print a list of all rules or utilities that provide that output. For example, if we need "transcriptome", we have two rules that produce that, get_reference and trinity, and we can print a helpful message suggesting the user add either rule.
something like this?
    salmon:
      inputs:
        read:
          - raw
          - trimmed
        reference:
          - transcriptome
      outputs:
        read:
          - counts
    deseq2:
      inputs:
        read:
          - counts
        reference:
          - transcriptome
      outputs:
        base:
          - diffexp
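Here is a rough sketch of how the satisfaction check could work on top of that layout. The assumptions are mine: names listed under one input category are treated as alternatives (salmon can take raw OR trimmed reads), every input category must be satisfied, and the entries for get_data, get_reference and trinity are guesses added only so the example runs. `RULE_IO`, `providers` and `unsatisfied_inputs` are hypothetical names.

```python
# Input/output categories per rule. salmon and deseq2 follow the YAML above;
# get_data, get_reference and trinity are guessed for illustration.
RULE_IO = {
    "get_data": {"outputs": {"read": ["raw"]}},
    "get_reference": {"outputs": {"reference": ["transcriptome"]}},
    "trinity": {
        "inputs": {"read": ["trimmed"]},
        "outputs": {"reference": ["transcriptome"]},
    },
    "salmon": {
        "inputs": {"read": ["raw", "trimmed"], "reference": ["transcriptome"]},
        "outputs": {"read": ["counts"]},
    },
    "deseq2": {
        "inputs": {"read": ["counts"], "reference": ["transcriptome"]},
        "outputs": {"base": ["diffexp"]},
    },
}


def providers(rule_io, category, name):
    """List the rules whose outputs include `name` under `category`."""
    return sorted(
        rule for rule, io in rule_io.items()
        if name in io.get("outputs", {}).get(category, [])
    )


def unsatisfied_inputs(workflow, rule_io):
    """Return (rule, category, wanted names, suggested rules) for unmet inputs."""
    # Pool the outputs of every rule in the workflow; execution order is ignored.
    available = set()
    for rule in workflow:
        for category, names in rule_io[rule].get("outputs", {}).items():
            available.update((category, name) for name in names)

    problems = []
    for rule in workflow:
        for category, names in rule_io[rule].get("inputs", {}).items():
            # Names within one category are alternatives: any one is enough.
            if not any((category, name) in available for name in names):
                suggestions = sorted(
                    {p for name in names for p in providers(rule_io, category, name)}
                )
                problems.append((rule, category, names, suggestions))
    return problems


if __name__ == "__main__":
    for rule, category, names, suggestions in unsatisfied_inputs(
        ["get_data", "salmon"], RULE_IO
    ):
        print(f"{rule} needs {category} ({' or '.join(names)}); "
              f"try adding one of: {', '.join(suggestions)}")
```

Because the suggestions are looked up from the outputs rather than hard-coded, a new assembler that declares a transcriptome output would show up in the message without touching salmon's params.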