
meta issue for high-level error checking

Open ctb opened this issue 6 years ago • 2 comments

see https://github.com/dib-lab/eelpond/issues/103#issuecomment-471185817 for initial motivation --

I think we need a modular way to do high-level correctness checking.

e.g.,

  • if quantify is run, either some assembly thingy needs to be specified OR a reference transcriptome needs to be provided
  • if gene_trans_map is true, the gene trans map file should exist
  • if the reads aren't gzipped, we should flag that somewhere (ref https://github.com/dib-lab/eelpond/issues/30)

I don't think run_eelpond should have this error checking in it directly, tho! Maybe we could put in something that when a particular rule file is included, it has some high level checks that it runs, or maybe that should be connected in some way to the higher level workflows mentioned in pipeline_defaults.yaml?
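One way to keep these checks out of run_eelpond is a small registry that ties each check to a rule and runs it only when that rule is included. The following is a minimal sketch, not the project's implementation: `CHECKS`, `check_for`, `run_checks`, and the config keys `gene_trans_map_file` and `reads` are all hypothetical names chosen for illustration.

```python
import os

# Hypothetical registry: rule name -> list of high-level check functions.
CHECKS = {}


def check_for(rule):
    """Register a check that runs only when `rule` is part of the workflow."""
    def register(func):
        CHECKS.setdefault(rule, []).append(func)
        return func
    return register


@check_for("salmon")
def transcriptome_available(config, rules):
    # Quantification needs an assembly rule OR a provided reference.
    if not {"trinity", "get_reference"} & set(rules):
        return ["salmon: specify an assembly rule or provide a reference transcriptome"]
    return []


@check_for("get_reference")
def gene_trans_map_exists(config, rules):
    # Assumed layout: a boolean flag plus a separate file path key.
    settings = config.get("get_reference", {})
    path = settings.get("gene_trans_map_file", "")
    if settings.get("gene_trans_map") and not os.path.exists(path):
        return ["get_reference: gene_trans_map file not found: " + repr(path)]
    return []


@check_for("get_data")
def reads_gzipped(config, rules):
    # Assumed layout: config["reads"] is a list of read file paths.
    return [
        "get_data: reads file is not gzipped: " + read
        for read in config.get("reads", [])
        if not read.endswith(".gz")
    ]


def run_checks(config, rules):
    """Collect messages from every check attached to an included rule."""
    problems = []
    for rule in rules:
        for check in CHECKS.get(rule, []):
            problems.extend(check(config, rules))
    return problems
```

For example, `run_checks({}, ["salmon"])` returns a single message about the missing transcriptome, while `run_checks({}, ["salmon", "trinity"])` returns an empty list. Each rule file could register its own checks, so run_eelpond only needs to call `run_checks`.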

ctb avatar Mar 09 '19 15:03 ctb

I think in the eelpond_params section of the params.yml file, we can add a require parameter that describes the required rules. Include utility rules like get_data, etc. Need to have an "or" option in place though, for situations where either assemblyinput or assembly is required.
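A rough sketch of how a require list with an "or" option might be evaluated, assuming (hypothetically) that a plain string names one required rule and a nested list names a group of alternatives. The `quantify` entry and the function name `missing_requirements` are illustrative only.

```python
# Hypothetical `require` data as it might look after loading params.yml.
# A string is a single required rule; a nested list is an "or" group.
REQUIRE = {
    "quantify": ["get_data", ["assemblyinput", "assembly"]],
}


def missing_requirements(require, selected):
    """Return a description of each requirement that no selected rule satisfies."""
    selected = set(selected)
    missing = []
    for item in require:
        alternatives = [item] if isinstance(item, str) else list(item)
        # An "or" group is satisfied when at least one alternative is selected.
        if not selected.intersection(alternatives):
            missing.append(" or ".join(alternatives))
    return missing
```

With this layout, `missing_requirements(REQUIRE["quantify"], ["get_data"])` reports `"assemblyinput or assembly"`, and adding either rule to the selection clears it.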

Not sure yet how to check if something has already been run (e.g. trimmomatic). Maybe don't check, but add a help section to the eelpond_params that has a brief description of the workflow & its required components. Would be helpful to run elvers examples/nema.yml assembly -h to return this help description to stdout.

bluegenes avatar Mar 14 '19 18:03 bluegenes

the require idea outlined above would involve updating requirements with exact rules that exist (e.g. right now salmon requires either get_reference or trinity, but in the future, other assemblers may work).

To get around this, maybe we instead create input/output categories that go in each params.yml file. When running a workflow, we check that all inputs are satisfied, and if not, print a list of all rules or utilities that provide that output. For example, if we need "transcriptome", we have two rules that produce that, get_reference and trinity, and we can print a helpful message to suggest the user provide either rule.

something like this?

salmon:
  inputs:
    read:
      - raw
      - trimmed
    reference:
      - transcriptome
  outputs:
    read:
      - counts
deseq2:
  inputs:
    read:
      - counts
    reference:
      - transcriptome
  outputs:
    base:
      - diffexp
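The check itself could look roughly like the sketch below. It mirrors the salmon and deseq2 entries above as a Python dict; the entries for get_data, get_reference, and trinity are assumptions added so the example runs. It also assumes that the names listed under one input category are alternatives (for example, raw or trimmed reads), which the proposal does not state explicitly.

```python
# Input/output declarations as they might look after loading each params.yml.
# salmon and deseq2 follow the YAML above; the other three entries are assumed.
PARAMS = {
    "get_data": {"outputs": {"read": ["raw"]}},
    "get_reference": {"outputs": {"reference": ["transcriptome"]}},
    "trinity": {"outputs": {"reference": ["transcriptome"]}},
    "salmon": {
        "inputs": {"read": ["raw", "trimmed"], "reference": ["transcriptome"]},
        "outputs": {"read": ["counts"]},
    },
    "deseq2": {
        "inputs": {"read": ["counts"], "reference": ["transcriptome"]},
        "outputs": {"base": ["diffexp"]},
    },
}


def providers(params, category, name):
    """List every rule that declares `name` as an output in `category`."""
    return sorted(
        rule
        for rule, spec in params.items()
        if name in spec.get("outputs", {}).get(category, [])
    )


def unmet_inputs(params, selected):
    """Return one message per input category that no selected rule provides."""
    available = set()
    for rule in selected:
        for category, names in params[rule].get("outputs", {}).items():
            available.update((category, name) for name in names)

    problems = []
    for rule in selected:
        for category, names in params[rule].get("inputs", {}).items():
            # Assumption: any one of the listed names satisfies the category.
            if any((category, name) in available for name in names):
                continue
            suggestions = sorted(
                {p for name in names for p in providers(params, category, name)}
            )
            wanted = " or ".join(names)
            offered = ", ".join(suggestions) or "no known rule"
            problems.append(
                f"{rule} needs {category} input ({wanted}); provided by: {offered}"
            )
    return problems
```

Running `unmet_inputs(PARAMS, ["get_data", "salmon"])` flags the missing transcriptome and suggests get_reference and trinity. Because providers are looked up from the declarations, a new assembler only needs to declare a transcriptome output to show up in the suggestion.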

bluegenes avatar Mar 17 '19 22:03 bluegenes