peppy
Project config file keyword suggestions
This idea came up when I was writing a configuration file that uses the subprojects section. For each of my subprojects, I was defining an alternate output_dir and sample_annotation, but I was nesting these directly under the subproject name itself rather than within a metadata subsection.
This led to what looked like a failure by AttributeDict, during parse_config_file, to substitute the subproject-specific values for the general project values that had been redefined. While subprojects may not be a heavily used feature, I could see this being a not-infrequent error (grimacing at the double negative). It's not a big deal if a user is being careful and runs with dry-run first, but if the submission were actually done and far more samples were submitted than intended, that could cost a lot of unintended compute time/$. Either way, the user would need to figure out what was wrong with the config file, which may not be entirely intuitive.
I think we've discussed keeping the config section name definition framework as flexible as possible. I definitely agree, but I think there could be some value in, say, using knowledge of keywords like output_dir and sample_annotation to suggest proper placement (i.e., some sort of warning if they're present but not placed within metadata). The keywords that come to mind are the common metadata ones: output_dir, sample_annotation, results_subdir, submission_subdir, pipeline_interfaces.
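To make the suggestion concrete, here is a minimal sketch of such a placement check. It is not peppy code: it assumes the config has already been loaded into plain nested dicts, and the function names (`find_misplaced_keywords`, `warn_on_misplaced_keywords`) are hypothetical. The keyword list is just the one named above.

```python
import warnings

# Keywords listed in this issue as the common metadata ones.
METADATA_KEYWORDS = {
    "output_dir",
    "sample_annotation",
    "results_subdir",
    "submission_subdir",
    "pipeline_interfaces",
}


def find_misplaced_keywords(config, path=()):
    """Return dotted paths of metadata keywords whose direct parent
    section is not named 'metadata'. Hypothetical helper."""
    misplaced = []
    for key, value in config.items():
        here = path + (str(key),)
        in_metadata = bool(path) and path[-1] == "metadata"
        if key in METADATA_KEYWORDS and not in_metadata:
            misplaced.append(".".join(here))
        # Recurse into nested sections (e.g. each subproject).
        if isinstance(value, dict):
            misplaced.extend(find_misplaced_keywords(value, here))
    return misplaced


def warn_on_misplaced_keywords(config):
    """Warn rather than fail, so the section framework stays flexible."""
    for dotted in find_misplaced_keywords(config):
        warnings.warn(
            f"'{dotted}' looks like a metadata keyword; "
            "did you mean to nest it under a 'metadata' section?"
        )
```

With the mistake described above (output_dir nested directly under a subproject name), this would report `subprojects.<name>.output_dir` and leave everything else alone. Unknown sections are never flagged, so the check only suggests and does not restrict.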
To take this a step further, we may at some point just want to implement a config file parser/checker that reports on the health of your config file. It could do a bunch of checks like this one to suggest places where you could improve. This seems like a good thing to think about in the longer term, when PEP becomes more widespread.
Cool, I like the sound of that.
- The looper config was moved out of the PEP config file, so this issue is partially outdated.
Parser/checker of the config file: do we still want to implement it? What should it look like? Should we have a generic config schema?
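As one possible starting point for that discussion, a generic schema could be as small as a mapping from section name to expected type and a required flag. The sketch below is only an illustration: `SCHEMA`, `check_config`, and the choice of which sections are required are assumptions, not an agreed PEP specification.

```python
# Hypothetical schema: section name -> expected type and whether it is required.
# Which sections are actually required is an assumption for illustration.
SCHEMA = {
    "metadata": {"type": dict, "required": True},
    "subprojects": {"type": dict, "required": False},
}


def check_config(config, schema=SCHEMA):
    """Return a list of human-readable problems; an empty list means healthy."""
    problems = []
    for section, rule in schema.items():
        if section not in config:
            if rule["required"]:
                problems.append(f"missing required section '{section}'")
            continue
        if not isinstance(config[section], rule["type"]):
            problems.append(
                f"section '{section}' should be a {rule['type'].__name__}, "
                f"got {type(config[section]).__name__}"
            )
    # Sections not named in the schema are deliberately ignored,
    # so user-defined sections remain allowed.
    return problems
```

Returning a list of problems instead of raising on the first one fits the "health report" idea: the user sees everything that might need attention in one pass.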