pypiper
Checking for pipeline command requirements to fail early
This issue has come up repeatedly in several projects, so this issue aggregates those thoughts in one place. It would be nice for a pipeline to have a way to do a gut-check and make sure all the commands it requires are at least executable. It could then fail or warn early if something is amiss.
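Here is a minimal sketch of such a gut-check. It assumes the pipeline can list the executables it needs up front. `check_requirements` and its `strict` flag are hypothetical names, not an existing pypiper API. The only real call is the standard library's `shutil.which`, which returns `None` when a name does not resolve to an executable.

```python
import shutil
import warnings


def check_requirements(tools, strict=True):
    """Hypothetical pre-flight check: verify each tool resolves to an executable.

    Returns the list of missing tools. With strict=True it raises instead of
    warning, so the pipeline fails before doing any real work.
    """
    # shutil.which returns None when the name is not an executable on PATH
    missing = [tool for tool in tools if shutil.which(tool) is None]
    if missing:
        msg = "Required commands not callable: " + ", ".join(missing)
        if strict:
            raise RuntimeError(msg)
        warnings.warn(msg)
    return missing
```

A pipeline could call something like `check_requirements(["samtools", "bowtie2"])` before its first step. This only checks `PATH` resolution, so on its own it does not cover the jarfile and environment-variable cases discussed in this thread.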
This belongs in pypiper, I think.
These are related issues:
https://github.com/databio/pepatac/issues/68
https://github.com/pepkit/peppy/issues/221
https://github.com/databio/pararead/pull/29
https://github.com/pepkit/looper/issues/53
https://github.com/pepkit/peppy/issues/23
Hey @jpsmith5, anything to add here? :smile:
Whoa, it's like the nexus of the universe
Here are a couple of challenges I've dealt with so far that are worth keeping track of.
I'm also not sure how much of this could be changed within ngstk.check_command, which itself relies on the system `command`.
- example 1: a required tool is a jarfile, which requires modifying the command to `java -jar <jar_file>` in order to call it with `command`.
- example 2: a tool installed in Python site-packages required `$PYTHONPATH` to be explicitly set before `command` could properly identify it as callable, e.g. MACS2.
- example 3: an environment variable points to a tool to be called, but if the variable is never set, `command` still returns 0 even though the tool is NOT callable, e.g. `${PICARD}`.
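Example 3 presumably happens because an unset variable expands to an empty string, which leaves the shell with nothing to check. One way to guard against that is to expand variables in Python before checking. `resolve_command` is a hypothetical helper for illustration. It relies on documented `os.path.expandvars` behavior: references to unset variables are left unchanged, so any `$` remaining after expansion means a variable was never set.

```python
import os
import shutil


def resolve_command(cmd):
    """Hypothetical helper: expand environment variables in `cmd` and return
    the path of the resulting executable, or None if it is not callable.
    """
    expanded = os.path.expandvars(cmd)
    # expandvars leaves unset variables as-is, so a leftover "$" signals
    # something like ${PICARD} that was never set
    if "$" in expanded or not expanded.strip():
        return None
    # Only a real executable passes; a non-executable .jar would still need
    # the `java -jar` treatment from example 1
    return shutil.which(expanded)
```

Unlike rejecting every `$` outright, this still accepts `${PICARD}` when the variable is set and points at an executable.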
The current workaround is to check whether the expected command contains the string '.jar' and, if so, prefix it with `java -jar`.
Similarly, check for the presence of '$' in the command; if found, assume the command is uncallable, report it, and fail.
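A rough sketch of these two workarounds, for illustration only. `vet_command` and `UncallableCommandError` are hypothetical names, not part of pypiper or ngstk. The thread only describes adding the `java -jar` prefix. Treating a jar as callable only when `java` is on `PATH` and the jar file exists is an assumption.

```python
import os
import shutil


class UncallableCommandError(Exception):
    """Hypothetical error raised when a required command is judged uncallable."""


def vet_command(cmd):
    """Hypothetical sketch of the workarounds; returns the argv to run.

    - Any "$" means the command is assumed uncallable: report it and fail.
    - A ".jar" is run as `java -jar <jar_file>`, so both java and the file must exist.
    - Anything else must resolve to an executable on PATH.
    """
    if "$" in cmd:
        raise UncallableCommandError(f"Unresolved variable in command: {cmd}")
    if ".jar" in cmd:
        if shutil.which("java") is None:
            raise UncallableCommandError(f"'java' is required to run {cmd}")
        if not os.path.isfile(cmd):
            raise UncallableCommandError(f"Jar file not found: {cmd}")
        # Mirrors prefixing the command with `java -jar`
        return ["java", "-jar", cmd]
    if shutil.which(cmd) is None:
        raise UncallableCommandError(f"Command not callable: {cmd}")
    return [cmd]
```

Flagging every `$` is deliberately blunt. It also rejects variables that are set, in exchange for never reporting a false "callable".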
https://github.com/pepkit/looper/issues/195
The `is_command_callable` function in the latest ubiquerg should basically be able to solve this...
We might want to just re-expose it via pypiper to keep things simple for the pipeline author. Maybe wrap it to simplify things even further, so it can take a list of tools, handle jarfiles, or something similar.
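Here is a sketch of what such a wrapper might look like. It assumes `ubiquerg.is_command_callable` takes a command string and returns a bool; check the installed version before relying on that. If ubiquerg is missing, the sketch falls back to `shutil.which` so it still runs. `check_tools` is a hypothetical name, and the jar and `$` rules follow the workarounds described earlier in this thread.

```python
import os
import shutil

try:
    # Assumed signature: is_command_callable(cmd) -> bool
    from ubiquerg import is_command_callable
except ImportError:
    # Fallback so the sketch runs without ubiquerg installed
    def is_command_callable(cmd):
        return shutil.which(cmd) is not None


def check_tools(tools):
    """Hypothetical pypiper-level wrapper: map each tool to whether it looks callable."""
    results = {}
    for tool in tools:
        if "$" in tool:
            # An unset variable can make a shell-based check report success
            results[tool] = False
        elif tool.endswith(".jar"):
            # Jars are run as `java -jar <jar_file>`
            results[tool] = bool(is_command_callable("java") and os.path.isfile(tool))
        else:
            results[tool] = bool(is_command_callable(tool))
    return results
```

Returning a dict, rather than raising on the first failure, lets the pipeline report every missing tool at once and then decide whether to warn or fail.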