signac-flow
@with_job does not work with @cmd in environments that use jsrun
Description
When an operation decorated with @with_job and @cmd runs in an environment that uses jsrun to launch jobs on compute nodes (e.g., Summit), the job fails.
To reproduce
The following examples will fail:
import flow


class Project(flow.FlowProject):
    pass


@Project.operation
@flow.with_job
@flow.cmd
# ... pre- and post-conditions ...
def foo(job):
    return 'trap "some commands --args" EXIT'
@Project.operation
@flow.cmd
# ... pre- and post-conditions ...
def foo(job):
    return 'trap "cd {}; some commands --args" EXIT'.format(job.ws)
The following will run:
@Project.operation
@flow.cmd
# ... pre- and post-conditions ...
def gen_pqr(job):
    return "cd {}; some commands --args".format(job.ws)
Error output
bash-4.2$ jsrun -n1 python flowprojects/project.py run -o gen_pqr
[h50n13:02512] PMIX ERROR: INVALID-NAMESPACE in file dstore_base.c at line 1739
Error (No such file or directory) executing process: trap
Using environment configuration: SummitEnvironment
ERROR: Encountered error during program execution: 'Command 'jsrun -n 1 -a 1 -c 1 -g 0 -d packed -b rs trap "cd /path/to/job/ws/; some commands --args" EXIT' returned non-zero exit status 210.'
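The error above suggests that jsrun tries to start trap as a program. trap is a shell builtin, so a launcher that executes its first argument finds no such executable. The sketch below reproduces the distinction locally. It assumes the command string is run through a shell and uses env as a stand-in for jsrun (both simply execute the command they are given); the function names are hypothetical.

```python
import subprocess

# Hypothetical reproduction: `env` stands in for `jsrun`, since both start
# their first argument as a separate program instead of interpreting it.
CMD = 'trap "echo cleanup" EXIT'


def run_via_launcher(cmd):
    """Prefix a launcher, analogous to prepending the jsrun resource-set flags."""
    # The shell parses `env trap ...` and asks `env` to exec a program named
    # `trap`, which normally does not exist on PATH.
    return subprocess.run("env " + cmd, shell=True, capture_output=True, text=True)


def run_in_shell(cmd):
    """Let a shell interpret the `trap` builtin directly."""
    return subprocess.run(["bash", "-c", cmd], capture_output=True, text=True)


if __name__ == "__main__":
    failed = run_via_launcher(CMD)
    print(failed.returncode, failed.stderr.strip())
    ok = run_in_shell(CMD)
    print(ok.returncode, ok.stdout.strip())
```

The first call fails in the same way as the jsrun command in the log, while the second succeeds because bash itself handles trap and runs the EXIT handler.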
System configuration
- Operating system: Red Hat Enterprise Linux (RHEL) 7.6
- Python version: 3.7.0
- signac version: 1.6.0
- signac-flow version: 0.12.0
If I understand the issue correctly, addressing #73 should help with this bug.
We might want this:
https://github.com/glotzerlab/signac-flow/blob/8f4821bbea3decc62fd87d3152c11881c728d953/flow/operations.py#L99
to be kept separate from the command we submit. For example, rather than submitting jsrun -n 1 -a 1 -c 1 -g 0 -d packed -b rs trap "cd /path/to/job/ws/; some commands --args" EXIT, signac-flow would submit trap "cd {job.ws}; jsrun -n 1 -a 1 -c 1 -g 0 -d packed -b rs some commands --args" EXIT.
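The composition proposed in the last comment can be sketched as follows, assuming the launcher prefix is available as a separate string. compose_command is a hypothetical helper, not a signac-flow API. It uses shlex.quote for the trap body instead of literal double quotes so that commands containing quotes still parse, and the check uses env as a stand-in for jsrun.

```python
import os
import shlex
import subprocess
import tempfile


def compose_command(launcher_prefix, workspace, user_cmd):
    """Hypothetical helper (not part of signac-flow).

    Builds trap '<cd workspace>; <launcher> <command>' EXIT so that the shell
    interprets the `trap` and `cd` builtins and only the user's command is
    handed to the launcher.
    """
    body = "cd {}; {} {}".format(shlex.quote(workspace), launcher_prefix, user_cmd)
    # Quote the whole trap body once; the shell re-parses it when the trap fires.
    return "trap {} EXIT".format(shlex.quote(body))


if __name__ == "__main__":
    # `env` stands in for `jsrun`: it simply executes the command it is given.
    ws = tempfile.mkdtemp()
    cmd = compose_command("env", ws, "pwd")
    print(cmd)
    result = subprocess.run(["bash", "-c", cmd], capture_output=True, text=True)
    # The EXIT trap runs `cd <ws>; env pwd`, so stdout is the workspace path.
    print(os.path.realpath(result.stdout.strip()) == os.path.realpath(ws))
```

With the prefix from the error output, compose_command("jsrun -n 1 -a 1 -c 1 -g 0 -d packed -b rs", "/path/to/job/ws", "some commands --args") returns trap 'cd /path/to/job/ws; jsrun -n 1 -a 1 -c 1 -g 0 -d packed -b rs some commands --args' EXIT, so jsrun only ever receives the actual program to launch.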