@with_job does not work with @cmd in environments that use jsrun

Open klywang opened this issue 4 years ago • 1 comments

Description

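When an operation is decorated with both @with_job and @cmd in an environment that uses jsrun to run commands on the compute nodes (e.g., Summit), the operation fails. The error output suggests that jsrun tries to execute the shell builtin trap as a program.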

To reproduce

The following examples will fail:

# Example 1: @flow.with_job combined with @flow.cmd
@Project.operation
@flow.with_job
@flow.cmd
# ... pre and post conditions ...
def foo(job):
    return 'trap "some commands --args" EXIT'

# Example 2: @flow.cmd alone, changing into the workspace inside the trap
@Project.operation
@flow.cmd
# ... pre and post conditions ...
def foo(job):
    return 'trap "cd {}; some commands --args" EXIT'.format(job.ws)

The following will run:

@Project.operation
@flow.cmd
# ... pre and post conditions ...
def gen_pqr(job):
    return ("cd {}; some commands --args".format(job.ws))

Error output

bash-4.2$ jsrun -n1 python flowprojects/project.py run -o gen_pqr
[h50n13:02512] PMIX ERROR: INVALID-NAMESPACE in file dstore_base.c at line 1739
Error (No such file or directory) executing process: trap
Using environment configuration: SummitEnvironment
ERROR: Encountered error during program execution: 'Command 'jsrun -n 1 -a 1 -c 1 -g 0  -d packed -b rs  trap "cd /path/to/job/ws/; some commands --args" EXIT' returned non-zero exit status 210.'
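
The "executing process: trap" message can be reproduced without jsrun. The following is a minimal sketch, assuming that jsrun executes its first argument directly as a program rather than passing the line to a shell. It uses env as a stand-in launcher, which is an illustrative substitution and not part of the original report. The second call shows one possible workaround (wrapping the line in an explicit shell); this has not been verified on Summit.

```python
import subprocess

# `env` is a stand-in for jsrun here (an assumption for illustration): like the
# launcher, it executes its first argument as a program, not as shell code.
LAUNCHER = "env"


def run(command):
    """Run a command line through the shell and capture its result."""
    return subprocess.run(command, shell=True, capture_output=True, text=True)


# `trap` is a shell builtin, so the launcher finds no executable by that name.
direct = run(f'{LAUNCHER} trap "echo cleanup" EXIT')

# Handing the whole line to an explicit shell lets that shell interpret `trap`.
wrapped = run(f"{LAUNCHER} sh -c 'trap \"echo cleanup\" EXIT'")

print("direct: ", direct.returncode, direct.stderr.strip())
print("wrapped:", wrapped.returncode, wrapped.stdout.strip())
```

The direct call should exit with a non-zero status and a "No such file or directory" style message, while the wrapped call runs the trap body when the inner shell exits.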

System configuration

  • Operating System: Red Hat Enterprise Linux (RHEL) version 7.6
  • Version of Python: 3.7.0
  • Version of signac: 1.6.0
  • Version of signac-flow: 0.12.0

klywang avatar Apr 12 '21 17:04 klywang

If I understand the issue correctly, addressing #73 should help with this bug.

We might want this: https://github.com/glotzerlab/signac-flow/blob/8f4821bbea3decc62fd87d3152c11881c728d953/flow/operations.py#L99 to be separate from what we submit. For example, rather than submitting 'jsrun -n 1 -a 1 -c 1 -g 0 -d packed -b rs trap "cd /path/to/job/ws/; some commands --args" EXIT', signac-flow would submit 'trap "cd {job.ws}; jsrun -n 1 -a 1 -c 1 -g 0 -d packed -b rs some commands --args" EXIT'.
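
The two compositions can be compared side by side. This is only a sketch of the string handling: the helper names prefix_whole_command and prefix_inside_trap are hypothetical and are not signac-flow functions, and the prefix is copied from the error output above. shlex.quote produces single quotes where the issue text uses double quotes; the shell treats both the same for this command.

```python
import shlex

# Launcher prefix as it appears in the error output of this issue.
PREFIX = "jsrun -n 1 -a 1 -c 1 -g 0 -d packed -b rs"


def prefix_whole_command(launcher_prefix, command):
    """Composition seen in the error output: launcher in front of everything."""
    return f"{launcher_prefix} {command}"


def prefix_inside_trap(launcher_prefix, workspace, payload):
    """Proposed composition: launcher only in front of the payload command."""
    body = f"cd {shlex.quote(workspace)}; {launcher_prefix} {payload}"
    # Quote the whole trap body so the shell passes it to `trap` as one argument.
    return f"trap {shlex.quote(body)} EXIT"


current = prefix_whole_command(
    PREFIX, 'trap "cd /path/to/job/ws/; some commands --args" EXIT'
)
proposed = prefix_inside_trap(PREFIX, "/path/to/job/ws/", "some commands --args")

print(current)
print(proposed)
```

In the first form the launcher receives trap as the program to run; in the second, the shell handles trap and the launcher only ever sees the payload command.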

klywang avatar Apr 13 '21 18:04 klywang