cylc-flow
job.sh improvement for creating share and work directories?
Problem
In Cylc (both 7 and 8) job.sh scripts, the mkdir commands for the share and work directories (I'm not sure where share/cycle is created) assume the directory is not a symbolic link.
# Create share and work directories
mkdir -p "${CYLC_WORKFLOW_SHARE_DIR}" || true
mkdir -p "$(dirname "${CYLC_TASK_WORK_DIR}")" || true
mkdir -p "${CYLC_TASK_WORK_DIR}"
Not so hypothetically, assume there are two Lustre disks on the HPC, a production one and a failover one, like below:
/g/sc/production_disk -> /g/sc/disk1
/g/sc/failover_production_disk -> /g/sc/disk2
Let's say CYLC_TASK_WORK_DIR -> /g/sc/production_disk/cylc-run/my_workflow/work.
Then, let's say a disk failover happens, so now
/g/sc/production_disk -> /g/sc/disk2
/g/sc/failover_production_disk -> /g/sc/disk2 # disk1 is offline
The mkdir commands in job.sh will then fail to create these directories, because the symlinked path now points at a location that doesn't exist on disk2. The rest of the job assumes the directories exist, so it can fail when they don't.
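To make the failure concrete, here is a minimal bash sketch that rebuilds the failover layout in a scratch directory and runs the existing task work directory mkdir against it. The names disk1, disk2 and production_disk are local stand-ins for the Lustre mounts, and the cylc-run/my_workflow tree is an assumed layout, not what Cylc actually creates.

```bash
#!/usr/bin/env bash
# Hypothetical reproduction of the failover scenario in a scratch directory.
# disk1/disk2 stand in for the two Lustre disks; all paths are made up.
set -u
root="$(mktemp -d)"
trap 'rm -rf "$root"' EXIT
mkdir -p "$root/disk1/cylc-run/my_workflow/work" "$root/disk2"

# Normal operation: production_disk -> disk1
ln -s "$root/disk1" "$root/production_disk"

# The workflow's work dir is a symlink onto the production disk,
# as it would be with symlink dirs configured (assumed layout).
mkdir -p "$root/cylc-run/my_workflow"
ln -s "$root/production_disk/cylc-run/my_workflow/work" \
      "$root/cylc-run/my_workflow/work"

# Failover: production_disk now points at disk2, which has no cylc-run tree.
ln -sfn "$root/disk2" "$root/production_disk"

CYLC_TASK_WORK_DIR="$root/cylc-run/my_workflow/work/1/foo"

# Current job.sh logic: the work symlink now dangles, so mkdir -p fails.
if mkdir -p "${CYLC_TASK_WORK_DIR}" 2>/dev/null; then
    status=0
else
    status=1
fi
echo "mkdir exit status: $status"
```

With GNU or BSD mkdir the final call should exit non-zero: mkdir cannot create a directory where the dangling symlink already sits, and it cannot descend through it either.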
Proposed Solution
Be a bit more defensive and run mkdir on the path the symlink actually points at:
# Create share and work directories
mkdir -p "$(readlink -m "${CYLC_WORKFLOW_SHARE_DIR}")" || true
mkdir -p "$(readlink -m "$(dirname "${CYLC_TASK_WORK_DIR}")")" || true
mkdir -p "$(readlink -m "${CYLC_TASK_WORK_DIR}")"
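A minimal bash sketch of why the readlink -m version copes with the failover, again using hypothetical scratch paths rather than real Cylc directories. It assumes GNU coreutils readlink, since -m is a GNU option and may be missing from some BSD/macOS readlink builds.

```bash
#!/usr/bin/env bash
# Sketch of the proposed readlink -m workaround on a hypothetical
# post-failover layout (scratch paths only, not real Cylc dirs).
# Assumes GNU coreutils readlink (-m may not exist elsewhere).
set -u
root="$(mktemp -d)"
trap 'rm -rf "$root"' EXIT
mkdir -p "$root/disk2" "$root/cylc-run/my_workflow"

# After failover: production_disk -> disk2, with nothing under it yet.
ln -s "$root/disk2" "$root/production_disk"
ln -s "$root/production_disk/cylc-run/my_workflow/work" \
      "$root/cylc-run/my_workflow/work"

CYLC_TASK_WORK_DIR="$root/cylc-run/my_workflow/work/1/foo"

# readlink -m follows every symlink in the path, even when the final
# target (or any component of it) does not exist yet.
resolved="$(readlink -m "${CYLC_TASK_WORK_DIR}")"
echo "resolved: $resolved"

# Creating the resolved path builds the missing tree on disk2, after
# which the original symlinked path is usable again.
mkdir -p "$resolved"
[ -d "${CYLC_TASK_WORK_DIR}" ] && echo "work dir now reachable via symlink"
```

The cost is one extra readlink call per directory, which is the I/O concern raised later in the thread.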
The same goes for any other directory creation where the directory could be a symbolic link, as defined in the global.cylc [install][symlink dirs] section.
P.S. Whilst here, I don't quite get the need for two mkdir calls here:
mkdir -p "$(dirname "${CYLC_TASK_WORK_DIR}")" || true
mkdir -p "${CYLC_TASK_WORK_DIR}"
$ CYLC_TASK_WORK_DIR=/home/USER/cylc-run/foo/work/20110511T1800Z/t1 # (from docs)
$ dirname "$CYLC_TASK_WORK_DIR"
/home/USER/cylc-run/foo/work/20110511T1800Z
Based on the options, neither of those two paths should be a symbolic link, and mkdir -p already creates any missing parent directories, so you should be able to keep just the latter command and ditch the dirname "$CYLC_TASK_WORK_DIR" one, I think.
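A quick bash check of that point, using a scratch copy of the example path from the docs (the temporary root is an assumption for illustration): a single mkdir -p creates both the cycle-point directory and the task directory.

```bash
#!/usr/bin/env bash
# One mkdir -p creates every missing parent, so the separate
# mkdir of "$(dirname ...)" is redundant for plain directories.
# Scratch paths only.
set -u
root="$(mktemp -d)"
trap 'rm -rf "$root"' EXIT
CYLC_TASK_WORK_DIR="$root/cylc-run/foo/work/20110511T1800Z/t1"

# Single call: creates .../20110511T1800Z and .../20110511T1800Z/t1
mkdir -p "${CYLC_TASK_WORK_DIR}"

ls -d "$(dirname "${CYLC_TASK_WORK_DIR}")" "${CYLC_TASK_WORK_DIR}"
```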
Thinking about this further, I have another suggestion. The suggestions in my original post were based on knowledge of Cylc 7 only. A nicer way to do this in Cylc 8 might be the following (making some assumptions):
- Pass any resolved paths for a host into each task's job script, similar to a global-init-script in the global.cylc file, e.g. CYLC_WORKFLOW_SHARE_DIR_RESOLVED.
- Change the mkdir to something like
  mkdir -p "${CYLC_WORKFLOW_SHARE_DIR_RESOLVED:-"$CYLC_WORKFLOW_SHARE_DIR"}" || true
  and similarly for the work directory.
This way there is no extra I/O from readlink, it is still just one mkdir, and instead of going via symlinks it uses the actual path, which is known because it now lives in the global.cylc file rather than being managed via an external tool (Rose).
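A minimal bash sketch of the fallback expansion in that suggestion. CYLC_WORKFLOW_SHARE_DIR_RESOLVED is the hypothetical variable proposed above (Cylc does not set it as far as this thread shows), and the scratch paths are made up for illustration.

```bash
#!/usr/bin/env bash
# "${RESOLVED:-fallback}": use the resolved path if the job script was
# given one, otherwise the usual (possibly symlinked) path.
# CYLC_WORKFLOW_SHARE_DIR_RESOLVED is hypothetical; paths are scratch.
set -u
root="$(mktemp -d)"
trap 'rm -rf "$root"' EXIT

CYLC_WORKFLOW_SHARE_DIR="$root/cylc-run/my_workflow/share"

# Case 1: no resolved path exported, so the usual path is used.
unset CYLC_WORKFLOW_SHARE_DIR_RESOLVED
target1="${CYLC_WORKFLOW_SHARE_DIR_RESOLVED:-"$CYLC_WORKFLOW_SHARE_DIR"}"

# Case 2: a resolved path for this host was exported into the job
# script, so mkdir goes straight to the real location (no readlink).
CYLC_WORKFLOW_SHARE_DIR_RESOLVED="$root/disk2/cylc-run/my_workflow/share"
target2="${CYLC_WORKFLOW_SHARE_DIR_RESOLVED:-"$CYLC_WORKFLOW_SHARE_DIR"}"

mkdir -p "$target2" || true
echo "$target1"
echo "$target2"
```

Note that :- also falls back when the variable is set but empty, which seems the safer behaviour here; plain - would only fall back when it is unset.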
Does that make sense as an idea?
I think I missed this one when it was originally posted, sorry.
> P.S. Whilst here, I don't quite get the need for two mkdir calls here
Yeah, same.
> Does that make sense as an idea?
We'll need to think this through! Agreed it needs a solution, though; one for the next project meeting (mid-Feb), if not before.
Thanks. I don't really like the readlink approach, although I believe it would work. If we can somehow make use of the information in global.cylc instead, that would be the best option I can think of without dwelling on it too much (or knowing all the Cylc ins and outs).