job.sh improvement for creating share and work directories?

Open ColemanTom opened this issue 1 year ago • 3 comments

Problem

In Cylc (both 7 and 8) job.sh scripts, the mkdir commands for the share and work directories (I'm not sure where share/cycle is created) assume that the directory is not a symbolic link, or at least not one whose target is missing: `mkdir -p` fails on a dangling symlink.

    # Create share and work directories
    mkdir -p "${CYLC_WORKFLOW_SHARE_DIR}" || true
    mkdir -p "$(dirname "${CYLC_TASK_WORK_DIR}")" || true
    mkdir -p "${CYLC_TASK_WORK_DIR}"

Not so hypothetically, assume there are two Lustre disks on the HPC, a production one and a failover one, each reached through a symlink, like below:

    /g/sc/production_disk -> /g/sc/disk1
    /g/sc/failover_production_disk -> /g/sc/disk2

Let's say the workflow's work directory (which contains CYLC_TASK_WORK_DIR) is a symlink to /g/sc/production_disk/cylc-run/my_workflow/work.

Then, let's say a disk failover happens, so now:

    /g/sc/production_disk -> /g/sc/disk2
    /g/sc/failover_production_disk -> /g/sc/disk2  # disk1 is offline

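The symlinks now resolve to disk2, where the workflow's share and work directories don't exist, so Cylc job.sh will fail to create these directories (the first two mkdir failures are silenced by `|| true`), while later steps of the job assume they exist. This leads to failures further into the job.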
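The failure mode can be reproduced in a scratch directory. This is a minimal sketch, assuming bash and GNU coreutils; the `disk1`, `disk2` and `production_disk` paths under a `mktemp -d` directory are stand-ins for the real Lustre disks, and only the share directory is modelled.

```shell
#!/usr/bin/env bash
# Minimal reproduction of the failover scenario (stand-in paths, not real disks).
set -u

root="$(mktemp -d)"
mkdir -p "${root}/disk1" "${root}/disk2"

# The "production disk" symlink initially points at disk1.
ln -s "${root}/disk1" "${root}/production_disk"

# A run directory whose share dir is a symlink onto the production disk,
# similar to what [install][symlink dirs] sets up.
mkdir -p "${root}/disk1/cylc-run/my_workflow/share"
mkdir -p "${root}/home/cylc-run/my_workflow"
ln -s "${root}/production_disk/cylc-run/my_workflow/share" \
      "${root}/home/cylc-run/my_workflow/share"

CYLC_WORKFLOW_SHARE_DIR="${root}/home/cylc-run/my_workflow/share"

# Failover: repoint production_disk at disk2 (-n replaces the link itself).
ln -sfn "${root}/disk2" "${root}/production_disk"

# The existing job.sh line: the share symlink now dangles, so mkdir -p fails
# (in job.sh the `|| true` would hide this).
if mkdir -p "${CYLC_WORKFLOW_SHARE_DIR}" 2>/dev/null; then
    share_status="created"
else
    share_status="failed"
fi
echo "mkdir -p share: ${share_status}"
```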

Proposed Solution

Be a bit more defensive and run mkdir on the resolved path, i.e. wherever the symlinks currently point:

    # Create share and work directories
    mkdir -p "$(readlink -m "${CYLC_WORKFLOW_SHARE_DIR}")" || true
    mkdir -p "$(readlink -m "$(dirname "${CYLC_TASK_WORK_DIR}")")" || true
    mkdir -p "$(readlink -m "${CYLC_TASK_WORK_DIR}")"

The same goes for any other directory creation where the directory could be a symbolic link, as defined by the global.cylc[install][symlink dirs] section.
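A minimal sketch of the readlink-based fix against the same kind of scratch setup, assuming GNU `readlink`, whose `-m` option resolves every symlink without requiring the path to exist. `mkdir_resolved` is a hypothetical helper name used only for illustration.

```shell
#!/usr/bin/env bash
# Sketch: create directories via their resolved targets (assumes GNU readlink -m).
set -u

root="$(mktemp -d)"
mkdir -p "${root}/disk1" "${root}/disk2" "${root}/home/cylc-run/my_workflow"
ln -s "${root}/disk1" "${root}/production_disk"

# Symlinked share and work directories pointing at the production disk.
for d in share work; do
    mkdir -p "${root}/disk1/cylc-run/my_workflow/${d}"
    ln -s "${root}/production_disk/cylc-run/my_workflow/${d}" \
          "${root}/home/cylc-run/my_workflow/${d}"
done

# Failover to disk2: both symlinks now dangle.
ln -sfn "${root}/disk2" "${root}/production_disk"

CYLC_WORKFLOW_SHARE_DIR="${root}/home/cylc-run/my_workflow/share"
CYLC_TASK_WORK_DIR="${root}/home/cylc-run/my_workflow/work/20110511T1800Z/t1"

# Hypothetical helper: mkdir on the fully resolved path, so dangling
# symlinks along the way are followed to their (possibly missing) targets.
mkdir_resolved() {
    mkdir -p "$(readlink -m "$1")"
}

mkdir_resolved "${CYLC_WORKFLOW_SHARE_DIR}" || true
mkdir_resolved "${CYLC_TASK_WORK_DIR}"
```

Because `readlink -m` tolerates missing trailing components, the single call on CYLC_TASK_WORK_DIR also creates the cycle-point directory on disk2, after which the original symlinked paths resolve again.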

P.S. Whilst here, I don't quite see the need for two mkdir calls here:

    mkdir -p "$(dirname "${CYLC_TASK_WORK_DIR}")" || true
    mkdir -p "${CYLC_TASK_WORK_DIR}"

    $ CYLC_TASK_WORK_DIR=/home/USER/cylc-run/foo/work/20110511T1800Z/t1  # (from docs)
    $ dirname "$CYLC_TASK_WORK_DIR"
    /home/USER/cylc-run/foo/work/20110511T1800Z

Based on the [symlink dirs] options, neither of those two paths should be a symbolic link, and `mkdir -p` creates missing parents anyway, so I think you should be able to keep just the latter and ditch the `dirname "$CYLC_TASK_WORK_DIR"` one.
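The redundancy is easy to check: `mkdir -p` creates any missing parents, so one call on the task directory also creates the cycle-point directory. A minimal sketch, using a scratch directory in place of `/home/USER`:

```shell
#!/usr/bin/env bash
# One mkdir -p is enough when neither path is itself a symlink.
set -eu

base="$(mktemp -d)"
CYLC_TASK_WORK_DIR="${base}/cylc-run/foo/work/20110511T1800Z/t1"

# Single call: creates work/, the cycle-point dir and the task dir.
mkdir -p "${CYLC_TASK_WORK_DIR}"

test -d "$(dirname "${CYLC_TASK_WORK_DIR}")" && echo "cycle dir exists"
test -d "${CYLC_TASK_WORK_DIR}" && echo "task dir exists"
```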

ColemanTom, Jun 02 '23

Thinking about this more, I have another suggestion. The suggestions in the original post were based on knowledge of Cylc 7 only. A nicer way to do this in Cylc 8 might be (making some assumptions):

  1. Pass any resolved paths for a host into the job script for each task, similar to a global-init-script in the global.cylc file, e.g. CYLC_WORKFLOW_SHARE_DIR_RESOLVED.
  2. Change the mkdir to something like `mkdir -p "${CYLC_WORKFLOW_SHARE_DIR_RESOLVED:-"$CYLC_WORKFLOW_SHARE_DIR"}" || true`, and similarly for the work directory one.
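A sketch of the proposed fallback, assuming the hypothetical CYLC_WORKFLOW_SHARE_DIR_RESOLVED variable from point 1 (a proposal, not an existing Cylc variable). If it is unset or empty, the `${VAR:-default}` expansion falls back to CYLC_WORKFLOW_SHARE_DIR, so existing behaviour is unchanged.

```shell
#!/usr/bin/env bash
# Sketch of the proposed Cylc 8 approach; CYLC_WORKFLOW_SHARE_DIR_RESOLVED
# is hypothetical and only illustrates the idea.
set -u

base="$(mktemp -d)"
CYLC_WORKFLOW_SHARE_DIR="${base}/home/cylc-run/my_workflow/share"

# Case 1: no resolved path provided, so the usual path is used.
unset CYLC_WORKFLOW_SHARE_DIR_RESOLVED
fallback_target="${CYLC_WORKFLOW_SHARE_DIR_RESOLVED:-"$CYLC_WORKFLOW_SHARE_DIR"}"
echo "fallback: ${fallback_target}"

# Case 2: a resolved path for this host is provided (e.g. written into the
# job script from global.cylc), so mkdir targets the real disk directly.
CYLC_WORKFLOW_SHARE_DIR_RESOLVED="${base}/disk2/cylc-run/my_workflow/share"
target="${CYLC_WORKFLOW_SHARE_DIR_RESOLVED:-"$CYLC_WORKFLOW_SHARE_DIR"}"
mkdir -p "${target}" || true
```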

This way there is no extra I/O from readlink, and it is still just one mkdir, but instead of going via symlinks it uses the actual path, which is known because it is now in the global.cylc file instead of being managed via an external tool (Rose).

Does that make sense as an idea?

ColemanTom, Jan 30 '24

I think I missed this one when it was originally posted, sorry.

> ps. whilst here, I don't quite get the need for two mkdir here

Yeah, same.

> Does that make sense as an idea?

We'll need to think this through! Agreed that it needs a solution though... one for the next project meeting (mid-Feb), if not before.

hjoliver, Jan 30 '24

Thanks. I don't really like the readlink approach, although I believe it would work. If we can somehow make use of the information in global.cylc instead, that would be the best option I can think of without dwelling on it too much (or knowing all the ins and outs of Cylc).

ColemanTom, Jan 30 '24