Code to repro broken reference disks in GCP Batch [WX-1819]
Description
Reference disks currently appear to be broken in the GCP Batch backend. This PR adds a little bit of Centaur infrastructure and a copy/paste/modify of a basic Papi v2 reference disk test to demonstrate the issues.
Currently when I submit the Centaur test added in this PR, the job fails before invoking the user command with an exit code of 125, which appears to be a Docker error. Indeed in Logs Explorer I see this:
docker: Error response from daemon: invalid mode: async, rw.
which apparently comes from this code, which explicitly specifies read-write and `async` for the reference volumes to be mounted. From the docs, `async` does not appear to be an option for non-NFS Docker volumes.
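To make the failure mode concrete, here is a minimal Python sketch of how a `host:container:mode` volume spec breaks down into individual mode options. The function name `invalid_mode_options` and the `KNOWN_BIND_OPTIONS` set are hypothetical, and the set is only an illustrative subset of what Docker accepts for bind mounts, not its actual validation logic.

```python
from typing import List

# Assumption: an illustrative subset of bind-mount mode options, not Docker's full list.
KNOWN_BIND_OPTIONS = {
    "ro", "rw", "z", "Z",
    "shared", "slave", "private",
    "rshared", "rslave", "rprivate",
}


def invalid_mode_options(volume_spec: str) -> List[str]:
    """Return the mode options in a `host:container[:mode]` spec that are not recognized."""
    parts = volume_spec.split(":")
    if len(parts) < 3:
        # No mode segment at all, so there is nothing to reject.
        return []
    # Options are comma-separated; whitespace is deliberately NOT stripped here.
    return [opt for opt in parts[2].split(",") if opt not in KNOWN_BIND_OPTIONS]


failing = "/mnt/11a4324d4472f639f3fc558b00afeacd:/mnt/11a4324d4472f639f3fc558b00afeacd:async, rw"
working = "/mnt/11a4324d4472f639f3fc558b00afeacd:/mnt/11a4324d4472f639f3fc558b00afeacd:ro"

print(invalid_mode_options(failing))  # ['async', ' rw']
print(invalid_mode_options(working))  # []
```

In this sketch both `async` and ` rw` (with its leading space) are flagged for the failing spec. Whether Docker itself would tolerate the space is not something the sketch asserts; the observed error message only shows that the mode as a whole was rejected.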
Inspecting the batch job description, I see a command like this:
"printf '%s %s\\n' \"$(date -u '+%Y/%m/%d %H:%M:%S')\" Running\\ user\\ runnable:\\ docker\\ run\\ -v\\ /mnt/disks/cromwell_root:/mnt/disks/cromwell_root\\ -v\\ /mnt/11a4324d4472f639f3fc558b00afeacd:/mnt/11a4324d4472f639f3fc558b00afeacd:async,\\\\\\ rw\\ -v\\ /mnt/d9e025138b28caa42dd4006fc3636661:/mnt/d9e025138b28caa42dd4006fc3636661:async,\\\\\\ rw\\ --entrypoint\\=/bin/bash\\ ubuntu@sha256:8a37d68f4f73ebf3d4efafbcf66379bf3728902a8038616808f04e34a9ab63ee\\ /mnt/disks/cromwell_root/script"
i.e., explicitly specifying `async, rw`. By comparison, the working Papi v2 reference disk system explicitly specifies `ro`:
"printf '%s %s\\n' \"$(date -u '+%Y/%m/%d %H:%M:%S')\" Running\\ user\\ action:\\ docker\\ run\\ -v\\ /mnt/local-disk:/cromwell_root\\ -v\\ /mnt/d-312601206d5deb55b631d02269f3b3a5:/mnt/11a4324d4472f639f3fc558b00afeacd:ro\\ -v\\ /mnt/d-c74a541aa27f13cfe59c2f998a664729:/mnt/d9e025138b28caa42dd4006fc3636661:ro\\ --entrypoint\\=/bin/bash\\ ubuntu@sha256:8a37d68f4f73ebf3d4efafbcf66379bf3728902a8038616808f04e34a9ab63ee\\ /cromwell_root/script"
I attempted to modify the GCP Batch backend to pass `ro`, but for some reason that `ro` does not seem to make it to the Docker command line:
"printf '%s %s\\n' \"$(date -u '+%Y/%m/%d %H:%M:%S')\" Running\\ user\\ runnable:\\ docker\\ run\\ -v\\ /mnt/disks/cromwell_root:/mnt/disks/cromwell_root\\ -v\\ /mnt/11a4324d4472f639f3fc558b00afeacd:/mnt/11a4324d4472f639f3fc558b00afeacd\\ -v\\ /mnt/d9e025138b28caa42dd4006fc3636661:/mnt/d9e025138b28caa42dd4006fc3636661\\ --entrypoint\\=/bin/bash\\ ubuntu@sha256:8a37d68f4f73ebf3d4efafbcf66379bf3728902a8038616808f04e34a9ab63ee\\ /mnt/disks/cromwell_root/script"
However, `ro` does seem to be applied to the volume specifications:
"volumes": [
"/mnt/disks/cromwell_root:/mnt/disks/cromwell_root:rw",
"/mnt/11a4324d4472f639f3fc558b00afeacd:/mnt/11a4324d4472f639f3fc558b00afeacd:ro",
"/mnt/d9e025138b28caa42dd4006fc3636661:/mnt/d9e025138b28caa42dd4006fc3636661:ro"
]
The main volume is read-write as expected, and the two volumes corresponding to the reference disks are read-only. However, the reference volumes being read-only seems to be an issue for Docker:
docker: Error response from daemon: error while creating mount source path '/mnt/11a4324d4472f639f3fc558b00afeacd': mkdir /mnt/11a4324d4472f639f3fc558b00afeacd: read-only file system."
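The mismatch described above, where modes appear in the job's `volumes` array but not in the logged `docker run` command, can be checked mechanically. The following is a minimal Python sketch, assuming the command line has already been unescaped from the job description; the helper names (`mode_by_target`, `volume_args`, `dropped_modes`) are hypothetical and not part of Cromwell or GCP Batch.

```python
import shlex
from typing import Dict, List, Optional

IMAGE = "ubuntu@sha256:8a37d68f4f73ebf3d4efafbcf66379bf3728902a8038616808f04e34a9ab63ee"
REF_A = "/mnt/11a4324d4472f639f3fc558b00afeacd"
REF_B = "/mnt/d9e025138b28caa42dd4006fc3636661"
ROOT = "/mnt/disks/cromwell_root"

# Volume specifications as reported in the batch job description.
VOLUMES = [f"{ROOT}:{ROOT}:rw", f"{REF_A}:{REF_A}:ro", f"{REF_B}:{REF_B}:ro"]

# The logged command, unescaped by hand for this sketch.
COMMAND = (
    f"docker run -v {ROOT}:{ROOT} -v {REF_A}:{REF_A} -v {REF_B}:{REF_B} "
    f"--entrypoint=/bin/bash {IMAGE} {ROOT}/script"
)


def mode_by_target(volume_specs: List[str]) -> Dict[str, Optional[str]]:
    """Map each container path to its mode, or None when the spec has no mode."""
    modes: Dict[str, Optional[str]] = {}
    for spec in volume_specs:
        parts = spec.split(":")
        modes[parts[1]] = parts[2] if len(parts) > 2 else None
    return modes


def volume_args(docker_command: str) -> List[str]:
    """Collect the values that follow each -v flag in a docker run command line."""
    tokens = shlex.split(docker_command)
    return [tokens[i + 1] for i, tok in enumerate(tokens[:-1]) if tok == "-v"]


def dropped_modes(volumes: List[str], docker_command: str) -> Dict[str, str]:
    """Return modes declared in the volumes array but absent from the command line."""
    declared = mode_by_target(volumes)
    on_cli = mode_by_target(volume_args(docker_command))
    return {
        target: mode
        for target, mode in declared.items()
        if mode is not None and on_cli.get(target) is None
    }


for target, mode in dropped_modes(VOLUMES, COMMAND).items():
    print(f"{target}: declared {mode}, missing from docker command line")
```

Run against the values quoted in this description, the sketch reports all three mounts, including the `rw` on the main volume, since none of the `-v` arguments in the logged command carries a mode.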
Release Notes Confirmation
CHANGELOG.md
- [ ] I updated `CHANGELOG.md` in this PR
- [x] I assert that this change shouldn't be included in `CHANGELOG.md` because it doesn't impact community users
Terra Release Notes
- [ ] I added a suggested release notes entry in this Jira ticket
- [x] I assert that this change doesn't need Jira release notes because it doesn't impact Terra users