dsub
dsub copied to clipboard
No stderr logging when docker image not found
If I run:
dsub \
--provider local \
--project ${PROJECT_ID} \
--regions ${REGION} \
--logging dsub-debug.log \
--image ubuntu:20.04x \
--output OUT=gs://${BUCKET_NAME}/output/out.txt \
--command 'echo "Hello World" > "${OUT}"' \
--wait
I get the following written to dsub-debug.log
:
Unable to find image 'ubuntu:20.04x' locally
docker: Error response from daemon: manifest for ubuntu:20.04x not found: manifest unknown: manifest unknown.
See 'docker run --help'.
However, if I run the job on google-batch
instead of local
, the ${jobid}.stderr
file set via the --logging
parameter is empty.
The lack of an error messages in the *.stderr
logging files created by google-batch
makes it very hard to troubleshoot failed jobs.
I'm using dsub 0.4.9
Hi @nick-youngblut
~~Thanks for reporting. There was an update in the Batch API that changed the way log files were written to disk. We're in the process of making the dsub changes and pushing a new release. In the meantime, the same command should work with the google-cls-v2
provider.~~
[EDIT]: I misunderstood the issue. See https://github.com/DataBiosphere/dsub/issues/271#issuecomment-1726425079
Thanks @wnojopra for the quick feedback! If I use google-cls-v2
, then no logging files are generated at gs://${BUCKET_NAME}/logging/
:
dsub \
--provider google-cls-v2 \
--project ${PROJECT_ID} \
--regions ${REGION} \
--logging gs://${BUCKET_NAME}/logging/ \
--image ubuntu:20.04x \
--output OUT=gs://${BUCKET_NAME}/output/out.txt \
--command 'echo "Hello World" > "${OUT}"' \
--wait
It appears that if the --provider google-cls-v2
job fails (at least for a non-existent docker image), no --logging
files are created. However, if the --provider google-cls-v2
job succeeds, the --logging
files are created.
Thanks @nick-youngblut . I misunderstood the issue. The confusion comes from how the API pulls the docker image. From https://cloud.google.com/life-sciences/docs/reference/rest/v2beta/projects.locations.pipelines/run#Action in the imageUri
section it mentions "Note that all images referenced by actions in the pipeline are pulled before the first action runs."
Logging is done in one of the actions of a dsub job. It starts relatively early, but as noted the API will attempt to pull the image before the logging action starts. This is why you have no --logging files.
I do notice that dsub itself outputs a helpful error message in this case:
['Error in echo--willyn--230919-201554-73 - code 9: Execution failed: generic::failed_precondition: pulling image: docker pull: running ["docker" "pull" "ubuntu:20.04x"]: exit status 1 (standard error: "Error response from daemon: manifest for ubuntu:20.04x not found: manifest unknown: manifest unknown\\n")']
JobExecutionError: One or more jobs finished with status FAILURE or CANCELED during wait.
So in this case, the error couldn't make it to the logs, but it is output by dsub itself.
Thanks @wnojopra for all of the information!
as noted the API will attempt to pull the image before the logging action starts
Would it be possible to continue the dsub job (try-except
) to include the logging, regardless of whether the image pull was successful? Inconsistent logging behavior can be harder to troubleshoot (e.g., "in this case, check the logs... but in this other case there are no logs, so you have to check...").
I do notice that dsub itself outputs a helpful error message in this case:
Thanks for pointing that out! Is there a way to get a "pretty" print of the logs directly in dsub
? Each line of the logs is so long that reading them (mentally parsing the logs) can be a challenge.