
GCP: Ignore error if ssh-server fails

Open ovesh opened this issue 2 years ago • 2 comments

We are running Cromwell with the Google Genomics (aka Google Pipelines API, aka Google Life Sciences API) backend plugin.

There's a known issue on Google's side (unfortunately no public link) that causes ssh-server to fail on startup with the error "tcp4 0.0.0.0:22: bind: address already in use". When this happens, the entire workflow fails.

A small change in SSHAccessAction would allow the worker to ignore the error. The ssh-server action would be built with:

setIgnoreExitStatus(true)
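
To illustrate the intended effect, here is a minimal, self-contained sketch of how an ignore-exit-status flag changes whether a failing action aborts a run. The `Action` class and `firstFatalFailure` helper below are hypothetical stand-ins written for this example. They are not the real Life Sciences client model or Cromwell's SSHAccessAction code. The sketch assumes a non-zero exit code is fatal unless the flag is set.

```java
import java.util.List;

public class IgnoreExitStatusSketch {

    // Hypothetical stand-in for a pipeline action (not the real API model class).
    static final class Action {
        final String name;
        final int exitCode;        // exit code the action produces when it runs
        boolean ignoreExitStatus;  // mirrors the flag proposed in this issue

        Action(String name, int exitCode) {
            this.name = name;
            this.exitCode = exitCode;
        }

        Action setIgnoreExitStatus(boolean value) {
            this.ignoreExitStatus = value;
            return this;
        }
    }

    // Returns the name of the first action whose failure aborts the run,
    // or null if every failure is ignored and the run can proceed.
    static String firstFatalFailure(List<Action> actions) {
        for (Action a : actions) {
            if (a.exitCode != 0 && !a.ignoreExitStatus) {
                return a.name;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Exit code 1 simulates the "address already in use" bind failure.
        Action ssh = new Action("ssh-server", 1);
        Action task = new Action("user-task", 0);

        // Without the flag, the ssh-server failure is fatal to the run.
        System.out.println(firstFatalFailure(List.of(ssh, task)));

        // With the flag set, the failure is ignored and nothing is fatal.
        ssh.setIgnoreExitStatus(true);
        System.out.println(firstFatalFailure(List.of(ssh, task)));
    }
}
```

In this model the flag only affects the ssh-server action, so a failure in the user's own task would still abort the run.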

ovesh, May 26 '22 15:05

We have also seen the "address already in use" error. Are you saying that the error is spurious and we should ignore it?

If it is a real error, then it seems like we would want to continue seeing it, with the workaround being to turn off the enable_ssh_access workflow option [0].

[0] https://cromwell.readthedocs.io/en/stable/wf_options/Google/

aednichols, May 26 '22 16:05

"it seems like we would want to continue seeing it"

@aednichols The bug in the Life Sciences API is that the ssh server is supposed to be disabled on the VM, but in some cases it is not, which causes the "address already in use" problem. Since the VM's ssh server is not disabled, ssh access to the VM is in fact still possible. The error then becomes meaningless: the dockerized ssh server is unrelated to the WDL workflow, and users can still ssh to the VM.

ovesh, May 27 '22 19:05