Cromwell GCP error - The referenced network resource cannot be found

Open hsimran13 opened this issue 3 years ago • 9 comments

Hello,

I am new to Cromwell and am trying to run a test workflow on GCP. I am using the PAPIv2 backend, and here is my config:

$ cat genomics.conf | grep -v '#' | sed '/^$/d'
include required(classpath("application"))
google {
    application-name = "cromwell"
    auths = [
        {
            name = "application-default"
            scheme = "application_default"
        }
    ]
}
engine {
    filesystems {
        gcs {
            auth = "application-default"
            project = "xxxxx"
        }
    }
}
backend {
    default = PAPIv2
    providers {
        PAPIv2 {
            actor-factory = "cromwell.backend.google.pipelines.v2alpha1.PipelinesApiLifecycleActorFactory"
            config {
                project = "xxxxx"
                root = "gs://xxxx/cromwell_execution"
                virtual-private-cloud {
                    network-label-key = "xxx"
                    subnetwork-label-key = "xxx"
                    auth = "application-default"
                }
                name-for-call-caching-purposes: PAPI
                slow-job-warning-time: "24 hours"
                genomics-api-queries-per-100-seconds = 1000
                maximum-polling-interval = 600
                request-workers = 3
                genomics {
                    auth = "application-default"
                    endpoint-url = "https://genomics.googleapis.com/"
                    location = "us-west1"
                    restrict-metadata-access = false
                    localization-attempts = 3
                    parallel-composite-upload-threshold="150M"
                }
                filesystems {
                    gcs {
                        auth = "application-default"
                        project = "xxxx"
                        caching {
                            duplication-strategy = "copy"
                        }
                    }
                    http { }
                }
                default-runtime-attributes {
                    cpu: 1
                    failOnStderr: false
                    continueOnReturnCode: 0
                    memory: "2048 MB"
                    bootDiskSizeGb: 10
                    disks: "local-disk 10 SSD"
                    noAddress: false
                    preemptible: 0
                    zones: ["us-west1-a", "us-west1-b"]
                }
                include "papi_v2_reference_image_manifest.conf"
            }
        }
    }
}

When I run with the above config using:

java -Dconfig.file=genomics.conf -jar cromwell-66.jar run cumulus.wdl -i cumulus_inputs.json

I am getting the following error message:

[2021-08-24 22:05:33,60] [info] WorkflowManagerActor: Workflow 6cc303b4-295d-49fa-a996-b5cf7ec9beea failed (during ExecutingWorkflowState): java.lang.Exception: Task cumulus.cluster:NA:1 failed. The job was stopped before the command finished. PAPI error code 3. Execution failed: allocating: creating instance: inserting instance: Invalid value for field 'resource.networkInterfaces[0].network': ''. The referenced network resource cannot be found.

I have tried passing the vpc and subnet id using the following config:

                virtual-private-cloud {
                    network-label-key = "xxx"
                    subnetwork-label-key = "xxx"
                    auth = "application-default"
                }

The above values are my actual VPC and subnet IDs/names. However, it still gives me that error message. Is there something I am missing from a configuration perspective? Any help would be greatly appreciated. Our VPC networks are not created in auto mode, and unfortunately that is not something we have control over.

Thanks, -Simran

hsimran13 avatar Aug 24 '21 22:08 hsimran13

I have created a new labels file and am using it to pass the VPC/subnet info, but I still get the same error:

$ grep -i label genomics.conf
                    network-label-key = "my-private-network"
                    subnetwork-label-key = "my-private-subnetwork"
$ cat labels.json
{
  "my-private-network":  "xxxx",
  "my-private-subnetwork": "xxxx"
}

and updated my cromwell command to the following:

java -Dconfig.file=genomics.conf -jar cromwell-66.jar run cumulus.wdl -i cumulus_inputs.json -l labels.json

I still get the same error though. Is this even possible or am I missing something?

Thanks.

hsimran13 avatar Aug 24 '21 23:08 hsimran13

In Cromwell versions 67 and earlier, the virtual-private-cloud configuration exclusively specifies Google project label keys, not literal values. The actual values are specified in labels on the Google project. For example, with a VPC config like:

                virtual-private-cloud {
                    network-label-key = "my-network-label-key"
                    subnetwork-label-key = "my-subnetwork-label-key"
                    auth = "application-default"
                }

As seen on the labels page in the GCP console, there should be project labels with the key/value pairs my-network-label-key/my-private-network and my-subnetwork-label-key/my-private-subnetwork.
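
To make the indirection concrete, here is a minimal Python sketch of the lookup described above. It is an illustration only, not Cromwell's actual code: the function `resolve_vpc`, the `VpcSettings` type, and the label values are hypothetical names chosen for this example. The config supplies label keys, and the network and subnetwork names come from the labels on the project.

```python
from typing import Mapping, NamedTuple, Optional


class VpcSettings(NamedTuple):
    """Network and subnetwork names resolved from project labels."""
    network: str
    subnetwork: Optional[str]


def resolve_vpc(project_labels: Mapping[str, str],
                network_label_key: str,
                subnetwork_label_key: Optional[str] = None) -> Optional[VpcSettings]:
    """Resolve VPC names through label keys (illustrative, not Cromwell's code).

    Returns None when the project has no label with the network key.
    """
    network = project_labels.get(network_label_key)
    if network is None:
        return None
    subnetwork = (project_labels.get(subnetwork_label_key)
                  if subnetwork_label_key else None)
    return VpcSettings(network, subnetwork)


# Hypothetical labels as they would be set on the GCP project itself.
project_labels = {
    "my-network-label-key": "my-private-network",
    "my-subnetwork-label-key": "my-private-subnetwork",
}

# Config holds the label keys: the lookup finds the real names.
print(resolve_vpc(project_labels,
                  "my-network-label-key",
                  "my-subnetwork-label-key"))

# Config holds the literal VPC names: no project label has those keys,
# so nothing is resolved.
print(resolve_vpc(project_labels,
                  "my-private-network",
                  "my-private-subnetwork"))
```

The second call mirrors the original config in this thread, where the literal VPC and subnet names were placed in network-label-key and subnetwork-label-key.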

mcovarr avatar Aug 25 '21 00:08 mcovarr

Thanks @mcovarr for your response. I realized that after my initial post and created a labels.json with the following contents:

{
  "google_labels": {
    "my-private-network":  "xxx",
    "my-private-subnetwork": "yyy"
  }
}

where xxx and yyy are my actual VPC network and subnet names in GCP. Then I added the "-l labels.json" option to the cromwell run command, but that still gives me the same error. Am I missing something here? Apologies, but this is what I understand from the posts/docs needs to happen, and it doesn't work when I try it. Am I supposed to create some label in the actual GCP account as well?

Thanks.

hsimran13 avatar Aug 25 '21 00:08 hsimran13

Ahh, I think I see what you mean. I don't need the "-l labels.json" option but need to create actual labels in the GCP account with the following key/value pairs:

my-private-network: xxx
my-private-subnetwork: yyy

I don't have access to create the labels but will have someone do this and try again. Let me know if I am still missing something.

Thanks.

hsimran13 avatar Aug 25 '21 00:08 hsimran13

@mcovarr which resource does the label need to be created on? Your link took me to the IAM & Admin Labels section for GCP. Is that where I should create this label, or on the GCP instance resource that I am running this command from?

Thanks.

hsimran13 avatar Aug 25 '21 00:08 hsimran13

The labels should be created on the GCP project (not on the GCE instance), so the link should be going to the correct location.

mcovarr avatar Aug 25 '21 00:08 mcovarr

Thanks @mcovarr, that seems to have worked. The job has gone into a Running state. Really appreciate your quick response and assistance.

-Simran

hsimran13 avatar Aug 25 '21 01:08 hsimran13

@mcovarr, looks like I got past the initial issue but am now getting the following error:

[2021-08-25 01:11:31,83] [info] WorkflowManagerActor: Workflow 2a7b8039-a555-4f58-86b0-dc4a6fa21dff failed (during ExecutingWorkflowState): java.lang.Exception: Task cumulus.cluster:NA:1 failed. The job was stopped before the command finished. PAPI error code 9. generic::failed_precondition: Constraint constraints/compute.trustedImageProjects violated for project gred-cumulus-sb-01-991a49c4. Use of images from project cloud-lifesciences is prohibited.

Looks like our GCP accounts don't allow non-standard images. Which image is this workflow trying to use? Is there a way to provide our own image to this pipeline instead?

Thanks

hsimran13 avatar Aug 25 '21 01:08 hsimran13

I am not familiar with that error message. From a bit of Googling it looks like this may be relevant. Assuming cloud-lifesciences is Google's project hosting the image that Cloud Life Sciences is trying to use to spin up the worker VM, you may need to add projects/cloud-lifesciences to your organization's trusted image projects.

mcovarr avatar Aug 25 '21 12:08 mcovarr