
feature: add support for label/tags

Open jorgeaguileraseqera opened this issue 2 years ago • 6 comments

Allow a new syntax in the label directive in key-value format

  label k: 'value'
  label b: "${params.user}"

When this syntax is used, the key-value pairs are stored in a new Map variable so that executors can access them and tag the process if they are present.

closes #2845

Signed-off-by: Jorge Aguilera [email protected]

jorgeaguileraseqera avatar May 06 '22 13:05 jorgeaguileraseqera

Didn't we agree to go with label 'name=value' as a first iteration?

pditommaso avatar May 06 '22 13:05 pditommaso

Yes, but once I started on the implementation I found this syntax more powerful and wanted to propose it.

(I'm still working on the issue, so I can also include our first idea about name=value.)

jorgeaguileraseqera avatar May 06 '22 13:05 jorgeaguileraseqera

This PR allows the following syntax:

process foo {
      label 'bravo'   //<--- a label as previously
      label 'department=floor 3' 
      label region:'eu-west-1' 
      label region:'eu-west-1' , department:'floor 3'
}

jorgeaguileraseqera avatar May 06 '22 14:05 jorgeaguileraseqera

Ok, I'll check soon 👍

pditommaso avatar May 06 '22 16:05 pditommaso

This might be a different feature, but there could be value in allowing users to specify labels at the top-level like on the executor.

Then have those labels propagate down to the process to be set. Otherwise we'd have to set those labels/params on every single process.

https://github.com/nextflow-io/nextflow/issues/2845

google {
    labels = [ 'user-id': params.user_id, 'project-id': params.project_id ]
    project = 'theprojectid'
    zone = 'europe-west1-b'
}

dougnukem avatar May 20 '22 23:05 dougnukem

@dougnukem thanks, that's a good point. However, I think the current proposal still works, because setting something like the following in the config would apply it to all processes:

process.labels = [ 'user-id': params.user_id, 'project-id': params.project_id ]

pditommaso avatar May 26 '22 16:05 pditommaso

I was looking at how to set labels with Google Batch but could not find it, although I'm quite sure it's possible. @bentsherman any clue about that?

pditommaso avatar Aug 31 '22 08:08 pditommaso

@jorgeaguileraseqera the google batch labels are set through the AllocationPolicy: https://cloud.google.com/java/docs/reference/google-cloud-batch/latest/com.google.cloud.batch.v1.AllocationPolicy.Builder#com_google_cloud_batch_v1_AllocationPolicy_Builder_putAllLabels_java_util_Map_java_lang_String_java_lang_String__

bentsherman avatar Aug 31 '22 14:08 bentsherman

I still could not figure out how to use the Google Batch API to set up labels. I've opened a separate issue for that.

pditommaso avatar Sep 01 '22 09:09 pditommaso

Do you have an issue for this?

beichen1024 avatar Sep 21 '22 18:09 beichen1024

Fixed in #3170

bentsherman avatar Sep 21 '22 18:09 bentsherman

Actually, yes.

NF version: 22.09.4-edge
executor: google-lifesciences

I added

process.labels = [ 'user-id': params.user_id, 'project-id': params.project_id ]

as mentioned in the discussion. I then used gcloud beta lifesciences operations describe to check the worker process operation, but in the resources section I am not seeing the labels I added. Nothing showed up in .nextflow.log either.

resources:
      regions:
      - us-central1
      virtualMachine:
        bootDiskSizeGb: 500
        bootImage: projects/cos-cloud/global/images/family/cos-stable
        disks:
        - name: nf-pipeline-work
        labels:
          goog-pipelines-worker: 'true'  # only the google default labels
        machineType: custom-1-1024
        nvidiaDriverVersion: 450.51.06
        serviceAccount:
          email: default
          scopes:
          - https://www.googleapis.com/auth/cloud-platform
        volumes:
        - persistentDisk:
            sizeGb: 500
          volume: nf-pipeline-work
    timeout: 604800s

I think I have a couple of questions:

  1. If the labels are successfully passed to all processes, how do I know? Where should I check (for example, which log files)?
  2. Are those labels supposed to show up in the resource.labels section?

Additional information: If I include a script block in the process itself and use $task.labels I can print those values to an output file.

beichen1024 avatar Sep 22 '22 18:09 beichen1024

@beichen1024 it should be process.resourceLabels
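
For reference, a minimal sketch of the config setting this points to, reusing the keys from the earlier snippet. The params names (params.user_id, params.project_id) are taken from the discussion above and are assumed to be defined elsewhere by the user. Where the labels end up (operation vs. VM) depends on the executor.

```groovy
// nextflow.config (sketch, not a verified setup)
// Applies the same resource labels to every process.
// params.user_id and params.project_id are assumed to be defined by the user.
process.resourceLabels = [ 'user-id': params.user_id, 'project-id': params.project_id ]
```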

bentsherman avatar Sep 22 '22 19:09 bentsherman

@bentsherman The GCP labeling being added appears to be done at the Google Life Sciences pipeline level. For the labels to apply to the GCP VMs for cost monitoring, they need to be set at the VirtualMachine.labels level.

https://github.com/nextflow-io/nextflow/pull/2853/files#diff-bd0a55afab4a626d773cac6787aef85e2d2756a3a25d438df70bb98ae0127f1fR479

I believe this should instead, or in addition, be applied at the Resource/VirtualMachine level.

https://cloud.google.com/life-sciences/docs/reference/rpc/google.cloud.lifesciences.v2beta#runpipelinerequest

RunPipelineRequest
labels (map<string, string>): User-defined labels to associate with the returned operation. These labels are not propagated to any Google Cloud Platform resources used by the operation, and can be modified at any time. To associate labels with resources created while executing the operation, see the appropriate resource message (for example, VirtualMachine).

https://cloud.google.com/life-sciences/docs/reference/rpc/google.cloud.lifesciences.v2beta#virtualmachine

VirtualMachine
labels (map<string, string>): Optional set of labels to apply to the VM and any attached disk resources. These labels must adhere to the name and value restrictions on VM labels imposed by Compute Engine.

Labels keys with the prefix 'google-' are reserved for use by Google.

Labels applied at creation time to the VM. Applied on a best-effort basis to attached disk resources shortly after VM creation.

dougnukem avatar Sep 23 '22 00:09 dougnukem