nextflow
feature: add support for label/tags
Allow a new syntax in the label directive in key-value format:
label k: 'value'
label b: "${params.user}"
When this syntax is used, the key-value pairs are stored in a new Map variable so that executors can access them and tag the process if present.
closes #2845
Signed-off-by: Jorge Aguilera [email protected]
Didn't we agree to go with label 'name=value' as a first iteration?
Yes, but once I started on the implementation I found this syntax more powerful and wanted to propose it. (I'm still working on the issue, so I can also include our first idea about name=value.)
This PR allows the following syntax:
process foo {
label 'bravo' //<--- a label as previously
label 'department=floor 3'
label region:'eu-west-1'
label region:'eu-west-1' , department:'floor 3'
}
Ok, I'll check soon 👍
This might be a different feature, but there could be value in allowing users to specify labels at the top level, like on the executor, and then have those labels propagate down to each process. Otherwise we'd have to set those labels/params on every single process.
https://github.com/nextflow-io/nextflow/issues/2845
google {
labels = [ 'user-id': params.user_id, 'project-id': params.project_id ]
project = 'theprojectid'
zone = 'europe-west1-b'
}
@dougnukem thanks, a good point. However, I think the current proposal still works, because setting something like the following in the config would apply it to all processes:
process.labels = [ 'user-id': params.user_id, 'project-id': params.project_id ]
I was looking at how to set labels with Google Batch but could not find it, though I'm quite sure it's possible. @bentsherman any clue about that?
@jorgeaguileraseqera the google batch labels are set through the AllocationPolicy: https://cloud.google.com/java/docs/reference/google-cloud-batch/latest/com.google.cloud.batch.v1.AllocationPolicy.Builder#com_google_cloud_batch_v1_AllocationPolicy_Builder_putAllLabels_java_util_Map_java_lang_String_java_lang_String__
Still could not figure out how to use the Google Batch API to set up labels. I've opened a separate issue for that.
Do you have an issue for this?
Fixed in #3170
Actually, yes.
NF version: 22.09.4-edge
executor: google-lifesciences
I added
process.labels = [ 'user-id': params.user_id, 'project-id': params.project_id ]
as mentioned in the discussion. I then used gcloud beta lifesciences operations describe to check the worker process operation, and in the resources section I am not seeing the labels I added. Nothing showed up in .nextflow.log either.
resources:
  regions:
  - us-central1
  virtualMachine:
    bootDiskSizeGb: 500
    bootImage: projects/cos-cloud/global/images/family/cos-stable
    disks:
    - name: nf-pipeline-work
    labels:
      goog-pipelines-worker: 'true'  # only the google default labels
    machineType: custom-1-1024
    nvidiaDriverVersion: 450.51.06
    serviceAccount:
      email: default
      scopes:
      - https://www.googleapis.com/auth/cloud-platform
    volumes:
    - persistentDisk:
        sizeGb: 500
      volume: nf-pipeline-work
timeout: 604800s
I think I have a couple of questions:
- If the labels are successfully passed to all processes, how do I know? Where should I check (for example, which log files)?
- Are those labels supposed to show up in the resource.labels section?
Additional information:
If I include a script block in the process itself and use $task.labels, I can print those values to an output file.
@beichen1024 it should be process.resourceLabels
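For reference, a minimal config sketch of that correction, reusing the params.user_id and params.project_id names from the earlier snippet (their values are assumed to be supplied by the user, e.g. on the command line). Whether a given executor version actually propagates these labels to the VM is what this thread is trying to establish, so treat this as a sketch rather than a guarantee:

```groovy
// nextflow.config
// Applies the same resource labels to every process.
// params.user_id and params.project_id are assumed to be set by the user.
process.resourceLabels = [
    'user-id'   : params.user_id,
    'project-id': params.project_id
]
```

The same directive can also be set inside an individual process, e.g. resourceLabels region: 'eu-west-1', when labels differ between processes.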
@bentsherman The GCP labeling added here appears to be applied at the Google Life Sciences pipeline (operation) level. For the labels to reach the GCP VMs for cost monitoring, they need to be set at the VirtualMachine.labels level.
https://github.com/nextflow-io/nextflow/pull/2853/files#diff-bd0a55afab4a626d773cac6787aef85e2d2756a3a25d438df70bb98ae0127f1fR479
I believe this should instead, or in addition, be applied at the Resource/VirtualMachine level.
https://cloud.google.com/life-sciences/docs/reference/rpc/google.cloud.lifesciences.v2beta#runpipelinerequest
RunPipelineRequest labels (map<string, string>): User-defined labels to associate with the returned operation. These labels are not propagated to any Google Cloud Platform resources used by the operation, and can be modified at any time. To associate labels with resources created while executing the operation, see the appropriate resource message (for example, VirtualMachine).
https://cloud.google.com/life-sciences/docs/reference/rpc/google.cloud.lifesciences.v2beta#virtualmachine
VirtualMachine labels (map<string, string>): Optional set of labels to apply to the VM and any attached disk resources. These labels must adhere to the name and value restrictions on VM labels imposed by Compute Engine. Labels keys with the prefix 'google-' are reserved for use by Google. Labels applied at creation time to the VM. Applied on a best-effort basis to attached disk resources shortly after VM creation.
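To make the difference between the two label fields concrete, here is a rough Groovy sketch of the shape of a v2beta RunPipelineRequest body, using the field names from the reference pages linked above. The label values are hypothetical, and this is not the code Nextflow uses to build the request; it only shows where each labels map sits:

```groovy
import groovy.json.JsonOutput

// Hypothetical label values, for illustration only.
def labels = ['user-id': 'alice', 'project-id': 'demo']

// Shape of a Life Sciences v2beta RunPipelineRequest body.
def request = [
    // Operation-level labels: associated with the returned operation only,
    // not propagated to the GCP resources it creates.
    labels  : labels,
    pipeline: [
        resources: [
            regions       : ['us-central1'],
            virtualMachine: [
                machineType: 'custom-1-1024',
                // VM-level labels: applied to the VM and, on a best-effort
                // basis, to its attached disks.
                labels     : labels
            ]
        ]
    ]
]

println JsonOutput.prettyPrint(JsonOutput.toJson(request))
```

Only the inner virtualMachine.labels map would show up under resources.virtualMachine.labels in the gcloud describe output, which is where cost-monitoring labels need to land.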