Unhelpful error message when service account is missing access to resource on GCP

Open Puumanamana opened this issue 9 months ago • 0 comments

Bug report

With the google-batch executor, when jobs run under a service account other than the default one, it seems that both the default service account (which schedules the jobs) and the runner service account (which runs them) need read access to the input files. When the default service account lacks access, I get a helpful warning (maybe it should be an error, but at least there is some information):

WARN: Unable to get file attributes file: gs://rome-aws/file.txt -- Cause: com.google.cloud.storage.StorageException: [email protected] does not have storage.objects.get access to the Google Cloud Storage object. Permission 'storage.objects.get' denied on resource (or it may not exist).

However, when I grant access to the default service account but not to the runner service account, the task is scheduled and then fails with a very generic error:

Caused by:
  Process `LS` terminated for an unknown reason -- Likely it has been terminated by the external system

Expected behavior and actual behavior

Actual behavior: Generic error message

Expected behavior: More information about missing permission for selected service account
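To make the expected behavior concrete, here is a minimal sketch of the kind of check I have in mind. It is plain Python, not Nextflow or Google Cloud API code: the function name, the account labels, and the shape of the `granted_by_account` mapping are all hypothetical, and the caller is assumed to have obtained the granted permissions from IAM by some other means.

```python
# Hypothetical pre-flight check; none of these names exist in Nextflow or nf-google.
# Assumption: the caller already knows which permissions each service account holds.
REQUIRED = {"storage.objects.get"}


def missing_permissions(granted_by_account, required=REQUIRED):
    """Return {account: sorted list of missing permissions} for accounts lacking any."""
    report = {}
    for account, granted in granted_by_account.items():
        missing = sorted(set(required) - set(granted))
        if missing:
            report[account] = missing
    return report


# Illustrative data only: the default account can read the input, the runner cannot.
granted = {
    "default-sa (schedules jobs)": {"storage.objects.get", "storage.objects.list"},
    "runner-sa (runs jobs)": set(),
}
for account, perms in missing_permissions(granted).items():
    print(f"{account} is missing: {', '.join(perms)}")
```

A message along these lines, naming the account and the denied permission, would point directly at the cause instead of "terminated for an unknown reason".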

Steps to reproduce the problem

main.nf

process LS {
    container "ubuntu:latest"

    input:
    path f

    script:
    """
    ls .
    """
}

workflow {
    LS(
        file("gs://rome-aws/file.txt")
    )
}

nextflow.config

workDir = "gs://nxf-work/scratch/$USER"

google {
    project = "rome-pipeline-engine"
    region = "us-central1"
    batch {
        serviceAccountEmail = "[email protected]"
    }
}

process {
    executor = "google-batch"
}

Program output

Sep-28 16:08:54.615 [main] DEBUG nextflow.cli.Launcher - $> nextflow run main.nf
Sep-28 16:08:54.739 [main] INFO  nextflow.cli.CmdRun - N E X T F L O W  ~  version 23.09.1-edge
Sep-28 16:08:54.764 [main] DEBUG nextflow.plugin.PluginsFacade - Setting up plugin manager > mode=prod; embedded=false; plugins-dir=/home/cedric/.nextflow/plugins; core-plugins: [email protected],[email protected],[email protected],[email protected],[email protected],nf-ga4[email protected],[email protected],[email protected],[email protected]
Sep-28 16:08:54.776 [main] INFO  o.pf4j.DefaultPluginStatusProvider - Enabled plugins: []
Sep-28 16:08:54.778 [main] INFO  o.pf4j.DefaultPluginStatusProvider - Disabled plugins: []
Sep-28 16:08:54.782 [main] INFO  org.pf4j.DefaultPluginManager - PF4J version 3.4.1 in 'deployment' mode
Sep-28 16:08:54.794 [main] INFO  org.pf4j.AbstractPluginManager - No plugins
Sep-28 16:08:54.816 [main] DEBUG nextflow.config.ConfigBuilder - Found config local: /home/cedric/sandbox/nf-permissions-no-error/nextflow.config
Sep-28 16:08:54.818 [main] DEBUG nextflow.config.ConfigBuilder - Parsing config file: /home/cedric/sandbox/nf-permissions-no-error/nextflow.config
Sep-28 16:08:54.843 [main] DEBUG nextflow.config.ConfigBuilder - Applying config profile: `standard`
Sep-28 16:08:55.632 [main] DEBUG nextflow.cli.CmdRun - Applied DSL=2 by global default
Sep-28 16:08:55.650 [main] INFO  nextflow.cli.CmdRun - Launching `main.nf` [jolly_moriondo] DSL2 - revision: 305a1d98cd
Sep-28 16:08:55.651 [main] DEBUG nextflow.plugin.PluginsFacade - Plugins default=[[email protected]]
Sep-28 16:08:55.651 [main] DEBUG nextflow.plugin.PluginsFacade - Plugins resolved requirement=[[email protected]]
Sep-28 16:08:55.652 [main] DEBUG nextflow.plugin.PluginUpdater - Installing plugin nf-google version: 1.8.1
Sep-28 16:08:55.664 [main] INFO  org.pf4j.AbstractPluginManager - Plugin '[email protected]' resolved
Sep-28 16:08:55.665 [main] INFO  org.pf4j.AbstractPluginManager - Start plugin '[email protected]'
Sep-28 16:08:55.716 [main] DEBUG nextflow.plugin.BasePlugin - Plugin started [email protected]
Sep-28 16:08:55.733 [main] DEBUG n.secret.LocalSecretsProvider - Secrets store: /home/cedric/.nextflow/secrets/store.json
Sep-28 16:08:55.737 [main] DEBUG nextflow.secret.SecretsLoader - Discovered secrets providers: [nextflow.secret.LocalSecretsProvider@1b1c538d] - activable => nextflow.secret.LocalSecretsProvider@1b1c538d
Sep-28 16:08:55.823 [main] DEBUG nextflow.Session - Session UUID: 5a75fefa-5344-456a-a459-78d69890bfed
Sep-28 16:08:55.823 [main] DEBUG nextflow.Session - Run name: jolly_moriondo
Sep-28 16:08:55.824 [main] DEBUG nextflow.Session - Executor pool size: 8
Sep-28 16:08:56.059 [main] DEBUG nextflow.file.FilePorter - File porter settings maxRetries=3; maxTransfers=50; pollTimeout=null
Sep-28 16:08:56.066 [main] DEBUG nextflow.util.ThreadPoolBuilder - Creating thread pool 'FileTransfer' minSize=10; maxSize=24; workQueue=LinkedBlockingQueue[10000]; allowCoreThreadTimeout=false
Sep-28 16:08:56.105 [main] DEBUG nextflow.cli.CmdRun -
  Version: 23.09.1-edge build 5881
  Created: 11-09-2023 10:08 UTC
  System: Linux 5.15.0-1027-gcp
  Runtime: Groovy 3.0.19 on OpenJDK 64-Bit Server VM 17.0.3-internal+0-adhoc..src
  Encoding: UTF-8 (UTF-8)
  Process: 1032796@nf-tower-main [10.128.0.2]
  CPUs: 8 - Mem: 62.8 GB (51.9 GB) - Swap: 0 (0)
Sep-28 16:08:56.150 [main] DEBUG nextflow.file.FileHelper - Can't check if specified path is NFS (1): gs://nxf-work/scratch/cedric

Sep-28 16:08:56.150 [main] DEBUG nextflow.Session - Work-dir: gs://nxf-work/scratch/cedric [null]
Sep-28 16:08:56.150 [main] DEBUG nextflow.Session - Script base path does not exist or is not a directory: /home/cedric/sandbox/nf-permissions-no-error/bin
Sep-28 16:08:56.181 [main] DEBUG nextflow.executor.ExecutorFactory - Extension executors providers=[GoogleLifeSciencesExecutor, GoogleBatchExecutor]
Sep-28 16:08:56.197 [main] DEBUG nextflow.Session - Observer factory: DefaultObserverFactory
Sep-28 16:08:56.229 [main] DEBUG nextflow.cache.CacheFactory - Using Nextflow cache factory: nextflow.cache.DefaultCacheFactory
Sep-28 16:08:56.241 [main] DEBUG nextflow.util.CustomThreadPool - Creating default thread pool > poolSize: 9; maxThreads: 1000
Sep-28 16:08:56.338 [main] DEBUG nextflow.Session - Session start
Sep-28 16:08:56.577 [main] DEBUG nextflow.script.ScriptRunner - > Launching execution
Sep-28 16:08:56.713 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: google-batch
Sep-28 16:08:56.713 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'google-batch'
Sep-28 16:08:56.719 [main] DEBUG nextflow.executor.Executor - [warm up] executor > google-batch
Sep-28 16:08:56.730 [main] DEBUG n.processor.TaskPollingMonitor - Creating task monitor for executor 'google-batch' > capacity: 1000; pollInterval: 10s; dumpInterval: 5m
Sep-28 16:08:56.733 [main] DEBUG n.processor.TaskPollingMonitor - >>> barrier register (monitor: google-batch)
Sep-28 16:08:56.739 [main] DEBUG nextflow.cloud.google.GoogleOpts - Google auth via application DEFAULT
Sep-28 16:08:56.747 [main] DEBUG n.c.google.batch.GoogleBatchExecutor - [GOOGLE BATCH] Executor config=BatchConfig[googleOpts=GoogleOpts(projectId:rome-pipeline-engine, credsFile:null, location:null, enableRequesterPaysBuckets:false, httpConnectTimeout:1m, httpReadTimeout:1m, credentials:ComputeEngineCredentials{transportFactoryClassName=com.google.auth.oauth2.OAuth2Utils$DefaultHttpTransportFactory})
Sep-28 16:08:56.767 [main] DEBUG n.c.google.batch.client.BatchClient - [GOOGLE BATCH] Creating service client with config credentials
Sep-28 16:08:57.521 [main] DEBUG nextflow.Session - Workflow process names [dsl2]: LS
Sep-28 16:08:57.521 [main] DEBUG nextflow.Session - Igniting dataflow network (1)
Sep-28 16:08:57.522 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > LS
Sep-28 16:08:57.539 [main] DEBUG nextflow.script.ScriptRunner - Parsed script files:
  Script_ec74d6c2870f9fc7: /home/cedric/sandbox/nf-permissions-no-error/main.nf
Sep-28 16:08:57.539 [main] DEBUG nextflow.script.ScriptRunner - > Awaiting termination
Sep-28 16:08:57.539 [main] DEBUG nextflow.Session - Session await
Sep-28 16:09:02.656 [Task submitter] DEBUG n.c.g.batch.GoogleBatchTaskHandler - [GOOGLE BATCH] Process `LS` submitted > job=nf-bc268099-1695917338257; uid=nf-bc268099-169591-5b659bb3-ef3f-45f70; work-dir=gs://nxf-work/scratch/cedric/bc/2680992b7f39a15441ad007bb00cc9
Sep-28 16:09:02.656 [Task submitter] INFO  nextflow.Session - [bc/268099] Submitted process > LS
Sep-28 16:11:06.783 [Task monitor] DEBUG n.c.g.batch.GoogleBatchTaskHandler - [GOOGLE BATCH] Process `LS` - terminated job=nf-bc268099-1695917338257; state=FAILED
Sep-28 16:11:06.843 [Task monitor] DEBUG n.c.g.batch.GoogleBatchTaskHandler - [GOOGLE BATCH] Cannot read exit status for task: `LS` - gs://nxf-work/scratch/cedric/bc/2680992b7f39a15441ad007bb00cc9/.exitcode
Sep-28 16:11:07.633 [Task monitor] DEBUG n.processor.TaskPollingMonitor - Task completed > TaskHandler[id: 1; name: LS; status: COMPLETED; exit: -; error: -; workDir: gs://nxf-work/scratch/cedric/bc/2680992b7f39a15441ad007bb00cc9]
Sep-28 16:11:07.638 [Task monitor] DEBUG nextflow.processor.TaskProcessor - Handling unexpected condition for
  task: name=LS; work-dir=gs://nxf-work/scratch/cedric/bc/2680992b7f39a15441ad007bb00cc9
  error [nextflow.exception.ProcessFailedException]: Process `LS` terminated for an unknown reason -- Likely it has been terminated by the external system
Sep-28 16:11:07.671 [Task monitor] DEBUG nextflow.processor.TaskRun - Unable to dump output of process 'null' -- Cause: java.nio.file.NoSuchFileException: gs://nxf-work/scratch/cedric/bc/2680992b7f39a15441ad007bb00cc9/.command.out
Sep-28 16:11:07.700 [Task monitor] DEBUG nextflow.processor.TaskRun - Unable to dump error of process 'null' -- Cause: java.nio.file.NoSuchFileException: gs://nxf-work/scratch/cedric/bc/2680992b7f39a15441ad007bb00cc9/.command.err

Caused by:
  Process `LS` terminated for an unknown reason -- Likely it has been terminated by the external system

Command executed:

  ls .

Command exit status:
  -

Command output:
  (empty)

Work dir:
  gs://nxf-work/scratch/cedric/bc/2680992b7f39a15441ad007bb00cc9

Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`
Sep-28 16:11:07.713 [main] DEBUG nextflow.Session - Session await > all processes finished
Sep-28 16:11:07.746 [Task monitor] DEBUG nextflow.Session - Session aborted -- Cause: Process `LS` terminated for an unknown reason -- Likely it has been terminated by the external system
Sep-28 16:11:07.797 [Task monitor] DEBUG nextflow.processor.TaskRun - Unable to dump error of process 'null' -- Cause: java.nio.file.NoSuchFileException: gs://nxf-work/scratch/cedric/bc/2680992b7f39a15441ad007bb00cc9/.command.err
Sep-28 16:11:07.822 [Task monitor] DEBUG nextflow.processor.TaskRun - Unable to dump output of process 'null' -- Cause: java.nio.file.NoSuchFileException: gs://nxf-work/scratch/cedric/bc/2680992b7f39a15441ad007bb00cc9/.command.out
Sep-28 16:11:07.823 [main] DEBUG nextflow.Session - Session await > all barriers passed
Sep-28 16:11:07.824 [Task monitor] DEBUG n.processor.TaskPollingMonitor - <<< barrier arrives (monitor: google-batch) - terminating tasks monitor poll loop
Sep-28 16:11:07.851 [main] DEBUG nextflow.processor.TaskRun - Unable to dump error of process 'null' -- Cause: java.nio.file.NoSuchFileException: gs://nxf-work/scratch/cedric/bc/2680992b7f39a15441ad007bb00cc9/.command.err
Sep-28 16:11:07.879 [main] DEBUG nextflow.processor.TaskRun - Unable to dump output of process 'null' -- Cause: java.nio.file.NoSuchFileException: gs://nxf-work/scratch/cedric/bc/2680992b7f39a15441ad007bb00cc9/.command.out
Sep-28 16:11:07.887 [main] DEBUG n.trace.WorkflowStatsObserver - Workflow completed > WorkflowStats[succeededCount=0; failedCount=1; ignoredCount=0; cachedCount=0; pendingCount=0; submittedCount=0; runningCount=0; retriesCount=0; abortedCount=0; succeedDuration=0ms; failedDuration=851ms; cachedDuration=0ms;loadCpus=0; loadMemory=0; peakRunning=1; peakCpus=1; peakMemory=0; ]
Sep-28 16:11:07.945 [main] DEBUG nextflow.cache.CacheDB - Closing CacheDB done
Sep-28 16:11:07.945 [main] INFO  org.pf4j.AbstractPluginManager - Stop plugin '[email protected]'
Sep-28 16:11:07.945 [main] DEBUG nextflow.plugin.BasePlugin - Plugin stopped nf-google
Sep-28 16:11:07.963 [main] DEBUG nextflow.script.ScriptRunner - > Execution complete -- Goodbye

Environment

  • Nextflow version: 23.09.1-edge build 5881
  • Java version: openjdk version "17.0.3-internal" 2022-04-19
  • Operating system: Ubuntu 20.04.5 LTS
  • Bash version: zsh 5.8 (x86_64-ubuntu-linux-gnu)

Puumanamana, Sep 28 '23 16:09