Exitcode being ignored in k8s in some cases

Open BioWilko opened this issue 3 months ago • 3 comments

Bug report

As the title says

Expected behavior and actual behavior

Currently, a process can exit with a non-zero exit code (as recorded in the .exitcode file), but the executor still reports exit status 0 and goes on to look for outputs.

Steps to reproduce the problem

I think this is an interaction between #6484 and #6597: the resources are cleaned up by the garbage collector, so Nextflow cannot get the exit code from the job. Rather than falling back to checking the .exitcode file, it just falls back to 0, which is obviously a problem.

Steps to reproduce:

  • Start a k8s run with a short TTL (e.g. 60s) and many processes
  • Ensure at least some processes error out non-zero

nextflow (5).log

Environment

  • Nextflow version: 25.11.0-edge
  • Java version: openjdk 17.0.16 2025-07-15
  • Operating system: Linux
  • Bash version: GNU bash, version 5.1.16(1)

BioWilko avatar Dec 04 '25 08:12 BioWilko

Looking again at #6484, it should fall back to the .exitcode file if the exit code can't be retrieved from the K8s API. And if the exit file is missing, it will fall back to Integer.MAX_VALUE. So I don't see how it could fall back to 0 unless the task actually succeeded.

  • Can you check what was written to the .exitcode file?

  • Can you confirm the problem actually goes away when you set a higher TTL?
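To make the expected fallback order concrete, here is a minimal Python sketch of the behavior described above. It is not the actual Nextflow code: `resolve_exit_status`, `read_exit_file`, and `UNKNOWN_EXIT_STATUS` are hypothetical names used only for illustration, and `api_exit_code=None` is an assumed stand-in for "the K8s API could not provide an exit code".

```python
from pathlib import Path
from typing import Optional

# Same value as Java's Integer.MAX_VALUE, the "unknown" fallback mentioned above.
UNKNOWN_EXIT_STATUS = 2**31 - 1


def read_exit_file(work_dir: Path) -> Optional[int]:
    """Return the integer stored in <work_dir>/.exitcode, or None if missing or unparsable."""
    exit_file = work_dir / ".exitcode"
    try:
        return int(exit_file.read_text().strip())
    except (FileNotFoundError, ValueError):
        return None


def resolve_exit_status(api_exit_code: Optional[int], work_dir: Path) -> int:
    """Hypothetical helper: prefer the API value, then the .exitcode file, then the sentinel."""
    if api_exit_code is not None:
        return api_exit_code
    file_code = read_exit_file(work_dir)
    if file_code is not None:
        return file_code
    return UNKNOWN_EXIT_STATUS
```

Under this sketch, a task whose job resource is already gone (`api_exit_code=None`) and whose .exitcode file contains 3 resolves to 3. A result of 0 can only come from the API value itself or from the file.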

From the error message it seems pretty clear that the task returned 3 and Nextflow somehow ended up with 0...

Copying the relevant error message from the log:

Dec-04 01:54:39.854 [TaskFinalizer-8] ERROR nextflow.processor.TaskProcessor - Error executing process > 'ingest:extract_all:extract_taxa:extract_taxa_paired_reads (2)'

Caused by:
  Missing output file(s) `*.fastq` expected by process `ingest:extract_all:extract_taxa:extract_taxa_paired_reads (2)`


Command executed:

  extract_taxa_from_reads.py             -s1 687e5bb1-c101-46ca-ad75-3b211ab933c6_1.fastp.fastq.gz             -s2 687e5bb1-c101-46ca-ad75-3b211ab933c6_2.fastp.fastq.gz             -k PlusPF.kraken_assignments.tsv             -r Metazoa.kreport_split.txt             -t 2023-10-01             -p Metazoa.kreport_split.txt             --include_children             --min_count_descendants 100000000             --rank G             --min_percent 100              --max_human 25000
  
  PATTERN=(*.f*q)
  if [ ! -f ${PATTERN[0]} ]; then
      echo "Found no output files - maybe there weren't any for this sample"
      exit 3
  fi

Command exit status:
  0

Command output:
  Found no output files - maybe there weren't any for this sample

Command error:
  PROGRAM START TIME: 12/04/2025, 01:32:39
  Loading taxonomy
  Loading kraken report
  Loading kraken assignments
  Identifying lists to extract
  SELECTED 0 TAXA TO EXTRACT
  Extracting reads from file
  Reading in 687e5bb1-c101-46ca-ad75-3b211ab933c6_1.fastp.fastq.gz
  Writing records for file 1
  Reading in 687e5bb1-c101-46ca-ad75-3b211ab933c6_2.fastp.fastq.gz
  Writing records for file 2
  Write summary
  PROGRAM END TIME: 12/04/2025, 01:54:20
  READ COUNTS: 

cc @jorgee for your thoughts

bentsherman avatar Dec 04 '25 18:12 bentsherman

Thanks for looking at this!

The .exitcode file did have 3 in it, as it should.

I'll test a higher TTL in a sec

BioWilko avatar Dec 05 '25 11:12 BioWilko

Unfortunately, a higher TTL didn't make a difference.

BioWilko avatar Dec 08 '25 12:12 BioWilko