Non-zero exit code ignored on k8s in some cases
Bug report
As the title says
Expected behavior and actual behavior
Currently, a process can exit with a non-zero exit code (as recorded in the .exitcode file), but the executor still reports exit status 0 and goes on to look for outputs.
Steps to reproduce the problem
I think this is an interaction between #6484 and #6597. The resources are cleaned up by the garbage collector, so Nextflow cannot get the exit code from the job. Rather than falling back to the .exitcode file, it falls back to 0, which is obviously a problem.
Steps to reproduce:
- Start a k8s run with a short TTL (e.g. 60s) and many processes
- Ensure at least some processes exit with a non-zero status
Environment
- Nextflow version: 25.11.0-edge
- Java version: openjdk 17.0.16 2025-07-15
- Operating system: Linux
- Bash version: GNU bash, version 5.1.16(1)
Looking again at #6484, it should fall back to the .exitcode file if the exit code can't be retrieved from the K8s API, and if the exit file is missing it falls back to Integer.MAX_VALUE. So I don't see how it could end up as 0 unless the task actually succeeded.
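A minimal sketch of the precedence described above, written as a hypothetical shell function. This is not Nextflow's implementation; `resolve_exit_code` and its arguments are illustrative names, and the API status is simply passed in as a string (empty when it could not be retrieved).

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the exit-code precedence described in #6484.
# NOT Nextflow's implementation: resolve_exit_code and its arguments are
# illustrative names only.
#   $1  exit code reported by the K8s API ("" if it could not be retrieved,
#       e.g. because the pod was already garbage-collected)
#   $2  task work directory expected to contain the .exitcode file
resolve_exit_code() {
    local api_code="$1" work_dir="$2"
    if [ -n "$api_code" ]; then
        # 1. Use the status reported by the K8s API when it is available
        echo "$api_code"
    elif [ -s "$work_dir/.exitcode" ]; then
        # 2. Otherwise read the status the task wrapper wrote to .exitcode
        cat "$work_dir/.exitcode"
    else
        # 3. No source at all: Integer.MAX_VALUE, so the task is never treated as 0
        echo 2147483647
    fi
}

# Usage: API status lost, .exitcode contains 3, so the resolved status is 3
dir=$(mktemp -d)
printf '3\n' > "$dir/.exitcode"
resolve_exit_code "" "$dir"   # prints 3
```

Under this precedence, losing the API status alone should never turn into 0; a 0 can only come from a source that actually reported it.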
- Can you check what was written to the .exitcode file?
- Can you confirm the problem actually goes away when you set a higher TTL?
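To answer the first question for many tasks at once, a small shell helper can scan the work directory for .exitcode files that record a non-zero status. This is a sketch assuming Nextflow's default work/<xx>/<hash>/ layout; the function name `list_nonzero_exitcodes` is hypothetical, and on k8s the work directory is wherever the shared volume is mounted.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every task work directory whose .exitcode file
# holds a non-zero status. Assumes the default work/<xx>/<hash>/.exitcode
# layout; pass a different root if the run used -work-dir or a volume mount.
list_nonzero_exitcodes() {
    local root="${1:-work}"
    # .exitcode sits exactly three levels below the work root
    find "$root" -mindepth 3 -maxdepth 3 -name .exitcode -type f |
    while read -r f; do
        code=$(cat "$f")
        if [ "$code" != "0" ]; then
            echo "$(dirname "$f"): exit $code"
        fi
    done
}

# Usage: list_nonzero_exitcodes /path/to/work
```

Any task listed here that Nextflow nonetheless reported as "Command exit status: 0" is an instance of this bug.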
From the error message it seems pretty clear that the task returned 3 and Nextflow somehow ended up with 0...
Copying the relevant error message from the log:
Dec-04 01:54:39.854 [TaskFinalizer-8] ERROR nextflow.processor.TaskProcessor - Error executing process > 'ingest:extract_all:extract_taxa:extract_taxa_paired_reads (2)'
Caused by:
Missing output file(s) `*.fastq` expected by process `ingest:extract_all:extract_taxa:extract_taxa_paired_reads (2)`
Command executed:
extract_taxa_from_reads.py -s1 687e5bb1-c101-46ca-ad75-3b211ab933c6_1.fastp.fastq.gz -s2 687e5bb1-c101-46ca-ad75-3b211ab933c6_2.fastp.fastq.gz -k PlusPF.kraken_assignments.tsv -r Metazoa.kreport_split.txt -t 2023-10-01 -p Metazoa.kreport_split.txt --include_children --min_count_descendants 100000000 --rank G --min_percent 100 --max_human 25000
PATTERN=(*.f*q)
if [ ! -f ${PATTERN[0]} ]; then
echo "Found no output files - maybe there weren't any for this sample"
exit 3
fi
Command exit status:
0
Command output:
Found no output files - maybe there weren't any for this sample
Command error:
PROGRAM START TIME: 12/04/2025, 01:32:39
Loading taxonomy
Loading kraken report
Loading kraken assignments
Identifying lists to extract
SELECTED 0 TAXA TO EXTRACT
Extracting reads from file
Reading in 687e5bb1-c101-46ca-ad75-3b211ab933c6_1.fastp.fastq.gz
Writing records for file 1
Reading in 687e5bb1-c101-46ca-ad75-3b211ab933c6_2.fastp.fastq.gz
Writing records for file 2
Write summary
PROGRAM END TIME: 12/04/2025, 01:54:20
READ COUNTS:
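The guard at the end of the task script prints "Found no output files - maybe there weren't any for this sample" only immediately before `exit 3`, so seeing that message in the command output means the script ended with status 3. Re-running the same guard in an empty scratch directory illustrates this. It is wrapped in a hypothetical `check_outputs` function, using `return` instead of `exit` and quoting the array element, so it can run in an ordinary shell.

```shell
#!/usr/bin/env bash
# Same output guard as the task script above, wrapped in a hypothetical
# function so its status can be inspected without killing the shell.
check_outputs() {
    # Without nullglob, an unmatched glob leaves the literal "*.f*q" in the array
    local -a pattern=(*.f*q)
    if [ ! -f "${pattern[0]}" ]; then
        echo "Found no output files - maybe there weren't any for this sample"
        return 3
    fi
}

# Usage: in an empty directory the guard fires and the status is 3
(cd "$(mktemp -d)" && check_outputs); echo "exit status: $?"
# prints the message, then "exit status: 3"
```

Since the message does appear under "Command output:", the expected status is 3, which matches the .exitcode contents rather than the reported 0.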
cc @jorgee for your thoughts
Thanks for looking at this!
The .exitcode file did contain 3, as it should...
I'll test a higher TTL in a sec
Unfortunately, a higher TTL didn't make a difference...