splitFastq produces wrong number of outputs
Bug report
Expected behavior and actual behavior
Given a pair of FASTQ files from S3 with 171390646 reads, calling fastqChannel.splitFastq(by: params.chunkSize, pe: true, file: true), where chunkSize is 10000000, does not create the expected number of output files.
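For reference, the expected number of chunks can be worked out with integer ceiling division. This is only a minimal sketch: it assumes the read count applies to each file of the pair and that splitFastq emits one chunk per chunkSize reads plus one for any remainder. expected_chunks is a hypothetical helper name, not part of Nextflow.

```shell
#!/usr/bin/env bash
# Hypothetical helper: number of chunks expected when splitting
# `reads` records into pieces of at most `chunk_size` records.
expected_chunks() {
    local reads=$1 chunk_size=$2
    # Integer ceiling division: (a + b - 1) / b
    echo $(( (reads + chunk_size - 1) / chunk_size ))
}

# Values taken from this report.
expected_chunks 171390646 10000000
```

Under these assumptions the input above should yield 18 chunks per file, not 1.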
Steps to reproduce the problem
#!/usr/bin/env nextflow
nextflow.enable.dsl=2

process validate {
    input:
    path(fastqFiles)

    output:
    path("*.f*q*"), includeInputs: true, emit: fastqFiles
    stdout emit: logs

    shell:
    '''
    for f in !{fastqFiles}; do
        echo "${f}:"
        du -h $(realpath $f)
    done
    '''
}

workflow {
    fastqsChannel = Channel.fromPath(params.fastqFiles)
    validate(fastqsChannel)

    groupedFastqs = validate.out.fastqFiles
        .map { file ->
            m = file =~ /.*\/([\w\d\-_]+)?[\-_]R?[1,2]/
            return tuple(m[0][1], file)
        }
        .groupTuple()
        .map { tuple(it[0], it[1][0], it[1][1]) }

    chunksChannel = groupedFastqs.splitFastq(by: params.chunkSize, pe: true, file: true)
    chunksChannel.subscribe { println "Created chunk ${it}" }
    chunksChannel.count().view { "Created ${it} chunks" }
}
Program output
Prints "Created 1 chunks"
Environment
- Nextflow version: 22.09.3.edge build 5767
- Java version:
- Operating system: Linux
- Bash version: (use the command $SHELL --version)
Additional context
I'm using AWS Batch to test because the files are too big for me to test locally.
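A small synthetic read pair might make it possible to try the same workflow locally. The sketch below is only an illustration: make_fastq, the sample_R1.fastq / sample_R2.fastq file names, and the read contents are all hypothetical. Whether the problem reproduces with small local files has not been verified.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write `n` synthetic 4-line FASTQ records to `out`,
# tagging each read name with the mate number (1 or 2).
make_fastq() {
    local n=$1 mate=$2 out=$3
    local i
    for (( i = 1; i <= n; i++ )); do
        printf '@read%d/%d\nACGTACGTAC\n+\nIIIIIIIIII\n' "$i" "$mate"
    done > "$out"
}

# 25 read pairs; with a chunk size of 10, three chunks per file would be expected.
make_fastq 25 1 sample_R1.fastq
make_fastq 25 2 sample_R2.fastq

# Each FASTQ record is 4 lines, so line count / 4 gives the read count.
echo "$(( $(wc -l < sample_R1.fastq) / 4 )) reads per file"
```

The generated files could then be passed through the existing parameters, for example --fastqFiles 'sample_R{1,2}.fastq' --chunkSize 10, to compare the reported chunk count against the expected three.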