splitFastq does not split beyond the second file in PE mode
Bug report
Expected behavior and actual behavior
The current Nextflow docs for splitFastq state:
Finally the splitFastq operator is able to split paired-end read pair FASTQ files. It must be applied to a channel which emits tuples containing at least two elements that are the files to be split.
while the description for the pe argument states:
When true splits paired-end read files, therefore items emitted by the source channel must be tuples in which at least two elements are the read-pair files to be split.
This implies that when splitFastq is used with pe: true, it is expected to split every FASTQ file in each entry of the channel, however many there are. However, as the output below shows, only the first two files are split. This wasn't a problem (yet) in 2019, but it is a problem now because some single-cell sequencing platforms require 3 FASTQ files as input.
Steps to reproduce the problem
Channel
    .fromFilePairs('test/test*_{R1,R2,I1}_[0-9][0-9][0-9].fastq.gz', size:3, flat:true)
    .splitFastq(by: 10, pe:true, file:true)
    .view()
Program output
N E X T F L O W ~ version 23.10.1
Launching `splitFastq_report.nf` [trusting_picasso] DSL2 - revision: 58e150738f
[test_S1_L001, /home/jma/Documents/work/35/01beb071fcd1260524d0f1b592a777/test_S1_L001_I1_001.1.fastq, /home/jma/Documents/work/00/11bcf5753d6e3968c95dc2829f7535/test_S1_L001_R1_001.1.fastq, /home/jma/Documents/test/test_S1_L001_R2_001.fastq.gz]
[test_S1_L001, /home/jma/Documents/work/35/01beb071fcd1260524d0f1b592a777/test_S1_L001_I1_001.2.fastq, /home/jma/Documents/work/00/11bcf5753d6e3968c95dc2829f7535/test_S1_L001_R1_001.2.fastq, /home/jma/Documents/test/test_S1_L001_R2_001.fastq.gz]
[test_S1_L001, /home/jma/Documents/work/35/01beb071fcd1260524d0f1b592a777/test_S1_L001_I1_001.3.fastq, /home/jma/Documents/work/00/11bcf5753d6e3968c95dc2829f7535/test_S1_L001_R1_001.3.fastq, /home/jma/Documents/test/test_S1_L001_R2_001.fastq.gz]
[test_S1_L001, /home/jma/Documents/work/35/01beb071fcd1260524d0f1b592a777/test_S1_L001_I1_001.4.fastq, /home/jma/Documents/work/00/11bcf5753d6e3968c95dc2829f7535/test_S1_L001_R1_001.4.fastq, /home/jma/Documents/test/test_S1_L001_R2_001.fastq.gz]
[test_S1_L001, /home/jma/Documents/work/35/01beb071fcd1260524d0f1b592a777/test_S1_L001_I1_001.5.fastq, /home/jma/Documents/work/00/11bcf5753d6e3968c95dc2829f7535/test_S1_L001_R1_001.5.fastq, /home/jma/Documents/test/test_S1_L001_R2_001.fastq.gz]
[test_S1_L001, /home/jma/Documents/work/35/01beb071fcd1260524d0f1b592a777/test_S1_L001_I1_001.6.fastq, /home/jma/Documents/work/00/11bcf5753d6e3968c95dc2829f7535/test_S1_L001_R1_001.6.fastq, /home/jma/Documents/test/test_S1_L001_R2_001.fastq.gz]
[test_S1_L001, /home/jma/Documents/work/35/01beb071fcd1260524d0f1b592a777/test_S1_L001_I1_001.7.fastq, /home/jma/Documents/work/00/11bcf5753d6e3968c95dc2829f7535/test_S1_L001_R1_001.7.fastq, /home/jma/Documents/test/test_S1_L001_R2_001.fastq.gz]
[test_S1_L001, /home/jma/Documents/work/35/01beb071fcd1260524d0f1b592a777/test_S1_L001_I1_001.8.fastq, /home/jma/Documents/work/00/11bcf5753d6e3968c95dc2829f7535/test_S1_L001_R1_001.8.fastq, /home/jma/Documents/test/test_S1_L001_R2_001.fastq.gz]
[test_S1_L001, /home/jma/Documents/work/35/01beb071fcd1260524d0f1b592a777/test_S1_L001_I1_001.9.fastq, /home/jma/Documents/work/00/11bcf5753d6e3968c95dc2829f7535/test_S1_L001_R1_001.9.fastq, /home/jma/Documents/test/test_S1_L001_R2_001.fastq.gz]
[test_S1_L001, /home/jma/Documents/work/35/01beb071fcd1260524d0f1b592a777/test_S1_L001_I1_001.10.fastq, /home/jma/Documents/work/00/11bcf5753d6e3968c95dc2829f7535/test_S1_L001_R1_001.10.fastq, /home/jma/Documents/test/test_S1_L001_R2_001.fastq.gz]
Environment
- Nextflow version: 23.10.1.5891
- Java version: openjdk 11.0.13 2021-10-19; OpenJDK Runtime Environment JBR-11.0.13.7-1751.21-jcef (build 11.0.13+7-b1751.21); OpenJDK 64-Bit Server VM JBR-11.0.13.7-1751.21-jcef (build 11.0.13+7-b1751.21, mixed mode)
- Operating system: Linux (CentOS 7)
- Bash version: 4.2.46(2)-release
Additional context
I currently think the issue is in the following code block in SplitOp.groovy, currently in lines 92-96, which hard-codes the indices:
if( params.pe == true ) {
    indexes = [-1,-2]
    multiSplit = true
    pairedEnd = true
}
However, a fix would require the operator to read at least one entry of the source channel to determine indexes. I don't know enough Groovy/Java to know whether this is possible at all. If it is not, then the documentation should be changed instead.
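To make the idea concrete, here is a minimal sketch in plain Groovy, run outside Nextflow, of deriving the indices from one emitted tuple instead of hard-coding them. The helper name fileIndexes and the sample tuple are hypothetical. This is not how SplitOp.groovy is written, and it says nothing about how the operator could peek at the first entry of the channel.

```groovy
import java.nio.file.Path
import java.nio.file.Paths

// Hypothetical helper (not part of Nextflow): return the positions of all
// file-like elements in one tuple emitted by the source channel.
List<Integer> fileIndexes(List tuple) {
    def positions = tuple.findIndexValues { it instanceof Path || it instanceof File }
    return positions.collect { it as int }
}

// Tuple shaped like the entries emitted by fromFilePairs(..., size: 3, flat: true)
def sample = [
    'test_S1_L001',
    Paths.get('test/test_S1_L001_I1_001.fastq.gz'),
    Paths.get('test/test_S1_L001_R1_001.fastq.gz'),
    Paths.get('test/test_S1_L001_R2_001.fastq.gz'),
]

def indexes = fileIndexes(sample)
println indexes   // prints [1, 2, 3]
```

With indices found this way, all three files in the example would be candidates for splitting rather than only two of them.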
test_S1_L001_I1_001.fastq.gz test_S1_L001_R1_001.fastq.gz test_S1_L001_R2_001.fastq.gz .nextflow.log
(EDIT: Updated with possible cause.)