fastp
--split_by_lines Generates Non-Consecutive Segment Numbers, and This Is Not Documented
I noticed that --split_by_lines sometimes skips segment numbers in the output file names. In the example below, segment 9 is missing.
$ ls Fastq_split/*SP19_008632BD_HLKNYDSXY_AGCCTCAT-TCTCTACT_L003*R1*
Fastq_split/1.SP19_008632BD_HLKNYDSXY_AGCCTCAT-TCTCTACT_L003_R1.fastq.gz Fastq_split/4.SP19_008632BD_HLKNYDSXY_AGCCTCAT-TCTCTACT_L003_R1.fastq.gz
Fastq_split/10.SP19_008632BD_HLKNYDSXY_AGCCTCAT-TCTCTACT_L003_R1.fastq.gz Fastq_split/5.SP19_008632BD_HLKNYDSXY_AGCCTCAT-TCTCTACT_L003_R1.fastq.gz
Fastq_split/11.SP19_008632BD_HLKNYDSXY_AGCCTCAT-TCTCTACT_L003_R1.fastq.gz Fastq_split/6.SP19_008632BD_HLKNYDSXY_AGCCTCAT-TCTCTACT_L003_R1.fastq.gz
Fastq_split/12.SP19_008632BD_HLKNYDSXY_AGCCTCAT-TCTCTACT_L003_R1.fastq.gz Fastq_split/7.SP19_008632BD_HLKNYDSXY_AGCCTCAT-TCTCTACT_L003_R1.fastq.gz
Fastq_split/2.SP19_008632BD_HLKNYDSXY_AGCCTCAT-TCTCTACT_L003_R1.fastq.gz Fastq_split/8.SP19_008632BD_HLKNYDSXY_AGCCTCAT-TCTCTACT_L003_R1.fastq.gz
Fastq_split/3.SP19_008632BD_HLKNYDSXY_AGCCTCAT-TCTCTACT_L003_R1.fastq.gz
However, the total number of lines in the split files matches the number of lines in the input file, so no reads appear to be lost.
$ zcat Fastq_split/*SP19_008632BD_HLKNYDSXY_AGCCTCAT-TCTCTACT_L003*R1* | wc -l
1596040040
$ zcat Fastq/SP19_008632BD_HLKNYDSXY_AGCCTCAT-TCTCTACT_L003_R1.fastq.gz | wc -l
1596040040
Please document this quirk in the user guide. I terminated my analysis when I saw the missing segment number. For reference, I used -S 200000000.
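
In case it helps anyone else who hits this, here is a minimal sketch of a check for gaps in the segment numbering. The function name missing_segments is my own and not part of fastp. It assumes the split files are named "<N>.<original name>" as in the listing above, and that numbering starts at 1.

```shell
# Hypothetical helper (not part of fastp): print the segment numbers that are
# absent from a set of split files named "<N>.<rest>".
# Assumption: segment numbering starts at 1, as in the listing above.
missing_segments() {
    for f in "$@"; do
        b=${f##*/}                 # strip the directory part
        printf '%s\n' "${b%%.*}"   # keep the text before the first dot
    done | awk '
        /^[0-9]+$/ {               # ignore names without a numeric prefix
            n = $1 + 0
            seen[n] = 1
            if (n > max) max = n
        }
        END {
            for (i = 1; i <= max; i++)
                if (!(i in seen)) print i
        }'
}
```

Called as `missing_segments Fastq_split/*SP19_008632BD_HLKNYDSXY_AGCCTCAT-TCTCTACT_L003*R1*`, it lists each skipped number on its own line. A gap reported this way does not by itself mean data was lost, so the `zcat ... | wc -l` comparison above is still the check that matters.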