
--split_by_lines Generates Non-Consecutive Segment Numbers and This Is Not Documented

Open DarioS opened this issue 4 years ago • 0 comments

I noticed that --split_by_lines sometimes skips segment numbers. In the example below, segment 9 is missing.

$ ls Fastq_split/*SP19_008632BD_HLKNYDSXY_AGCCTCAT-TCTCTACT_L003*R1*
Fastq_split/1.SP19_008632BD_HLKNYDSXY_AGCCTCAT-TCTCTACT_L003_R1.fastq.gz   Fastq_split/4.SP19_008632BD_HLKNYDSXY_AGCCTCAT-TCTCTACT_L003_R1.fastq.gz
Fastq_split/10.SP19_008632BD_HLKNYDSXY_AGCCTCAT-TCTCTACT_L003_R1.fastq.gz  Fastq_split/5.SP19_008632BD_HLKNYDSXY_AGCCTCAT-TCTCTACT_L003_R1.fastq.gz
Fastq_split/11.SP19_008632BD_HLKNYDSXY_AGCCTCAT-TCTCTACT_L003_R1.fastq.gz  Fastq_split/6.SP19_008632BD_HLKNYDSXY_AGCCTCAT-TCTCTACT_L003_R1.fastq.gz
Fastq_split/12.SP19_008632BD_HLKNYDSXY_AGCCTCAT-TCTCTACT_L003_R1.fastq.gz  Fastq_split/7.SP19_008632BD_HLKNYDSXY_AGCCTCAT-TCTCTACT_L003_R1.fastq.gz
Fastq_split/2.SP19_008632BD_HLKNYDSXY_AGCCTCAT-TCTCTACT_L003_R1.fastq.gz   Fastq_split/8.SP19_008632BD_HLKNYDSXY_AGCCTCAT-TCTCTACT_L003_R1.fastq.gz
Fastq_split/3.SP19_008632BD_HLKNYDSXY_AGCCTCAT-TCTCTACT_L003_R1.fastq.gz

However, the total number of lines across the split files is the same as in the input file.

$ zcat Fastq_split/*SP19_008632BD_HLKNYDSXY_AGCCTCAT-TCTCTACT_L003*R1* | wc -l
1596040040
$ zcat Fastq/SP19_008632BD_HLKNYDSXY_AGCCTCAT-TCTCTACT_L003_R1.fastq.gz | wc -l
1596040040
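
For anyone who wants to check their own output for gaps like this, here is a rough sketch of a helper that prints the missing segment numbers. The function name `missing_segments` is made up for illustration and is not part of fastp. It assumes the split files are named `<number>.<original file name>` with no zero padding, as in the listing above, and that numbering starts at 1.

```shell
# Hypothetical helper (not part of fastp): print the segment numbers that are
# missing from a directory of split files.
# Assumption: files are named "<number>.<rest>" and numbering starts at 1.
missing_segments() {
    dir=$1
    # Extract the numeric prefix of each file name, sorted and de-duplicated
    # (R1 and R2 files share the same prefixes).
    nums=$(ls "$dir" | sed -n 's/^\([0-9][0-9]*\)\..*/\1/p' | sort -n | uniq)
    # Nothing to report if no numbered files were found.
    [ -n "$nums" ] || return 0
    max=$(printf '%s\n' "$nums" | tail -n 1)
    i=1
    while [ "$i" -le "$max" ]; do
        # Print i if no file carries it as a prefix.
        printf '%s\n' "$nums" | grep -qx "$i" || echo "$i"
        i=$((i + 1))
    done
}
```

Running `missing_segments Fastq_split` would list one missing number per line. It only looks at file names, so it complements the `wc -l` comparison above and does not replace it.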

Please document this quirk in the user guide, because I terminated my analysis when I saw the gap. I used -S 200000000.

DarioS, Mar 10 '21 13:03