fastq-tools
Error: Input files have differing numbers of entries (1882 != 1677)
I was trying to use fastq-sample, but I keep getting this error:
Input files have differing numbers of entries (1882 != 1677)
fastq-sample -n 100 -o sampled_60 1107_S34_L001_R1.fastq.gz 1107_S34_L001_R2.fastq.gz
I have attached the files below.
Unzipping the fastq files resolves the issue. That is feasible for small data, but my actual files are 3GB-9GB zipped. Is there a way to make fastq-sample handle gzipped files?
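As a stopgap until gzipped input is supported, the unzip-then-sample workaround can be wrapped in a small shell function. This is only a sketch: `sample_gz_pair` and the `SAMPLER` variable are hypothetical names, not part of fastq-tools, and the wrapper still needs enough scratch space for the decompressed copies.

```shell
# Hypothetical wrapper (not part of fastq-tools): decompress both mates
# into a scratch directory, run the sampler on the plain-text copies,
# then remove the scratch directory again.
# SAMPLER defaults to fastq-sample; extra arguments are passed through.
sample_gz_pair() {
    local r1="$1"
    local r2="$2"
    shift 2
    local scratch status
    scratch=$(mktemp -d) || return 1
    gzip -dc -- "$r1" > "$scratch/R1.fastq" || return 1
    gzip -dc -- "$r2" > "$scratch/R2.fastq" || return 1
    "${SAMPLER:-fastq-sample}" "$@" "$scratch/R1.fastq" "$scratch/R2.fastq"
    status=$?
    rm -rf -- "$scratch"
    return "$status"
}
```

With the files from this report it would be called as `sample_gz_pair 1107_S34_L001_R1.fastq.gz 1107_S34_L001_R2.fastq.gz -n 100 -o sampled_60`.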
~Besides, the proportion option seems not to be working since total number of reads stays the same in sampled file~
## Want to sample 60% of total reads
fastq-sample -p 60 -o sampled_60 -s 1234 1107_S34_L001_R1.fastq 1107_S34_L001_R2.fastq
wc -l sampled_60.1.fastq
146648 sampled_60.1.fastq
wc -l 1107_S34_L001_R1.fastq
146648 1107_S34_L001_R1.fastq
Edit1: Sorry, I realized I have to give a fraction value (e.g. -p 0.6 instead of -p 60). When I provide a fraction with gzipped input, fastq-sample counts the total number of reads as wc -l file.gz rather than zcat file.gz | wc -l, which leads to a wrong estimate of the number of reads to be dumped.
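To illustrate the difference between the two counts, here is a minimal sketch that counts reads on the decompressed stream and checks that both mates agree. `count_reads` and `check_pair` are hypothetical helpers, not fastq-tools commands, and the sketch assumes standard four-line FASTQ records.

```shell
# Hypothetical helpers, assuming four-line FASTQ records.
# Count reads in a plain or gzipped FASTQ file. Gzipped files are
# decompressed first, so lines are counted in the text, not in the
# compressed bytes.
count_reads() {
    local file="$1"
    local lines
    case "$file" in
        *.gz) lines=$(gzip -dc -- "$file" | wc -l) ;;
        *)    lines=$(wc -l < "$file") ;;
    esac
    echo $(( lines / 4 ))
}

# Print the shared read count, or fail if the two mates differ.
check_pair() {
    local n1 n2
    n1=$(count_reads "$1")
    n2=$(count_reads "$2")
    if [ "$n1" -ne "$n2" ]; then
        echo "mismatch: $n1 != $n2" >&2
        return 1
    fi
    echo "$n1"
}
```

Running `check_pair 1107_S34_L001_R1.fastq.gz 1107_S34_L001_R2.fastq.gz` should report the same count for both mates if the files are properly paired, whereas line counts taken on the compressed bytes will generally differ between the two files.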
Note: I am using v0.8.3 from Bioconda