fastq-tools

Error: Input files have differing numbers of entries (1882 != 1677)

Open Rohit-Satyam opened this issue 1 year ago • 1 comments

I am trying to use fastq-sample, but I keep getting this error:

Input files have differing numbers of entries (1882 != 1677)
fastq-sample -n 100 -o sampled_60 1107_S34_L001_R1.fastq.gz 1107_S34_L001_R2.fastq.gz

I have attached the files below.

1107_S34_L001_R2.fastq.gz 1107_S34_L001_R1.fastq.gz

Rohit-Satyam avatar Sep 03 '23 07:09 Rohit-Satyam

Decompressing the FASTQ files resolves the error. That is feasible for small data, but my actual files are 3 GB to 9 GB when compressed. Is there a way to make fastq-sample handle gzipped files?
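As a stopgap for the workaround described above, the decompression step can be scripted so that the original .gz files stay untouched and the temporary copies are easy to clean up. This is only a minimal sketch: `decompress_fastq` is a hypothetical helper, not part of fastq-tools, and it assumes `gzip` is on the PATH and that there is enough scratch space for the decompressed files.

```shell
# Hypothetical helper (not part of fastq-tools): write a decompressed copy
# of a gzipped FASTQ file into a directory and print the new path.
# The body runs in a subshell so its variables do not leak to the caller.
decompress_fastq() (
    src="$1"
    dest_dir="$2"
    out="$dest_dir/$(basename "${src%.gz}")"
    # gzip -dc streams to stdout, so the original .gz is left in place.
    gzip -dc "$src" > "$out" && printf '%s\n' "$out"
)

# Usage sketch with the file names from this report:
#   tmp=$(mktemp -d)
#   r1=$(decompress_fastq 1107_S34_L001_R1.fastq.gz "$tmp")
#   r2=$(decompress_fastq 1107_S34_L001_R2.fastq.gz "$tmp")
#   fastq-sample -n 100 -o sampled_60 "$r1" "$r2"
#   rm -r "$tmp"
```

This does not remove the disk cost of decompressing large inputs; it only keeps the temporary files in one place.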

~Besides, the proportion option does not seem to work, since the total number of reads stays the same in the sampled file.~

## Want to sample 60% of total reads
fastq-sample -p 60 -o sampled_60 -s 1234 1107_S34_L001_R1.fastq 1107_S34_L001_R2.fastq

wc -l sampled_60.1.fastq
146648 sampled_60.1.fastq

wc -l 1107_S34_L001_R1.fastq
146648 1107_S34_L001_R1.fastq

Edit 1: Sorry, I realized that -p expects a fraction (e.g. 0.6 for 60%), not a percentage. When the input is gzipped, fastq-sample seems to count the total number of reads as wc -l file.gz would, rather than as zcat file.gz | wc -l would. This leads to a wrong estimate of the number of reads to output.
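To check whether the two mates really contain the same number of reads, the records can be counted on the decompressed stream rather than on the raw .gz bytes. The sketch below is an illustration only: `count_reads` is a hypothetical helper, and it assumes strict four-line FASTQ records (no wrapped sequence lines). It uses `gzip -dc` instead of `zcat` because `zcat` behaves differently on some systems.

```shell
# Hypothetical helper: count FASTQ records as (decompressed line count) / 4.
# Assumes every record is exactly four lines.
count_reads() (
    case "$1" in
        *.gz) lines=$(gzip -dc "$1" | wc -l) ;;  # count lines after decompression
        *)    lines=$(wc -l < "$1") ;;           # plain text: count directly
    esac
    echo $((lines / 4))
)

# Usage sketch with the file names from this report:
#   count_reads 1107_S34_L001_R1.fastq.gz
#   count_reads 1107_S34_L001_R2.fastq.gz
```

If both calls print the same number, the paired files are consistent and the mismatch reported by fastq-sample comes from how the compressed input is read. For reference, the 146648 lines reported above for the decompressed R1 file correspond to 146648 / 4 = 36662 reads.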

Note: I am using v0.8.3 from Bioconda

Rohit-Satyam avatar Sep 03 '23 08:09 Rohit-Satyam