
Unexpected behavior when downloading fastq using SRA identifier

Open jolespin opened this issue 6 months ago • 3 comments

https://trace.ncbi.nlm.nih.gov/Traces/?view=run_browser&page_size=10&acc=SRR13615821&display=metadata

I ran kingfisher and it pulled 3 fastq files for 1 record: one single-end file and two paired-end files.

(base) [jespinoz@exp-15-28 split_reads]$ kingfisher --version
0.3.1

ID=SRR13615821
kingfisher get -r ${ID} -m aws-http -f fastq.gz

I thought that maybe one was interleaved, but the read counts didn't match up:

(base) [jespinoz@exp-15-28 Fastq]$ seqkit stats SRR13615821_1.fastq.gz SRR13615821_2.fastq.gz split_reads/SRR13615821.fastq.gz
processed files:  3 / 3 [======================================] ETA: 0s. done
file                              format  type   num_seqs        sum_len  min_len  avg_len  max_len
SRR13615821_1.fastq.gz            FASTQ   DNA     808,228    197,172,014       35      244      301
SRR13615821_2.fastq.gz            FASTQ   DNA     808,228    199,461,172       21    246.8      301
split_reads/SRR13615821.fastq.gz  FASTQ   DNA   5,860,790  1,438,979,322       35    245.5      301
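
As a quick check on the interleaving idea, here is a minimal sketch of the arithmetic, using only the `num_seqs` values from the table above. The function name `consistent_with_interleaving` is made up for illustration and is not part of kingfisher or seqkit.

```python
# Read counts copied from the seqkit stats table above.
paired_1 = 808_228      # SRR13615821_1.fastq.gz
paired_2 = 808_228      # SRR13615821_2.fastq.gz
unsuffixed = 5_860_790  # SRR13615821.fastq.gz


def consistent_with_interleaving(n_forward, n_reverse, n_candidate):
    """Return True if n_candidate could hold exactly the forward + reverse reads."""
    return n_forward == n_reverse and n_candidate == n_forward + n_reverse


expected = paired_1 + paired_2
print(expected)  # 1616456
print(consistent_with_interleaving(paired_1, paired_2, unsuffixed))  # False
```

An interleaved copy of the two paired files would contain 1,616,456 reads, not 5,860,790, so the third file is not simply `_1` and `_2` merged.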

The files above are what kingfisher downloaded.

Note: I moved SRR13615821.fastq.gz into a separate folder to split the reads, but repair.sh from BBTools reported no pairs:

(base) [jespinoz@exp-15-28 split_reads]$ repair.sh in=SRR13615821.fastq.gz out1=SRR13615821_1.fastq.gz out2=SRR13615821_2.fastq.gz
java -ea -Xmx84979m -cp /expanse/projects/jcl110/miniconda3/opt/bbmap-39.01-1/current/ jgi.SplitPairsAndSingles rp in=SRR13615821.fastq.gz out1=SRR13615821_1.fastq.gz out2=SRR13615821_2.fastq.gz
Executing jgi.SplitPairsAndSingles [rp, in=SRR13615821.fastq.gz, out1=SRR13615821_1.fastq.gz, out2=SRR13615821_2.fastq.gz]

Set INTERLEAVED to false
Started output stream.

Input:                  	5860790 reads 		1438979322 bases.
Result:                 	5860790 reads (100.00%) 	1438979322 bases (100.00%)
Pairs:                  	0 reads (0.00%) 	0 bases (0.00%)
Singletons:             	5860790 reads (100.00%) 	1438979322 bases (100.00%)

Time:                         	36.897 seconds.
Reads Processed:       5860k 	158.84k reads/sec
Bases Processed:       1438m 	39.00m bases/sec

The above is my attempt to split the reads manually.
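
To cross-check the "0 pairs" result independently of repair.sh, a rough sketch like the one below could count how many read names occur twice in a file. It assumes well-formed 4-line FASTQ records and that mates share a name once an optional `/1` or `/2` suffix and any description after the first space are removed. The helper names (`read_names`, `pairing_summary`, `summarize_file`) are hypothetical, and it keeps every name in memory, so it is meant as a sanity check rather than a replacement for repair.sh.

```python
import gzip
from collections import Counter


def read_names(lines):
    """Yield the read name from each 4-line FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 0:  # header line of each record
            name = line.rstrip("\n")[1:].split()[0]  # drop '@' and description
            if name.endswith(("/1", "/2")):
                name = name[:-2]  # strip mate suffix (assumed convention)
            yield name


def pairing_summary(lines):
    """Return (names seen exactly twice, names seen exactly once)."""
    counts = Counter(read_names(lines))
    paired = sum(1 for c in counts.values() if c == 2)
    single = sum(1 for c in counts.values() if c == 1)
    return paired, single


def summarize_file(path):
    """Run pairing_summary on a gzipped FASTQ file."""
    with gzip.open(path, "rt") as handle:
        return pairing_summary(handle)


# Tiny in-memory example: r1 has both mates, r2 is a singleton.
demo = [
    "@r1/1\n", "ACGT\n", "+\n", "IIII\n",
    "@r1/2\n", "TTTT\n", "+\n", "IIII\n",
    "@r2 extra\n", "GG\n", "+\n", "II\n",
]
print(pairing_summary(demo))  # (1, 1)
```

Calling `summarize_file("SRR13615821.fastq.gz")` on the real file would be the analogous check; a result with zero names seen twice would agree with the repair.sh output above.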

Do you know what could be happening?

jolespin, Dec 13 '23 19:12