
`fqtools type` incorrectly typing fastq

Open nick-youngblut opened this issue 6 years ago • 5 comments

It appears that the current algorithm for fqtools type can get the fastq quality format wrong. Here's a reproducible example:

fastq-dump --split-files ERR719681
# Read 300355 spots for ERR719681
# Written 300355 spots for ERR719681
fqtools type ERR719681_1.fastq
# fastq-illumina
fqtools type ERR719681_2.fastq
# fastq-sanger

If Bio.SeqIO is then used to read these fastq files with the "type" specified by fqtools type, then the following error occurs:

  File "/ebio/abt3_projects/software/dev/llmgqc/.snakemake/conda/a289c738/lib/python3.6/site-packages/Bio/SeqIO/__init__.py", line 611, in parse
    for r in i:
  File "/ebio/abt3_projects/software/dev/llmgqc/.snakemake/conda/a289c738/lib/python3.6/site-packages/Bio/SeqIO/QualityIO.py", line 1255, in FastqIlluminaIterator
    raise ValueError("Invalid character in quality string")

Maybe using the min & max of qual values (the full range) for all sequences in the fastq file would help prevent these mis-calls?
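A minimal sketch of that idea, assuming an unwrapped four-line-per-record FASTQ file and the conventional ASCII offsets (Sanger starts at `!` = 33, Solexa at `;` = 59, Illumina 1.3+ at `@` = 64). This is not the algorithm fqtools uses; `quality_range` and `guess_format` are hypothetical helper names, and the returned strings are the Bio.SeqIO format names.

```python
import io

def quality_range(handle):
    """Return (min, max) ASCII codes over all quality lines.

    Assumes each record is exactly four lines (no wrapped sequences).
    """
    lo, hi = 255, 0
    for i, line in enumerate(handle):
        if i % 4 == 3:  # fourth line of each record holds the qualities
            codes = [ord(c) for c in line.rstrip("\r\n")]
            if codes:
                lo = min(lo, min(codes))
                hi = max(hi, max(codes))
    if lo > hi:
        raise ValueError("no quality values found")
    return lo, hi

def guess_format(lo, hi):
    """Pick the narrowest encoding consistent with the observed range."""
    if lo < 33 or hi > 126:
        raise ValueError("quality characters outside printable ASCII")
    if lo < 59:
        return "fastq-sanger"    # characters below ';' only occur in Phred+33
    if lo < 64:
        return "fastq-solexa"    # ';' to '?' only occur in Solexa+64
    return "fastq-illumina"      # ambiguous: a guess, not a proof

# Usage on an in-memory example with two records
example = "@r1\nACGT\n+\nIIII\n@r2\nACGT\n+\n!III\n"
lo, hi = quality_range(io.StringIO(example))
print(lo, hi, guess_format(lo, hi))
```

Because the range is taken over the whole file, a single low character anywhere is enough to rule out the +64 encodings. A file whose minimum is 64 or above remains ambiguous, so the last branch is only a best guess.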

nick-youngblut avatar Jun 18 '18 19:06 nick-youngblut

Development on fqtools seems to have stopped. Anyone know of a good alternative tool for typing fastq files?

nick-youngblut avatar Jul 19 '18 20:07 nick-youngblut

Yes it has stopped for now, as I’ve got to do my main job :-)

What do you mean by typing fastq?

Alastair

alastair-droop avatar Jul 19 '18 20:07 alastair-droop

Yeah, main jobs do tend to get in the way :)

fqtools type doesn't always type correctly, at least as needed for correct conversion of fastq formats with Bio.SeqIO. I'm probably just going to add a function to my fastq format conversion script that will first type the input format as designated by fqtools type, and then if that doesn't work, just try all fastq input formats until one works.
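Roughly, that fallback could look like the sketch below. It assumes Biopython is installed and that a wrong format surfaces as a `ValueError` from `SeqIO.parse`, as in the traceback above. `first_working_format` is a hypothetical name, and the `parse` argument exists only so the logic can be exercised without Biopython.

```python
def first_working_format(path, preferred, parse=None,
                         candidates=("fastq-sanger", "fastq-solexa", "fastq-illumina")):
    """Return the first format under which the whole file parses.

    `preferred` is tried first (e.g. the answer from `fqtools type`),
    then the remaining candidates in order.
    """
    if parse is None:
        from Bio import SeqIO  # assumption: Biopython is available
        parse = SeqIO.parse
    order = [preferred] + [fmt for fmt in candidates if fmt != preferred]
    for fmt in order:
        try:
            for _ in parse(path, fmt):  # consume every record to trigger errors
                pass
        except ValueError:
            continue  # this format rejected the file; try the next one
        return fmt
    raise ValueError("no candidate format could parse " + str(path))
```

One caveat: parsing without an error does not prove the format is right. The Sanger range covers every valid quality character, so `fastq-sanger` will accept files that are really Solexa or Illumina 1.3+ encoded, and the candidate order matters.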

nick-youngblut avatar Jul 19 '18 20:07 nick-youngblut

I’ve had a brief look, but the detection algorithm is doing what I told it to. This indicates that my logic is wrong.

I’ll try to sort this when I’m back at the office; mid August.

Thanks, Alastair

alastair-droop avatar Jul 21 '18 08:07 alastair-droop

Great! Here's an example of different classifications for read1 vs read2:

fastq-dump --skip-technical --split-3 ERR866627

fqtools type ERR866627_1.fastq    
# fqtools type: fastq-sanger
fqtools type ERR866627_2.fastq
# fqtools type: fastq-solexa

nick-youngblut avatar Jul 21 '18 09:07 nick-youngblut