Ability to process quality (.qual) files

Open • deprekate opened this issue 10 years ago • 1 comment

It would be nice if seqtk could process quality files. Quality files use the same format as FASTA files, except that the quality scores on each line are separated by spaces. When seqtk processes a quality file, it joins the wrapped lines without inserting a separator, so the last score of one line and the first score of the next are merged into a single value.
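For reference, a quality-aware reader only needs to split each body line on whitespace instead of concatenating lines verbatim. The Python sketch below assumes the layout described above (`>` header lines followed by whitespace-separated integer scores, possibly wrapped over several lines); the function name `parse_qual` is hypothetical and not part of seqtk.

```python
from typing import Dict, Iterable, List


def parse_qual(lines: Iterable[str]) -> Dict[str, List[int]]:
    """Parse FASTA-style .qual text into {record_id: [scores]} (hypothetical helper).

    Each body line is split on whitespace, so a line break between two
    scores acts as a separator instead of being deleted.
    """
    records: Dict[str, List[int]] = {}
    current: List[int] = []  # scores before the first header are ignored
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            fields = line[1:].split()
            current = []
            # The record ID is the first word of the header line.
            records[fields[0] if fields else ""] = current
        else:
            current.extend(int(tok) for tok in line.split())
    return records


example = """>seq1
35 35 50 50 44
52 42 52 52 44
"""
print(parse_qual(example.splitlines()))
# {'seq1': [35, 35, 50, 50, 44, 52, 42, 52, 52, 44]}
```

Splitting on whitespace also tolerates irregular spacing (for example two spaces between scores), which a simple "replace newline" approach would not need to care about either.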

```
[deprekate@anthill ~]$ cat seq.fna
>seq1
CCGAATGGATCATCCCGACTTTCAGGCCGGGATGGCCGGCCTGAAAGGGGACTGGGAACT
CCTCTGCCGCCCCTTGTGCGACCCGGATGCCCCGCGCGGCTGGCTGGGGGTCTGGGCGCT
[deprekate@anthill ~]$ cat seq.qual
>seq1
35 35 50 50 44 44 44 43 44 43 55 55 55 44 50 42 42 42 42 43 52 52 52 52 52
52 42 52 52 44 52 39 39 40 43 43 55 44 52 42 42 42 42 42 52 55 55 55 55 55
```

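The bug shows up with `seqtk subseq`, and also with `seqtk sample` as below. The sampled nucleotide sequence has 120 bases, while the quality record has only 119 scores: the last score of one line and the first score of the next are fused into a single value, visible as `5252` in the middle of the quality line.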
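The lost score can be reproduced in a few lines. This Python sketch only mimics the behaviour described in the issue (it is not seqtk's C code): each line is stripped and the lines are concatenated with no separator. It uses the two quality lines from the example above, so one score is lost at the single line join; the helper name `naive_join` is hypothetical.

```python
from typing import List


def naive_join(lines: List[str]) -> str:
    """Hypothetical helper: concatenate stripped lines with no separator."""
    return "".join(line.strip() for line in lines)


line1 = "35 35 50 50 44 44 44 43 44 43 55 55 55 44 50 42 42 42 42 43 52 52 52 52 52"
line2 = "52 42 52 52 44 52 39 39 40 43 43 55 44 52 42 42 42 42 42 52 55 55 55 55 55"

tokens = naive_join([line1, line2]).split()
print(len(line1.split()) + len(line2.split()))  # 50 scores in the input
print(len(tokens))                              # 49 after the naive join
print(tokens[24])                               # 5252: two scores fused
```

In general, a record wrapped over n lines loses n - 1 scores this way, one per line join.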

```
[deprekate@anthill ~]$ seqtk sample seq.fna 1
>seq1
CCGAATGGATCATCCCGACTTTCAGGCCGGGATGGCCGGCCTGAAAGGGGACTGGGAACTCCTCTGCCGCCCCTTGTGCGACCCGGATGCCCCGCGCGGCTGGCTGGGGGTCTGGGCGCT
[deprekate@anthill ~]$ seqtk sample seq.qual 1
>seq1
35 35 50 50 44 44 44 43 44 43 55 55 55 44 50 42 42 42 42 43 52 52 52 52 5252 42 52 52 44 52 39 39 40 43 43 55 44 52 42 42 42 42 42 52 55 55 55 55 55
```

Perhaps an easy fix would be an option to replace newlines with spaces instead of removing them entirely.
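A sketch of the suggested option, assuming a hypothetical `sep` parameter (not an existing seqtk flag named in this issue): wrapped body lines are joined with `sep`, so `sep=""` reproduces the current FASTA behaviour and `sep=" "` keeps quality scores apart. The function name `read_records` is likewise hypothetical.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def read_records(lines: Iterable[str], sep: str = "") -> Iterator[Tuple[str, str]]:
    """Yield (header, body) pairs, joining wrapped body lines with `sep`.

    sep="" suits nucleotide FASTA; sep=" " keeps .qual scores separate.
    """
    header: Optional[str] = None
    parts: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, sep.join(parts)
            header, parts = line, []  # body lines before the first header are dropped
        else:
            parts.append(line)
    if header is not None:
        yield header, sep.join(parts)


qual = [">seq1", "35 35 50", "52 42"]
print(list(read_records(qual, sep=" ")))  # [('>seq1', '35 35 50 52 42')]
print(list(read_records(qual)))           # [('>seq1', '35 35 5052 42')]
```

The second call shows the current behaviour on a quality file: `50` and `52` fuse into `5052`, just like `5252` in the report.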

deprekate • Oct 01 '14 22:10

I would just use BioPerl or Biopython to convert all legacy FASTA+QUAL files to FASTQ.
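For the Biopython route, `Bio.SeqIO.QualityIO.PairedFastaQualIterator` reads a FASTA file and its matching QUAL file together, and the resulting records can be written with `SeqIO.write(..., "fastq")`. The dependency-free Python sketch below only illustrates roughly what such a conversion involves. It assumes records appear in the same order in both files and writes Sanger (Phred+33) FASTQ; all function names are hypothetical.

```python
from itertools import zip_longest
from typing import Iterable, Iterator, List, Optional, Tuple


def _records(lines: Iterable[str], sep: str) -> Iterator[Tuple[str, str]]:
    """Yield (header without '>', body) pairs, joining body lines with sep."""
    header: Optional[str] = None
    parts: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, sep.join(parts)
            header, parts = line[1:], []
        else:
            parts.append(line)
    if header is not None:
        yield header, sep.join(parts)


def _first_word(header: str) -> str:
    words = header.split()
    return words[0] if words else ""


def fasta_qual_to_fastq(fasta_lines: Iterable[str], qual_lines: Iterable[str]) -> Iterator[str]:
    """Pair FASTA and QUAL records in file order and yield FASTQ records (hypothetical)."""
    pairs = zip_longest(_records(fasta_lines, ""), _records(qual_lines, " "))
    for fa, qu in pairs:
        if fa is None or qu is None:
            raise ValueError("FASTA and QUAL files have different record counts")
        (seq_header, seq), (qual_header, qual) = fa, qu
        if _first_word(seq_header) != _first_word(qual_header):
            raise ValueError(f"ID mismatch: {seq_header!r} vs {qual_header!r}")
        scores = [int(tok) for tok in qual.split()]
        if len(scores) != len(seq):
            raise ValueError(f"{seq_header}: {len(seq)} bases but {len(scores)} scores")
        if any(q < 0 or q > 93 for q in scores):
            raise ValueError(f"{seq_header}: score outside the Phred+33 range 0..93")
        # Each Phred score q becomes the single character chr(q + 33).
        encoded = "".join(chr(q + 33) for q in scores)
        yield f"@{seq_header}\n{seq}\n+\n{encoded}\n"


fasta = [">seq1", "ACGT"]
qual = [">seq1", "35 35", "50 50"]
print("".join(fasta_qual_to_fastq(fasta, qual)), end="")
# @seq1
# ACGT
# +
# DDSS
```

The length check is what would catch the `5252` problem from this issue: a quality record that has been joined without separators no longer has one score per base.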

tseemann • Aug 18 '16 17:08