fastq-dl
Prefer SRA Normalized format over SRA Lite
I've hit an odd issue where `fastq-dl` pulls FASTQs without issue, but they are in SRA Lite format instead of the typical SRA Normalized format.
FASTQs in SRA Lite format have `?` as the quality score for every base, which equates to Q30 in Phred+33 encoding. This causes problems: `trimmomatic` and other typical downstream tools are unable to detect the Phred quality encoding, and the quality scores are not useful during assembly (and probably in other applications that rely on them).
FASTQs in SRA Normalized format are the original format, which contains the full base quality scores.
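To tell the two formats apart after a download, here is a minimal sketch of a check, not anything `fastq-dl` provides. It assumes standard 4-line FASTQ records (optionally gzipped) and treats a file as SRA Lite-style when every sampled quality character is `?`. The function names `phred33` and `looks_like_sra_lite` and the `max_reads` parameter are hypothetical, chosen only for illustration.

```python
import gzip
from itertools import islice


def phred33(char: str) -> int:
    """Convert one FASTQ quality character to its Phred+33 score."""
    return ord(char) - 33


def looks_like_sra_lite(path: str, max_reads: int = 10000) -> bool:
    """Return True if every sampled quality character is '?' (Q30).

    Assumption: records are exactly 4 lines each, so the quality
    string is every 4th line. Only the first `max_reads` records
    are inspected to keep the check fast on large files.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    seen = set()
    with opener(path, "rt") as handle:
        for i, line in enumerate(islice(handle, max_reads * 4)):
            if i % 4 == 3:  # quality line of each record
                seen.update(line.rstrip("\r\n"))
    # An empty file yields an empty set and is reported as False.
    return seen == {"?"}


print(phred33("?"))  # 30
```

A real SRA Normalized file could in principle also contain only Q30 bases in the sampled reads, so this is a heuristic rather than a definitive test.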
Some examples where I encountered this issue:
- SRR25316086: https://trace.ncbi.nlm.nih.gov/Traces/?view=run_browser&page_size=10&acc=SRR25316086&display=metadata
- SRR13086318: https://trace.ncbi.nlm.nih.gov/Traces/?view=run_browser&acc=SRR13086318&display=metadata
I'm guessing it will be a big effort, but would it be possible for `fastq-dl` to download the SRA Normalized format of FASTQs?
I'm not sure how ENA deals with this issue, but sra-toolkit has an option for using this format.
More info:
- https://ncbiinsights.ncbi.nlm.nih.gov/2021/10/19/sra-lite/
- https://www.ncbi.nlm.nih.gov/sra/docs/sra-data-formats/