
prefer sra normalized format over sra lite

Open kapsakcj opened this issue 1 year ago • 7 comments

I've hit an odd problem: fastq-dl downloads the FASTQs without error, but they are in SRA Lite format instead of the typical SRA Normalized format.

FASTQs in SRA Lite format have "?" as the quality character for every base, which equates to Q30 in Phred+33 encoding. As a result, trimmomatic and other typical downstream software cannot detect the Phred quality encoding, and the quality scores are not useful during assembly (and probably in other applications that rely on them).

FASTQs in SRA Normalized format keep the original, full per-base quality scores.
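For anyone who wants to check which format they received, here is a minimal sketch in Python. It is not part of fastq-dl; the helper names `phred33` and `looks_like_sra_lite` are made up for illustration, and it assumes plain four-line FASTQ records (no wrapped sequence lines) and Phred+33 encoding. It flags a file as probably SRA Lite when every quality character in the first reads is "?".

```python
import io
from itertools import islice


def phred33(char):
    """Decode one Phred+33 quality character to its integer score."""
    return ord(char) - 33


def looks_like_sra_lite(handle, max_reads=1000):
    """Return True if every quality line in the first max_reads records is all '?'.

    Hypothetical helper: assumes 4-line FASTQ records read from a text handle.
    """
    seen = 0
    for i, line in enumerate(islice(handle, max_reads * 4)):
        if i % 4 == 3:  # the fourth line of each record holds the qualities
            qual = line.rstrip("\r\n")
            if set(qual) - {"?"}:
                return False  # found a quality character other than '?'
            seen += 1
    return seen > 0  # an empty input is not reported as SRA Lite


# Tiny in-memory examples standing in for real files
lite = io.StringIO("@r1\nACGT\n+\n????\n@r2\nGG\n+\n??\n")
normalized = io.StringIO("@r1\nACGT\n+\nIIF?\n")

print(phred33("?"))
print(looks_like_sra_lite(lite))
print(looks_like_sra_lite(normalized))
```

For a gzipped download, the same function should work on a handle opened with `gzip.open(path, "rt")`. A uniform "?" quality line is only a heuristic, so treat a positive result as a hint rather than proof.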

Some examples where I encountered this issue:

  • SRR25316086: https://trace.ncbi.nlm.nih.gov/Traces/?view=run_browser&page_size=10&acc=SRR25316086&display=metadata
  • SRR13086318: https://trace.ncbi.nlm.nih.gov/Traces/?view=run_browser&acc=SRR13086318&display=metadata

I'm guessing it will be a big effort, but would it be possible for fastq-dl to download the SRA-normalized format of FASTQs?

Not sure how ENA deals with this issue, but sra-toolkit has an option for using this format

More info:

  • https://ncbiinsights.ncbi.nlm.nih.gov/2021/10/19/sra-lite/
  • https://www.ncbi.nlm.nih.gov/sra/docs/sra-data-formats/

kapsakcj, Aug 02 '23 20:08