Request: Optional RG tag in FASTQ
Hi developers of Dorado,
First, thank you for providing this great software.
I noticed that after issue #532, the RG tag and some other tags are automatically added to the FASTQ header, which I think is a bit irrational for the following reasons:
- The tags, especially the `RG` tag, are long, which takes up more disk space and is bad for devices like the Mk1C because of the increased IO burden. If I want the full info to be well documented, BAM is absolutely better.
- If I basecalled my own data with `--emit-fastq` (which is not recommended), I would know the model from stdout or I would have set it manually, so the extra record won't help.
- If I upload to the SRA database to share my data, SRA will re-encode my header after `fastq-dump`, and the original long header will be useless.
I fully recognize the earlier requests, such as support for minimap2 -y, but I think that is a minor use case for the vast majority, since dorado can do the alignment itself. If I want FASTQ format, I want the header to be neat and fast to process, so I suggest making this feature optional or leaving it to third-party software.
I'll bring this up; we'll discuss this change and I'll get back to you.
Thanks, Rich
Hi @HalfPhoton, I really appreciate your attention!
Hi @Mon3trK,
Thank you for your patience. After some consideration, we’ve decided not to add this feature at this time. Our reasoning is as follows:
- For users where file size is a primary concern, BAM files remain the recommended option, as you noted.
- The additional information contained in FASTQ files is valuable for downstream processes (e.g., demultiplexing).
- Introducing further configuration options in Dorado would add complexity.
- Users who wish to remove this information from their FASTQ files can do so easily.
Kind regards, Rich
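For readers who want to strip the extra header information themselves, here is a minimal sketch, not an official Dorado tool. It assumes standard four-line FASTQ records (no wrapped sequences) and that the tags follow the read ID on the header line separated by whitespace (tab or space). The function name `strip_fastq_header_tags` is hypothetical.

```python
from typing import Iterable, Iterator


def strip_fastq_header_tags(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTQ lines with everything after the read ID removed from headers.

    Assumption: each record is exactly four lines (header, sequence, '+', quality),
    so every line whose index is a multiple of 4 is a header line.
    """
    for index, line in enumerate(lines):
        if index % 4 == 0:
            # Keep only the first whitespace-delimited token, e.g. "@read_id".
            fields = line.split(None, 1)
            yield (fields[0] if fields else "") + "\n"
        else:
            # Sequence, separator, and quality lines pass through unchanged.
            yield line
```

It can be applied to an open file handle, for example `out.writelines(strip_fastq_header_tags(inp))`, which streams the file line by line rather than loading it into memory.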