
Request for fastq.gz read ID format change

Open andyjslee opened this issue 1 year ago • 2 comments

In the fastq.gz that Dorado outputs, I see that a semicolon is sometimes included as part of the read ID. A semicolon is used to separate attribute-value pairs in the VCF format. Could you please use a different character (e.g. forward slash "/") in the read IDs? Thank you.

andyjslee avatar Jul 05 '24 19:07 andyjslee

Hi @andyjslee - This sounds like you're using duplex basecalled reads which write the read_ids as idA;idB - is this correct?

I'll discuss this issue internally and we'll get back to you with a decision on how we will proceed.

Kind regards, Rich

HalfPhoton avatar Jul 10 '24 13:07 HalfPhoton

@HalfPhoton yes, that's correct. Thank you!

andyjslee avatar Jul 10 '24 13:07 andyjslee

Hi @andyjslee,

Apologies for the delayed reply, and thank you for your feature request.

After reviewing this, we won’t be moving forward with this change. The current format is required by some downstream applications, which now rely on it as part of their expected input.

Best regards, Rich

HalfPhoton avatar Mar 21 '25 15:03 HalfPhoton
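
Since the idA;idB format is staying, anyone who needs semicolon-free read IDs downstream would have to rewrite the FASTQ headers after basecalling. The following is a minimal sketch of that workaround, not something provided by Dorado or suggested by the maintainers in this thread. The function names `sanitize_header` and `rewrite_fastq` are hypothetical, and the code assumes a standard four-line-per-record FASTQ (no wrapped sequence lines) where the read ID is the first whitespace-delimited token after the leading "@".

```python
import gzip
import re

# Matches the leading "@" plus the read ID (first run of non-whitespace characters).
_READ_ID = re.compile(r"^@(\S+)")


def sanitize_header(header: str, old: str = ";", new: str = "/") -> str:
    """Replace `old` with `new` inside the read ID only.

    Anything after the first whitespace (description, tags) is left untouched.
    """
    return _READ_ID.sub(
        lambda m: "@" + m.group(1).replace(old, new), header, count=1
    )


def rewrite_fastq(src: str, dst: str, old: str = ";", new: str = "/") -> None:
    """Copy a gzipped FASTQ, rewriting the read ID on every header line.

    Assumes exactly four lines per record, so the header is every 4th line
    starting at line 0. Quality lines may also begin with "@", which is why
    the line position is used rather than the first character.
    """
    with gzip.open(src, "rt") as fin, gzip.open(dst, "wt") as fout:
        for i, line in enumerate(fin):
            if i % 4 == 0:
                line = sanitize_header(line, old, new)
            fout.write(line)


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    # rewrite_fastq("duplex_reads.fastq.gz", "duplex_reads.renamed.fastq.gz")
    print(sanitize_header("@idA;idB\n"), end="")
```

Note that any tool relying on the original idA;idB form to recognise duplex reads would no longer recognise the renamed IDs, so the rewritten file is best kept as a separate copy.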