Can Hostile Output Full FASTQ Headers for Medaka Compatibility?
We are using Hostile for dehosting across several projects. Recently, we encountered an issue with medaka_consensus version 2.0.1.
Upon investigation, I found that medaka_consensus relies on the basecalling model information embedded in the FASTQ headers. However, the FASTQ files produced by Hostile during dehosting seem to strip this information, retaining only the read IDs.
For example, a typical original header looks like this:
@d776c6f5-9501-41e3-8631-4966c9c35566 runid=1a64aa91730686f5bb6ec4c17cbd38ed80b8e9dd sampleid=no_sample read=20450 ch=259 start_time=2023-03-07T13:23:06Z [email protected] barcode=barcode14
Is it possible for Hostile to retain the full original header in the dehosted FASTQ output, rather than outputting only the read IDs? This would ensure compatibility with tools like Medaka that rely on full header metadata.
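As a stopgap, the full headers could probably be restored after dehosting by matching read IDs against the original FASTQ. Below is a minimal sketch of that idea. It is not part of Hostile, and the function names (`read_fastq`, `read_id`, `restore_headers`) are hypothetical. It assumes uncompressed FASTQ with exactly four lines per record, and that the read IDs in the dehosted file are identical to those in the original (i.e. reads were not renamed).

```python
from typing import Iterator, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, quality) from a four-line-per-record FASTQ."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of file (or trailing blank line)
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def read_id(header: str) -> str:
    """Return the read ID: the text after '@' up to the first whitespace."""
    return header[1:].split(None, 1)[0]


def restore_headers(original: TextIO, dehosted: TextIO, out: TextIO) -> int:
    """Write the original records (full headers) for reads kept after dehosting.

    Only the IDs of the dehosted reads are held in memory; the original
    file is streamed once. Returns the number of records written.
    """
    kept_ids = {read_id(header) for header, _, _ in read_fastq(dehosted)}
    written = 0
    for header, seq, qual in read_fastq(original):
        if read_id(header) in kept_ids:
            out.write(f"{header}\n{seq}\n+\n{qual}\n")
            written += 1
    return written
```

Usage would be along the lines of opening the original and dehosted files with `open(...)` and passing the handles to `restore_headers`. Output follows the read order of the original file, not the dehosted one.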
Hi Xiaoli, thanks for raising this, I'll look into it and keep you posted.
Correction: there is a typo in my post. The version is medaka 2.1.0.
Hi @xiaoli-dong, apologies for the delay, and thanks for suggesting this useful feature. Frustratingly, I have yet to find an elegant and performant way to pass full header information through Hostile's internal pipeline. I hope to implement it one way or another (suggestions welcome), but for now I suggest you consider my new tool Deacon, which passes through complete headers by default.
https://github.com/bede/deacon
Bede