Can Hostile Output Full FASTQ Headers for Medaka Compatibility?

Open xiaoli-dong opened this issue 8 months ago • 3 comments

We are using Hostile for dehosting across several projects. Recently, we encountered an issue with medaka_consensus version 2.0.1.

Upon investigation, I found that medaka_consensus relies on the basecalling model information embedded in the FASTQ headers. However, the FASTQ files produced by Hostile during dehosting seem to strip this information, retaining only the read IDs.

For example, a typical original header looks like this: @d776c6f5-9501-41e3-8631-4966c9c35566 runid=1a64aa91730686f5bb6ec4c17cbd38ed80b8e9dd sampleid=no_sample read=20450 ch=259 start_time=2023-03-07T13:23:06Z [email protected] barcode=barcode14

Is it possible for Hostile to retain the full original header in the dehosted FASTQ output, rather than outputting only the read IDs? This would ensure compatibility with tools like Medaka that rely on full header metadata.
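Until full headers are passed through, one possible workaround is to re-attach the original headers after dehosting by matching read IDs against the input file. The sketch below is not part of Hostile or Medaka. The function names (`read_fastq`, `read_id`, `restore_headers`) are hypothetical. It assumes four-line FASTQ records and that the dehosted file keeps each read's original ID as the first whitespace-delimited token of the header.

```python
def read_fastq(handle):
    """Yield (header, seq, plus, qual) from a four-line-per-record FASTQ handle."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        if not header.strip():
            continue  # tolerate stray blank lines between records
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        yield header.rstrip("\n"), seq, plus, qual


def read_id(header):
    """Return the read ID: the first whitespace-delimited token after '@'."""
    return header[1:].split(None, 1)[0]


def restore_headers(original, dehosted, out):
    """Write dehosted records to `out`, swapping in full headers from `original`.

    Assumption: read IDs in `dehosted` match those in `original`.
    Records with no match are written unchanged.
    Returns the number of headers that were restored.
    """
    # Holds every original header in memory; fine for a sketch, costly for big runs.
    full_headers = {read_id(h): h for h, _, _, _ in read_fastq(original)}
    restored = 0
    for header, seq, plus, qual in read_fastq(dehosted):
        full = full_headers.get(read_id(header))
        if full is not None:
            header = full
            restored += 1
        out.write(f"{header}\n{seq}\n{plus}\n{qual}\n")
    return restored
```

The functions take open text handles, so gzipped files can be passed in via `gzip.open(path, "rt")`. This only helps if read IDs are left unchanged during dehosting; if they are renamed, the lookup finds nothing and the records are written as-is.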

xiaoli-dong, Jul 09 '25 20:07

Hi Xiaoli, thanks for raising this, I'll look into it and keep you posted.

bede, Jul 10 '25 10:07

Typo in the post: it is Medaka 2.1.0.

xiaoli-dong, Jul 10 '25 17:07

Hi @xiaoli-dong, apologies for the delay. Thanks for suggesting this useful feature. Frustratingly, I've yet to find an elegant and performant solution for passing full header information through Hostile's internal pipeline. I hope to be able to implement it one way or another (suggestions welcome), but for now I would suggest you consider my new tool Deacon, which passes through complete headers by default.

https://github.com/bede/deacon

Bede

bede, Aug 12 '25 14:08