minimap2
Potential duplicate generation of RG tag on inputs with RG information
Hi Heng,
This isn't necessarily a bug, but I was a bit surprised. Also, this definitely is not a high-impact issue.
So, this is arguably an edge case: when each read in the input already carries its read-group (RG) information, the RG tag can end up duplicated in the output.
Here's an example.
Say the input is an unaligned BAM in which every read has an RG tag (along with other tags such as the MM/ML 5mC calls). One would then run a command like the following:
samtools fastq -t -T MM,ML <input_ubam> \
  | minimap2 -ayYL -x <preset> -R "@RG\tID:matching_readgroup_id..." <ref> - \
  | samtools sort -o output.bam
Each output record then carries two RG tags: one copied from the FASTQ comment (via -y) and one added by minimap2 from -R.
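To confirm the duplication, one could count the RG:Z: fields in each alignment record. A minimal sketch, assuming standard tab-delimited SAM text (optional fields start at column 12); the function name count_rg_tags is hypothetical:

```shell
# Print "<read name><TAB><number of RG:Z: fields>" for each SAM record on stdin.
# Header lines (starting with '@') are skipped. In SAM, optional fields start
# at column 12 and have the form TAG:TYPE:VALUE.
count_rg_tags() {
    awk 'BEGIN { FS = "\t" }
         /^@/ { next }
         {
             n = 0
             for (i = 12; i <= NF; i++)
                 if ($i ~ /^RG:Z:/) n++
             print $1 "\t" n
         }'
}
```

For example, samtools view output.bam | count_rg_tags | awk '$2 > 1' would list the reads carrying more than one RG tag.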
Of course, this can be avoided by omitting the -t flag in samtools fastq.
But the samtools fastq documentation says -t copies not only RG but also the BC and QT tags, so one may still want to keep that flag.
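If one wants to keep -t for BC and QT, another possible workaround is to drop only the RG:Z: field from the FASTQ comments before they reach minimap2. This is a sketch under stated assumptions: it expects 4-line FASTQ records with tab-separated tags on the header line, as samtools fastq writes them, and the function name strip_rg_from_fastq is hypothetical.

```shell
# Remove RG:Z:... fields from FASTQ header comments, keeping every other
# tab-separated tag (e.g. BC, QT, MM, ML). Assumes 4-line FASTQ records
# (no sequence wrapping), so header lines are lines 1, 5, 9, ...
strip_rg_from_fastq() {
    awk 'BEGIN { FS = OFS = "\t" }
         NR % 4 == 1 {
             out = $1
             for (i = 2; i <= NF; i++)
                 if ($i !~ /^RG:Z:/) out = out OFS $i
             print out
             next
         }
         { print }'
}
```

It would sit between the two commands, e.g. samtools fastq -t -T MM,ML <input_ubam> | strip_rg_from_fastq | minimap2 -ayYL -x <preset> -R "@RG\tID:matching_readgroup_id..." <ref> - | samtools sort -o output.bam. Since every read then receives the single RG from -R, this only makes sense when the input holds one read group; with several read groups, the per-read RG values would be lost.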
Alternatively, one can skip specifying the read-group information for minimap2 and add it to the header later with samtools reheader, but this is extra work.
So, a convenient feature would be for minimap2 to check whether the comments copied from the input FASTQ already contain an RG tag and, if so, not write it again based on the information provided via -R "@RG\tID:matching_readgroup_id...".
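The requested check could look roughly like the following. This is a hypothetical sketch in shell, not minimap2's actual code: the function name append_rg_unless_present is made up, and it assumes the copied comment holds tab-separated SAM tags (what -y passes through).

```shell
# Hypothetical sketch of the proposed behaviour (not minimap2's code).
# $1: comment copied from the FASTQ header; $2: read-group ID from -R.
# Prints the tags to emit: RG:Z:<id> is appended only if no RG:Z: field exists.
tab=$(printf '\t')

append_rg_unless_present() {
    # Wrap in tabs so an RG tag in the first or last field is also matched.
    case "$tab$1$tab" in
        *"${tab}RG:Z:"*)
            # An RG tag was already copied from the input; keep it as-is.
            printf '%s\n' "$1" ;;
        *)
            if [ -n "$1" ]; then
                printf '%s\tRG:Z:%s\n' "$1" "$2"
            else
                printf 'RG:Z:%s\n' "$2"
            fi ;;
    esac
}
```

One question the sketch leaves open: if the copied RG value differs from the ID given via -R, silently keeping the per-read value could hide a mismatch, so an implementation might prefer to warn in that case.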
Thanks, Steve