
DownsampleSam discards NM tag

Open · bw2 opened this issue 1 year ago · 1 comment

Affected tool(s) or class(es)

gatk DownsampleSam

Affected version(s)

GATK v4.3.0.0

Description

The input CRAM file (gs://broad-public-datasets/CHM1_CHM13_WGS2/CHM1_CHM13_WGS2.cram) has NM tags, but the downsampled output file no longer has them. My command line is:

gatk DownsampleSam REFERENCE_SEQUENCE=/hg38.fa I=CHM1_CHM13_WGS2.cram P=0.5 CREATE_INDEX=true O=CHM1_CHM13_WGS2.downsampled.bam 

Some downstream tools require NM tags, so I have to run

samtools calmd CHM1_CHM13_WGS2.downsampled.bam /hg38.fa

to re-add them.
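For context, the SAM tag specification defines NM as the edit distance to the reference, including ambiguous bases but excluding clipping, and `samtools calmd` recomputes it (along with MD) from the reference. Below is a minimal Python sketch of that calculation for a single read. It assumes the CIGAR string, the read sequence (not `*`), the 0-based alignment start, and the reference sequence are already available. The function name `compute_nm` is hypothetical and is not part of samtools, htsjdk, or GATK.

```python
import re

# One CIGAR element: a length followed by an operation code.
_CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def compute_nm(cigar: str, read_seq: str, ref_seq: str, ref_start: int) -> int:
    """Edit distance (NM) of one aligned read against the reference.

    ref_start is the 0-based reference offset of the first aligned base
    (SAM POS minus 1). Clipping (S, H), padding (P) and skipped regions (N)
    are not counted as edits.
    """
    nm = 0
    q = 0          # current position in read_seq
    r = ref_start  # current position in ref_seq
    for length_str, op in _CIGAR_RE.findall(cigar):
        n = int(length_str)
        if op in "M=X":
            # Aligned bases: count each position where read and reference differ.
            for i in range(n):
                a = read_seq[q + i].upper()
                b = ref_seq[r + i].upper()
                # Assumption: an N on either side is counted as a mismatch.
                if a != b or a == "N":
                    nm += 1
            q += n
            r += n
        elif op == "I":
            nm += n  # inserted bases count toward NM
            q += n
        elif op == "D":
            nm += n  # deleted reference bases count toward NM
            r += n
        elif op == "N":
            r += n   # skipped reference region: not an edit
        elif op == "S":
            q += n   # soft clip consumes read bases but is not counted
        # H and P consume neither read nor reference bases
    return nm
```

For example, `compute_nm("3M1I4M", "ACTTTACG", "ACGTACGTAC", 0)` gives 2: one mismatch in the first three aligned bases plus one inserted base.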

bw2, Oct 23 '23 05:10

@bw2 For better or worse, htsjdk tries to maintain round-trip fidelity for CRAMs. I looked at the first few slices of the CRAM referenced above, and it does not appear to contain NM or MD tags. Can you let me know how you concluded that it does?
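One quick way to check whether records carry NM tags is to scan the optional fields of decoded SAM records. These fields start at column 12 and have the form `TAG:TYPE:VALUE`. Below is a minimal sketch, assuming the records arrive as SAM text lines such as `samtools view` output. `count_tag` is a hypothetical helper. What a SAM dump of a CRAM shows may depend on how the decoding tool handles NM/MD, so the sketch only reports what is present in the text it is given.

```python
from typing import Iterable, Tuple


def count_tag(sam_lines: Iterable[str], tag: str = "NM") -> Tuple[int, int]:
    """Return (records_with_tag, total_records) for SAM-formatted text lines.

    Header lines (starting with '@') and blank lines are skipped.
    Optional fields begin at the 12th tab-separated column.
    """
    with_tag = 0
    total = 0
    prefix = tag + ":"
    for line in sam_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("@"):
            continue  # skip header and empty lines
        total += 1
        fields = line.split("\t")
        # Columns 1-11 are mandatory; anything after them is an optional tag.
        if any(f.startswith(prefix) for f in fields[11:]):
            with_tag += 1
    return with_tag, total
```

For example, the function could be fed the first few thousand lines of `samtools view -T /hg38.fa CHM1_CHM13_WGS2.cram` output, where `-T` supplies the reference.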

cmnbroad, Oct 31 '23 18:10