goleft
Differential header stringency depending on file format
I'm observing differential error checking depending on whether the input file is a CRAM or a BAM. The files in question have multiple sample names listed (a different one for each @RG line). When the input is a CRAM, no error is thrown. When the input is a BAM, I see: panic: bam reagroup: more than one RG for /build/test.bam
At the moment, it seems as if indexcov doesn't check CRAM headers? i.e. https://github.com/brentp/goleft/blob/master/indexcov/indexcov.go#L202-L231
I assume this error is thrown because the assumption is that there is a single sample for the whole file and there isn't handling of multiple samples. What is being reported when these problem CRAMs are provided? Stats for all the samples pooled together?
For context, I'm seeing this error when running Smoove.
Yes, for CRAM it will report the sum of all samples. It should be checked for CRAM too. The index can't know about the different samples.
smoove won't work with multiple samples per bam (I don't think lumpy will either).
Yeah, that's what I'd expect. They're not actually multiple samples though...just mislabeled single samples. If by happy circumstance it was all erroneously analyzed as a single sample, then that may not be the worst thing...