pandora
Should `min_kmer_covg` be per sample in `pandora compare`?
min_kmer_covg is almost always set to the expected depth of coverage of the first sample divided by 10: https://github.com/rmcolq/pandora/blob/c16c1fac48ccdf98e0719ea003492c3fd2ec9034/src/compare_main.cpp#L415-L416 (unless the expected depth of coverage of the first sample is < 10, in which case min_kmer_covg becomes 0 and we then try the second sample, and so on, but let's ignore this).
This min_kmer_covg is eventually used to compute the coverage gaps of a sample in a VCF record: https://github.com/rmcolq/pandora/blob/c16c1fac48ccdf98e0719ea003492c3fd2ec9034/src/localPRG.cpp#L1648-L1649 , which affects genotyping. Thus, the coverage of the first sample affects the genotyping of all samples.
In our current experiments this is not an issue, as all samples are subsampled to the exact same coverage, but it could become one when several samples have different coverages (even if none of them exceeds max_covg).
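To make the difference concrete, here is a minimal C++ sketch contrasting the shared threshold described above with a per-sample one. It is not pandora's code: the function names are hypothetical, and the only things taken from this issue are the integer division by 10 and the fallback to the next sample when the result is 0.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of pandora): threshold for one sample,
// i.e. expected depth of coverage / 10 with integer division.
std::uint32_t min_kmer_covg_for(std::uint32_t exp_depth_covg)
{
    return exp_depth_covg / 10;
}

// Behaviour as described in this issue: one threshold shared by all samples,
// taken from the first sample for which exp_depth_covg / 10 is non-zero.
std::uint32_t shared_min_kmer_covg(const std::vector<std::uint32_t>& exp_depth_covgs)
{
    std::uint32_t min_kmer_covg = 0;
    for (const auto covg : exp_depth_covgs) {
        if (min_kmer_covg == 0) {
            min_kmer_covg = min_kmer_covg_for(covg);
        }
    }
    return min_kmer_covg;
}

// Possible alternative: one threshold per sample, so that each sample's
// coverage gaps are computed against its own expected depth of coverage.
std::vector<std::uint32_t> per_sample_min_kmer_covg(
    const std::vector<std::uint32_t>& exp_depth_covgs)
{
    std::vector<std::uint32_t> thresholds;
    thresholds.reserve(exp_depth_covgs.size());
    for (const auto covg : exp_depth_covgs) {
        thresholds.push_back(min_kmer_covg_for(covg));
    }
    return thresholds;
}
```

For samples with expected depths of coverage 100, 30 and 300, the shared version gives 10 for every sample, while the per-sample version gives 10, 3 and 30.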
Logging this for a later fix.
yes
This is known, or was raised before. Or was that the error rate? Either way, it will have a big impact.
I guess what was raised before was the error rate. There should be no impact on our current experiments, as all samples input to pandora are downsampled to the exact same coverage.