
Should `min_kmer_covg` be per sample in `pandora compare`?

Open leoisl opened this issue 6 years ago • 3 comments

`min_kmer_covg` is almost always set to the expected depth coverage of the first sample divided by 10: https://github.com/rmcolq/pandora/blob/c16c1fac48ccdf98e0719ea003492c3fd2ec9034/src/compare_main.cpp#L415-L416 (unless the expected depth coverage of the first sample is < 10, in which case `min_kmer_covg` becomes 0 and we then try the second sample, and so on, but let's ignore this).
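To make the behaviour described above concrete, here is a minimal sketch. It is not the code from `compare_main.cpp`; the function name `pick_shared_min_kmer_covg` and the input vector are hypothetical, and it assumes the division by 10 is an integer division, which is why values below 10 give 0.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper, NOT pandora's actual code: it only mimics the
// behaviour described in this issue.
// Returns one threshold shared by all samples: the first non-zero value of
// (expected depth coverage / 10), scanning samples in input order.
uint32_t pick_shared_min_kmer_covg(
    const std::vector<uint32_t>& exp_depth_covg_per_sample)
{
    uint32_t min_kmer_covg = 0;
    for (uint32_t exp_depth_covg : exp_depth_covg_per_sample) {
        if (min_kmer_covg == 0) {
            // Integer division: any coverage below 10 yields 0, so the
            // next sample gets a chance to set the threshold.
            min_kmer_covg = exp_depth_covg / 10;
        }
    }
    return min_kmer_covg;
}
```

With expected depth coverages of 100 and 20, this returns 10, and that single value is then applied to both samples.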

Eventually, this `min_kmer_covg` is used to compute the coverage gaps of a sample in a VCF record: https://github.com/rmcolq/pandora/blob/c16c1fac48ccdf98e0719ea003492c3fd2ec9034/src/localPRG.cpp#L1648-L1649 , which affects the genotyping. Thus, the coverage of the first sample affects the genotyping of all samples.

In our current experiments this is not an issue, as all samples are subsampled to the exact same coverage, but it could become one when several samples have different coverages (even when none of them exceeds `max_covg`).
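One possible shape for a fix, sketched under assumptions: compute the threshold per sample instead of sharing one value. The function name `per_sample_min_kmer_covg` is hypothetical and does not exist in pandora; the divide-by-10 rule (as an integer division) is taken from the description in this issue.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical sketch, NOT pandora's actual code: one threshold per sample,
// each derived from that sample's own expected depth coverage.
std::vector<uint32_t> per_sample_min_kmer_covg(
    const std::vector<uint32_t>& exp_depth_covg_per_sample)
{
    std::vector<uint32_t> thresholds;
    thresholds.reserve(exp_depth_covg_per_sample.size());
    for (uint32_t exp_depth_covg : exp_depth_covg_per_sample) {
        // Same "coverage / 10" rule, applied independently to each sample.
        thresholds.push_back(exp_depth_covg / 10);
    }
    return thresholds;
}
```

For samples at 100x and 20x this gives thresholds of 10 and 2, whereas a threshold taken from the first sample would apply 10 to both. How a per-sample threshold of 0 (coverage below 10) should be handled is left open here.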

Logging this for a later fix.

leoisl avatar Oct 29 '19 16:10 leoisl

yes

rmcolq avatar Oct 29 '19 16:10 rmcolq

This is known, or was raised before. Or was that the error rate? Will have a big impact.

iqbal-lab avatar Oct 29 '19 17:10 iqbal-lab

I guess what was raised before was the error rate. No impact in our current experiments I guess, as all samples input to pandora are downsampled to the exact same coverage

leoisl avatar Oct 29 '19 17:10 leoisl