pandora Should `min_kmer_covg` be per sample in `pandora compare`?

Open leoisl opened this issue 6 years ago • 3 comments

min_kmer_covg is almost always set to the expected depth of coverage of the first sample divided by 10: https://github.com/rmcolq/pandora/blob/c16c1fac48ccdf98e0719ea003492c3fd2ec9034/src/compare_main.cpp#L415-L416 (unless the expected depth of coverage of the first sample is < 10, in which case min_kmer_covg becomes 0 and we then try the second sample, and so on, but let's ignore this).

Eventually, this min_kmer_covg is used to compute the gaps in coverage of a sample in a VCF record: https://github.com/rmcolq/pandora/blob/c16c1fac48ccdf98e0719ea003492c3fd2ec9034/src/localPRG.cpp#L1648-L1649 , which affects the genotyping. Thus, the coverage of the first sample affects the genotyping of all samples.

In our current experiments, this is not an issue, as all samples are subsampled to the exact same coverage, but it could become one when several samples have different coverages (even if none of them exceeds max_covg).
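
To make the difference concrete, here is a minimal sketch of the two strategies. It is not pandora's actual code: the function names `shared_min_kmer_covg` and `per_sample_min_kmer_covg`, the `divisor` parameter and the plain `std::vector<uint32_t>` of expected depths are all hypothetical and used only for illustration. The "shared" version follows the behaviour as described above (first sample whose depth / 10 is non-zero), and the "per sample" version is one possible shape of the fix.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not pandora's API): behaviour as described in this
// issue. A single threshold is shared by all samples, taken from the first
// sample whose expected depth / divisor is non-zero.
uint32_t shared_min_kmer_covg(
    const std::vector<uint32_t>& exp_depth_covgs, uint32_t divisor = 10)
{
    assert(divisor > 0);
    for (const uint32_t depth : exp_depth_covgs) {
        const uint32_t threshold = depth / divisor; // integer division
        if (threshold > 0) {
            return threshold;
        }
    }
    return 0; // every sample has expected depth < divisor
}

// Hypothetical helper (not pandora's API): one threshold per sample, each
// derived from that sample's own expected depth of coverage.
std::vector<uint32_t> per_sample_min_kmer_covg(
    const std::vector<uint32_t>& exp_depth_covgs, uint32_t divisor = 10)
{
    assert(divisor > 0);
    std::vector<uint32_t> thresholds;
    thresholds.reserve(exp_depth_covgs.size());
    for (const uint32_t depth : exp_depth_covgs) {
        thresholds.push_back(depth / divisor); // depth < divisor gives 0
    }
    return thresholds;
}
```

With expected depths of 100, 30 and 5, the shared version gives 10 for every sample, whereas the per-sample version gives 10, 3 and 0, so the 30x sample would no longer be judged against a threshold derived from the 100x sample.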

Logging this for a later fix.

Oct 29 '19 16:10 leoisl

yes

Oct 29 '19 16:10 rmcolq

This is known, or was raised before. Or was that the error rate? It will have a big impact.

Oct 29 '19 17:10 iqbal-lab

I guess what was raised before was the error rate. There is no impact in our current experiments, I guess, as all samples input to pandora are downsampled to the exact same coverage.

Oct 29 '19 17:10 leoisl
