tsinfer

Allow quality metrics in the VCF file to affect the per-sample mismatch function

Open: hyanwong opened this issue 4 years ago • 1 comment

@benjeffery had the great idea that we could use quality scores in the VCF (or FASTQ, or BAM) file to change the mismatch probabilities during match_samples. We could even use them when generating ancestors too.

For match_samples, this will require some re-plumbing to allow per-sample mismatch functions (RateMaps, or whatever we are calling them).
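As a rough illustration of the idea, Phred-scaled qualities convert to error probabilities via P = 10^(-Q/10), which gives one mismatch probability per sample per site. The sketch below is only that: the function name, the samples-by-sites array layout, the `-1` missing-value marker, and the fallback probability are all assumptions for illustration, not part of tsinfer.

```python
import numpy as np


def phred_to_error_prob(qualities, missing_value=-1, default_prob=0.01):
    """Convert Phred-scaled qualities to error probabilities, P = 10^(-Q/10).

    Hypothetical helper: `missing_value` and `default_prob` are assumptions.
    """
    q = np.asarray(qualities, dtype=float)
    prob = 10.0 ** (-q / 10.0)
    # Assumption: entries with no recorded quality fall back to a fixed default.
    return np.where(q == missing_value, default_prob, prob)


# Hypothetical input: rows are samples, columns are sites.
quals = np.array([[10, 20, 30], [40, -1, 20]])
per_sample_mismatch = phred_to_error_prob(quals)
```

Each row of `per_sample_mismatch` could then, in principle, serve as that sample's mismatch probabilities along the genome, which is the per-sample plumbing the issue asks for.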

hyanwong, Dec 02 '20 21:12

This maps pretty naturally onto the per-site array paradigm, so we can probably build the appropriate array from the input data (genotypes and base qualities) and use that as input.
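One possible way to build such a per-site array is sketched here, purely as an assumption about what "appropriate" might mean: average the Phred-derived error probability over the samples with a called genotype at each site, then clip the result. The function name, the samples-by-sites layout, the `-1` missing-genotype code, and the floor and ceiling values are all hypothetical.

```python
import numpy as np


def per_site_mismatch(genotypes, qualities, floor=1e-6, ceiling=0.5):
    """Collapse per-sample base qualities into one mismatch value per site.

    Hypothetical helper: rows are samples, columns are sites, and a
    genotype of -1 marks a missing call.
    """
    g = np.asarray(genotypes)
    q = np.asarray(qualities, dtype=float)
    error = 10.0 ** (-q / 10.0)  # Phred scale: P = 10^(-Q/10)
    called = g >= 0
    n_called = called.sum(axis=0)
    total = np.where(called, error, 0.0).sum(axis=0)
    # Assumption: sites with no called genotypes get the ceiling value.
    mean_error = np.where(n_called > 0, total / np.maximum(n_called, 1), ceiling)
    return np.clip(mean_error, floor, ceiling)


genotypes = np.array([[0, 1, -1], [1, 1, -1], [0, -1, -1]])
qualities = np.array([[10, 20, 0], [10, 40, 0], [10, 0, 0]])
site_mismatch = per_site_mismatch(genotypes, qualities)
```

Averaging is only one choice; a per-site maximum or a genotype-likelihood-based weighting would fit the same array shape.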

We want to use sgkit to give us access to the quality scores, so I'm pushing this to 0.3.

jeromekelleher, Dec 02 '20 22:12