
SER measurement - merge multiallelic sites first?

Open JosephLalli opened this issue 1 year ago • 0 comments

The best practice for multiallelic sites is to split them into biallelic records before phasing.

However, I'm not sure how to handle these sites when measuring switch error rate. The posted tutorial seems to leave multiallelic sites as split biallelic sites when measuring SER performance using shapeit5_switch. Is that what users should do when measuring SER in their data sets?

SER = #switches / #hets. If a 1|2 heterozygous site is split and then erroneously phased, I'd think that is one switch error at one heterozygous site, not two errors (0|1 and 1|0) at two sites (see below for an illustration of what I mean). Thus poor performance at multiallelic sites (esp. sites with many alleles, like STRs) would artificially inflate the SER.

chr20 1000 A T,C 1/2
split ->
chr20 1000 A T 0/1
chr20 1000 A C 0/1
phase ->
chr20 1000 A T 0|1
chr20 1000 A C 1|0
merge multiallelics, preserve phasing ->
chr20 1000 A T,C 2|1
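
To make the counting difference concrete, here is a minimal Python sketch of the merge step and of how the number of heterozygous sites changes between the split and merged representations. It is only an illustration under stated assumptions: records are plain tuples rather than parsed VCF lines, genotypes are diploid and phased, and the function names `merge_split_records` and `count_het_sites` are hypothetical. It does not reflect how shapeit5_switch counts sites internally.

```python
def merge_split_records(records):
    """Merge split biallelic records back into multiallelic ones.

    records: list of (chrom, pos, ref, alt, gt) tuples, where gt is a
    phased diploid genotype such as "0|1" (an assumed toy format).
    Records sharing (chrom, pos, ref) are merged; the i-th ALT seen at
    a site gets allele index i, and phasing is preserved per haplotype.
    """
    grouped = {}
    for chrom, pos, ref, alt, gt in records:
        grouped.setdefault((chrom, pos, ref), []).append((alt, gt))

    merged = []
    for (chrom, pos, ref), entries in grouped.items():
        alts = [alt for alt, _ in entries]
        haplotypes = [0, 0]  # default to the reference allele
        for allele_index, (_, gt) in enumerate(entries, start=1):
            for hap, allele in enumerate(gt.split("|")):
                if allele == "1":
                    haplotypes[hap] = allele_index
        merged_gt = f"{haplotypes[0]}|{haplotypes[1]}"
        merged.append((chrom, pos, ref, ",".join(alts), merged_gt))
    return merged


def count_het_sites(records):
    """Count records whose two haplotypes carry different alleles."""
    count = 0
    for record in records:
        left, right = record[4].split("|")
        if left != right:
            count += 1
    return count


# The example from this issue: one 1/2 site, split and then phased.
split_records = [
    ("chr20", 1000, "A", "T", "0|1"),
    ("chr20", 1000, "A", "C", "1|0"),
]
merged_records = merge_split_records(split_records)

print(merged_records)                   # [('chr20', 1000, 'A', 'T,C', '2|1')]
print(count_het_sites(split_records))   # 2
print(count_het_sites(merged_records))  # 1
```

In the split representation this single site contributes two heterozygous records, so a phasing mistake there can be counted twice; after merging it contributes one.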

JosephLalli, Oct 24 '23 20:10