Calculate IQS to assess imputation performance

Open szhan opened this issue 3 years ago • 2 comments

Imputation quality score (IQS) is another popular way to measure genotype imputation performance (discussed in #2193). It was proposed in this paper: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2837741/. IQS accounts for chance agreement, whereas overall concordance does not. When dealing with rare alleles (< 0.5% MAF), even a method that does no better than guessing at the minor allele can achieve a misleadingly high overall concordance. For example, consider an untyped site that has 90 genotypes homozygous for a major allele and 10 genotypes with a minor allele. Suppose that the major allele is strongly linked to another common allele typed in the reference panel, so that all 90 major-allele genotypes are imputed correctly (common alleles linked to other common alleles are easy to impute). If we assess the imputation accuracy at this site using overall concordance, then concordance is at least 0.90, even if every one of the 10 genotypes with the minor allele is imputed wrongly.

IQS for haploid genotypes can be calculated as follows (see Table 1 in PMC2837741 for the original diploid version). Note that 1 and 2 represent two different alleles.

import numpy as np

# genotypes_imputed and genotypes_true are equal-length sequences of allele codes (1 or 2).
n11 = np.sum([y == 1 for x, y in zip(genotypes_imputed, genotypes_true) if x == 1]) # Imputed 1, true 1 (correct)
n22 = np.sum([y == 2 for x, y in zip(genotypes_imputed, genotypes_true) if x == 2]) # Imputed 2, true 2 (correct)
n12 = np.sum([y == 2 for x, y in zip(genotypes_imputed, genotypes_true) if x == 1]) # Imputed 1, true 2 (wrong)
n21 = np.sum([y == 1 for x, y in zip(genotypes_imputed, genotypes_true) if x == 2]) # Imputed 2, true 1 (wrong)

n1_ = n11 + n12 # Total imputed as allele 1
n2_ = n21 + n22 # Total imputed as allele 2
n_1 = n11 + n21 # Total with true allele 1
n_2 = n12 + n22 # Total with true allele 2
n__ = n11 + n21 + n12 + n22 # Total genotypes imputed

Po = (n11 + n22) / n__ # Observed overall concordance
Pc = (n1_ * n_1 + n2_ * n_2) / (n__ * n__) # Chance agreement
IQS = (Po - Pc) / (1 - Pc) # Undefined when Pc == 1
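
To make the 90/10 example above concrete, here is a minimal sketch that wraps the same calculation in a function and runs it on a site where every genotype is imputed as the major allele. The function name `haploid_iqs` and the choice to return NaN when chance agreement equals 1 are assumptions for illustration, not anything defined in tskit.

```python
import numpy as np


def haploid_iqs(genotypes_imputed, genotypes_true):
    """Return (Po, Pc, IQS) for equal-length arrays of allele codes 1 and 2.

    Hypothetical helper for illustration only.
    """
    imputed = np.asarray(genotypes_imputed)
    true = np.asarray(genotypes_true)
    # Cells of the 2x2 table: first index = imputed allele, second = true allele.
    n11 = np.sum((imputed == 1) & (true == 1))
    n12 = np.sum((imputed == 1) & (true == 2))
    n21 = np.sum((imputed == 2) & (true == 1))
    n22 = np.sum((imputed == 2) & (true == 2))
    n = n11 + n12 + n21 + n22
    po = (n11 + n22) / n  # Observed overall concordance
    pc = ((n11 + n12) * (n11 + n21) + (n21 + n22) * (n12 + n22)) / (n * n)  # Chance agreement
    # Assumption: report NaN when Pc == 1, where the IQS formula divides by zero.
    iqs = (po - pc) / (1 - pc) if pc < 1 else float("nan")
    return po, pc, iqs


# 90 genotypes with the major allele (1) and 10 with the minor allele (2),
# all imputed as the major allele.
genotypes_true = [1] * 90 + [2] * 10
genotypes_imputed = [1] * 100
po, pc, iqs = haploid_iqs(genotypes_imputed, genotypes_true)
print(f"Po = {po:.2f}, Pc = {pc:.2f}, IQS = {iqs:.2f}")
```

Here the concordance is 0.90, but the chance agreement is also 0.90, so IQS comes out at 0: always imputing the major allele gets no credit.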

szhan, Apr 25 '22 16:04

SGTM. What happens when we have more than 2 alleles?

jeromekelleher, Apr 26 '22 10:04

Notes about interpretation of IQS:

  1. "The value of one indicates a perfect match" (from the IQS paper);
  2. The value of zero indicates that the observed agreement is equal to chance agreement; and
  3. "negative values indicate that the imputation program performed worse than chance."
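
The three cases can be illustrated with small made-up count tables. This is only a sketch: the helper `iqs_from_counts` and the counts are hypothetical, chosen so that each table has 90 major-allele and 10 minor-allele genotypes in both the imputed and the true data.

```python
import numpy as np


def iqs_from_counts(table):
    """IQS from a 2x2 table of counts (rows = imputed allele, columns = true allele).

    Hypothetical helper for illustration only; assumes chance agreement < 1.
    """
    table = np.asarray(table, dtype=float)
    n = table.sum()
    po = np.trace(table) / n  # Observed overall concordance
    pc = np.sum(table.sum(axis=1) * table.sum(axis=0)) / n**2  # Chance agreement
    return (po - pc) / (1 - pc)


perfect = iqs_from_counts([[90, 0], [0, 10]])  # Every genotype imputed correctly
chance = iqs_from_counts([[81, 9], [9, 1]])    # Agreement equal to chance agreement
worse = iqs_from_counts([[80, 10], [10, 0]])   # Every minor allele imputed wrongly

print(perfect, chance, worse)
```

The first table gives an IQS of 1, the second gives 0, and the third gives a negative value (about -0.11), even though its overall concordance is still 0.80.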

szhan, Jul 24 '22 15:07