How is the false positive rate calculated in som.py stats?

Open Krannich479 opened this issue 10 months ago • 0 comments

Dear hap.py developer team, I have a question regarding the output of som.py.

  • Question: I ran som.py (v0.3.15) on a short-variant callset and a ground-truth set. The tool ran successfully and the results seem reasonable. However, in each line of the <prefix>.sompy.stats.csv file I noticed a field named fp.rate, and I wonder how exactly it is computed.

  • Background: The false positive rate (FPR) is commonly defined as FP/(FP+TN). Hence, I presume TN is computed at some point. There is a README page dedicated to som.py, but it does not define the number of true negatives (TN). The bioRxiv preprint of hap.py+som.py even has a paragraph stating that TN are not included because they lack a clear definition (with which I strongly agree!):

Note that we have chosen not to include true negatives (or consequently specificity) in our standardized definitions. This is due to the challenge in defining the number of true negatives, particularly around complex variants. In addition, precision is often a more useful metric than specificity due to the very large proportion of true negative positions in the genome.
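
To make the dependence on TN concrete, here is a minimal Python sketch of the two definitions discussed above. The function names and the TN counts are hypothetical and chosen only for illustration; none of them come from som.py.

```python
def false_positive_rate(fp, tn):
    """Textbook FPR = FP / (FP + TN); needs a count of true negatives."""
    return fp / (fp + tn)


def precision(tp, fp):
    """Precision = TP / (TP + FP); needs no true negatives."""
    return tp / (tp + fp)


# Hypothetical TN counts: the same FP count gives very different FPR values
# depending on how "true negative" is defined and counted.
for assumed_tn in (1_000, 1_000_000, 3_000_000_000):
    print(assumed_tn, false_positive_rate(2, assumed_tn))

# Precision depends only on counts that a benchmarking run reports directly.
print(precision(151, 2))
```

With TP = 151 and FP = 2, precision is fully determined, whereas the textbook FPR changes by orders of magnitude with the assumed TN.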

  • Example: Here is an example of my output. The fp.rate field near the end of the line is non-zero.
idx  type     total.truth  total.query  tp   fp  fn  unk  ambi  recall              recall_lower        recall_upper        recall2             precision           precision_lower     precision_upper     na   ambiguous  fp.region.size  fp.rate             sompyversion  sompycmd
0    indels   180          153          151  2   29  0    0     0.8388888888888889  0.7799816161378756  0.8870047190333543  0.8388888888888889  0.9869281045751634  0.9587317223755603  0.997273934669216   0.0  0.0        29903           66.88292144600877   som.py-       /<path>/bin/som.py --no-fixchr-truth --no-fixchr-query --normalize-all -r <path>/<reference>.fasta -o <prefix>.sompy <truthset>.vcf <callset>.vcf.gz
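
As a quick sanity check on the row above: the reported recall and precision match TP/(TP+FN) and TP/(TP+FP), and the reported fp.rate happens to equal FP divided by fp.region.size, scaled by 10^6 (that is, false positives per megabase of that region). This is only an observation from the numbers in this one row, not a confirmed description of how som.py computes the field. The variable names below are my own.

```python
# Values copied from the example row above.
tp, fp, fn = 151, 2, 29
fp_region_size = 29903
reported_recall = 0.8388888888888889
reported_precision = 0.9869281045751634
reported_fp_rate = 66.88292144600877

recall = tp / (tp + fn)
precision = tp / (tp + fp)

# Assumption: fp.rate is FP per 1e6 bases of fp.region.size.
# Inferred from this single row only, not from the som.py source.
fp_per_megabase = 1e6 * fp / fp_region_size

print(abs(recall - reported_recall) < 1e-12)
print(abs(precision - reported_precision) < 1e-12)
print(abs(fp_per_megabase - reported_fp_rate) < 1e-9)
```

If this reading is right, fp.rate would be a density (FP per megabase) rather than FP/(FP+TN), which would explain why it can exceed 1.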

Krannich479, Apr 25 '24 13:04