Pangolin prediction score differs from local prediction

Open RChanColor opened this issue 1 year ago • 7 comments

The Pangolin predictions from the SpliceAI Lookup have consistently matched my locally computed Pangolin predictions. However, I noticed today that a Pangolin prediction from the SpliceAI Lookup website did not match.

E.g. https://spliceailookup.broadinstitute.org/#variant=16-23637674-C-T&hg=37&distance=50&mask=1&ra=0
The SpliceAI Lookup web tool predicted a splice site loss at 44bp with a ∆score of -0.50, whereas the locally predicted ∆score for the same splice site loss is -0.28.

RChanColor avatar Sep 19 '24 04:09 RChanColor

Are you running pangolin using gene annotations that are equivalent to this Gencode v46 transcript definition?

PALB2 (ENSG00000083093.12_17 / ENST00000697383.1_3 / NM_001407314.1)

Also, which pangolin version are you using?

bw2 avatar Sep 19 '24 04:09 bw2

I ran Pangolin on an older laptop that has since been replaced, but the Gencode version was certainly older than v46. I also used Pangolin version 1.0.1. The transcript I use for PALB2 is ENST00000261584 (GRCh37), which is equivalent to the MANE Select ENST00000261584.9 (GRCh38). The relative location of the predicted splice site loss is the same, at 44bp. The only difference is the ∆scores.

RChanColor avatar Sep 19 '24 04:09 RChanColor

Hi @RChanColor, thanks for reporting this issue and sorry about the slow response. I haven't been able to track down why the server generated a Pangolin ∆score = -0.50 for that variant on hg19. When I ran Pangolin locally, I got -0.28 like you did. Also, the server returned the correct/expected Pangolin score of -0.28 for the same variant after liftover to hg38.

When the server runs Pangolin or SpliceAI, it caches the results, so future queries with the same exact parameters just reuse the cached results instead of re-running the model. I manually cleared the cache for this variant, so the server then reran Pangolin. Without any changes, it generated the expected score (-0.28). I then tried different parameter settings but haven't been able to get the server to generate the -0.50 value again. This is strange because nothing changed on the server in the past month. I've added more logging so that I can better understand the issue if it comes up again. Please let me know if you see something similar with another variant.
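
The caching behaviour described above can be illustrated with a small sketch. This is not the server's actual code: the `ScoreCache` class, the `QueryKey` fields, and the stand-in `fake_model` function are all hypothetical, chosen only to show why a repeated query with identical parameters returns the stored value without re-running the model, and why clearing one entry forces a rerun.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class QueryKey:
    """Hypothetical cache key: every parameter that defines a query."""
    tool: str       # "pangolin" or "spliceai"
    variant: str    # e.g. "16-23637674-C-T"
    genome: str     # "37" or "38"
    distance: int
    mask: bool


class ScoreCache:
    """Hypothetical cache that runs the model once per distinct key."""

    def __init__(self, model: Callable[[QueryKey], float]) -> None:
        self._model = model
        self._store: Dict[QueryKey, float] = {}
        self.model_runs = 0  # counts how often the model was actually invoked

    def get(self, key: QueryKey) -> float:
        # Only run the model when this exact parameter set is not stored yet.
        if key not in self._store:
            self.model_runs += 1
            self._store[key] = self._model(key)
        return self._store[key]

    def clear(self, key: QueryKey) -> None:
        # Drop one entry so the next get() recomputes it.
        self._store.pop(key, None)


def fake_model(key: QueryKey) -> float:
    """Stand-in for a real model run; always returns the same score."""
    return -0.28


cache = ScoreCache(fake_model)
key = QueryKey("pangolin", "16-23637674-C-T", "37", 50, True)

cache.get(key)
cache.get(key)
runs_after_repeat = cache.model_runs   # 1: the second call hit the cache

cache.clear(key)
cache.get(key)
runs_after_clear = cache.model_runs    # 2: clearing forced a rerun
print(runs_after_repeat, runs_after_clear)
```

In this sketch any change to a key field, such as a different `distance`, produces a new key and therefore a fresh model run.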

bw2 avatar Oct 07 '24 21:10 bw2

To further troubleshoot this, I recomputed scores for cached variants and compared the cached scores with the recomputed ones. It turns out this issue of scores changing even though the transcripts/model/parameters didn't change is only affecting Pangolin on hg38, and not SpliceAI scores or Pangolin scores on hg37(!). Also, the issue has been surprisingly frequent for hg38 Pangolin scores, with at least some of the scores changing at least slightly for 1 out of 5 variants in the set of ~3,000 cached variants checked so far. I haven't figured out how to reproduce the issue, but am continuing to look into possible causes.
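
A cached-versus-recomputed comparison of this kind could be sketched as follows. The function name `changed_variants`, the tolerance, and the dictionary layout are assumptions for illustration, not the script actually used here. Only the -0.50 / -0.28 pair comes from this thread; the other entries are placeholders, not real results.

```python
import math
from typing import Dict, List, Tuple


def changed_variants(cached: Dict[str, float],
                     recomputed: Dict[str, float],
                     tol: float = 0.005) -> List[Tuple[str, float, float]]:
    """List variants whose recomputed score differs from the cached one by more than tol."""
    changed = []
    for variant, old in cached.items():
        new = recomputed.get(variant)
        # Skip variants that were not recomputed; compare the rest on absolute difference.
        if new is not None and not math.isclose(old, new, rel_tol=0.0, abs_tol=tol):
            changed.append((variant, old, new))
    return changed


# Placeholder data: only the first entry reflects a score pair reported in this thread.
cached = {
    "16-23637674-C-T": -0.50,
    "placeholder-variant-A": -0.10,
    "placeholder-variant-B": 0.00,
}
recomputed = {
    "16-23637674-C-T": -0.28,
    "placeholder-variant-A": -0.10,
    "placeholder-variant-B": 0.00,
}

diffs = changed_variants(cached, recomputed)
fraction = len(diffs) / len(cached)
print(diffs, fraction)
```

Reporting `fraction` per tool and genome build would give the kind of summary quoted in this thread.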

bw2 avatar Oct 10 '24 01:10 bw2

More stats - this issue of scores changing appears to affect 28% of variants with Pangolin on hg38 and 5% of variants with Pangolin on hg37, and does not occur for SpliceAI scores.

bw2 avatar Oct 10 '24 15:10 bw2

Until this is fixed, if you want to double-check a score, you can't just rerun the same variant because the server will return the cached results. However, you can force it to recompute the scores by adjusting the Max distance slightly to a value that probably nobody has used before (e.g. 501bp).
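
A minimal sketch of that workaround, assuming the lookup URL keeps its parameters in the fragment as in the link earlier in this thread (`variant`, `hg`, `distance`, `mask`, `ra`). The helper name `with_distance` is hypothetical, and the sketch only rewrites the URL; it does not contact the server.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_distance(lookup_url: str, distance: int) -> str:
    """Return lookup_url with its 'distance' fragment parameter replaced."""
    parts = urlsplit(lookup_url)
    # The parameters sit in the URL fragment (after '#'), not in the query string.
    params = dict(parse_qsl(parts.fragment, keep_blank_values=True))
    params["distance"] = str(distance)
    return urlunsplit(parts._replace(fragment=urlencode(params)))


url = ("https://spliceailookup.broadinstitute.org/"
       "#variant=16-23637674-C-T&hg=37&distance=50&mask=1&ra=0")
new_url = with_distance(url, 501)
print(new_url)
```

Opening the rewritten URL should then trigger a fresh computation, provided no one has already queried that exact distance for the variant.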

bw2 avatar Oct 10 '24 15:10 bw2

Thank you for investigating this and providing the stats and tips. They're very helpful.

RChanColor avatar Oct 10 '24 16:10 RChanColor

I'm not yet 100% sure, but updates over the past week have most likely fixed this issue with Pangolin scores.

bw2 avatar Nov 08 '24 15:11 bw2

I've reset the cache this morning, so any previously-computed scores will be recomputed from scratch. Next week I will double-check that the pangolin scores are now stable.

bw2 avatar Nov 12 '24 17:11 bw2

I've confirmed that Pangolin scores are now stable and reproducible as of 1 month ago. Later this week, I will add a notification about this to the release updates - as it's currently shown on the "dev" version of the site: https://broadinstitute.github.io/SpliceAI-lookup-dev/

bw2 avatar Dec 04 '24 12:12 bw2