nanocompore icon indicating copy to clipboard operation
nanocompore copied to clipboard

Use a peak calling algorithm to call significant positions from the pValue graph

Open a-slide opened this issue 5 years ago • 6 comments

At the moment Nanocompore reports all the positions bellow the pvalue threshold, but when a position is significant the adjacent kmers are often significant as well. Using a peak calling algorithm would allow to refine the position of the modified residue

a-slide avatar May 14 '19 18:05 a-slide

This is something to consider for a future update. I've added it to the Version 2 milestone.

tleonardi avatar Jan 29 '20 11:01 tleonardi

I have got a prototype that seems to be working alright. See example after: NC_peakcalling

a-slide avatar Apr 02 '20 14:04 a-slide

Ideally, it could be done directly in TxComp, but would require TxComp to be able to correct pValues without knowing all the pvalues. Random permutations and empirical pValues?

a-slide avatar Apr 29 '20 08:04 a-slide

Hey guys. In the MetaCompore pipeline, there is a nanocompore postprocessing script that calls peaks using scipy's find_peaks.

I'm just trying to wrap my head around what the presence/absence of a peak means?

For example,

ref_id  pos     pvalue  peak
ATCC11775_23S_rRNA      1132    0.0085561746316739      False
ATCC11775_23S_rRNA      1144    0.0023775255077 False
ATCC11775_23S_rRNA      1612    0.0086023977807072      False
ATCC11775_23S_rRNA      1613    2.803727380359832e-06   True
ATCC11775_23S_rRNA      1614    0.0005099579444844      False
ATCC11775_23S_rRNA      1615    0.000140505982829       False
ATCC11775_23S_rRNA      1616    0.0029064514858123      False
ATCC11775_23S_rRNA      1625    0.0049815292659649      False
ATCC11775_23S_rRNA      1626    0.0003991030640439      False
ATCC11775_23S_rRNA      1627    0.000140505982829       True
ATCC11775_23S_rRNA      1628    0.0052356375331268      False

is the idea that I should really ignore the False peaks because they are surrounded by other significant sites? So in the above example, position 1613 should be used and 1614-16 be ignored? If so, what about 1132 where it is significant, has a False peak, but has no surrounding positions that are significant?

mbhall88 avatar Feb 25 '22 06:02 mbhall88

Hi, yes that's the fundamental ideal. When you have neighboring significant kmers, the peak is the most significant one. It's just a handy way of collapsing the effect of a modification (which can impact multiple neighboring kmers) to a single kmer. Just keep in mind that this doesn't mean that the modification is the central base of the peak kmer. Not sure why 1132 is not marked as a peak.. can you share the input file?

tleonardi avatar Mar 15 '22 10:03 tleonardi