Open chromatin predict in badly aligned regions
Hello and thanks again for the great tool.
We've run open chromatin predict and obtained bedGraphs, which we've converted to bigWigs and visualized in IGV.
These look great in areas of high coverage: they seem to match the modifications and have great-looking peaks over promoters.
However, we're noticing something strange in areas where there are misalignments or big skip regions in the alignments. It's almost as if modkit was hallucinating a signal without evidence. Below is a screenshot with the problematic area blown up (you can see the signal over the promoter and to the left of it, a signal over an area without much evidence).
Hello @nchernia,
I haven't seen this before, but I may not have been looking for this error mode exactly. What is probably happening is that the model hasn't been trained on many examples with such high levels of INDELs. I'm getting some plans together for another rev on these models so I'll be sure to add some examples like this to the training.
What you could do is threshold the bedGraph with --threshold 0.9 or do this post hoc. I've found that a value of 0.9 is a good balance between precision and recall. It may also be reasonable to have a filter in the Modkit routine that will not produce a prediction if the alignment accuracy in the window is below a set value.
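For the post hoc route, here is a minimal Python sketch that filters an existing bedGraph instead of re-running the prediction. It assumes the score is in the fourth column, as in standard bedGraph, and that a value equal to the cutoff is kept. The function name `threshold_bedgraph` is made up for illustration and is not part of Modkit.

```python
def threshold_bedgraph(lines, min_score=0.9):
    """Yield bedGraph records whose value (4th column) is >= min_score.

    Hypothetical helper, not a Modkit API. Header lines are passed
    through unchanged so the output is still a valid bedGraph.
    """
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue  # drop blank lines
        if stripped.startswith(("track", "browser", "#")):
            yield stripped  # keep header/comment lines as-is
            continue
        fields = stripped.split()
        if len(fields) < 4:
            continue  # not a chrom/start/end/value record
        if float(fields[3]) >= min_score:
            yield stripped
```

To use it, open the bedGraph for reading, write each yielded record plus a newline to a new file, and convert that file to bigWig as before.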
Thanks for the ping, I also still owe you an explanation on your other issue!
Thanks! It cleans up some signal for sure (the example I posted goes away) but doesn't solve the problem entirely. Here is another region. Note that these are all MAPQ0 alignments, there aren't many of them, and they don't appear to show 6mA. All I've done is post-processing on the first (original) track, only keeping values over 0.9.
Hello @nchernia,
You could also increase --min-coverage, but it's a bit of whack-a-mole, since you've already shown that regions can have median-depth coverage but high-error reads. I think a better solution is to (1) add a min-accuracy filter and (2) train the model on some examples like this.
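To make the min-accuracy idea concrete, here is a rough Python sketch of what such a filter might compute. It is not how Modkit works today, and everything in it is an assumption: accuracy is taken as (alignment columns - NM) / alignment columns, where the columns are the M, =, X, I and D CIGAR operations and NM is the SAM edit-distance tag. The function names and the 0.8 default are placeholders.

```python
import re

# One CIGAR operation: a length followed by an op code from the SAM spec.
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def alignment_accuracy(cigar, edit_distance):
    """Approximate per-read accuracy from a CIGAR string and the NM tag.

    Hypothetical helper. Alignment columns are M/=/X/I/D operations;
    N (reference skip) and clipping are ignored, so long skips do not
    lower the score on their own.
    """
    columns = sum(
        int(length) for length, op in _CIGAR_OP.findall(cigar) if op in "MID=X"
    )
    if columns == 0:
        return 0.0  # unaligned or unparsable CIGAR, e.g. "*"
    return (columns - edit_distance) / columns


def window_passes(alignments, min_accuracy=0.8):
    """Return True if the mean accuracy of (cigar, NM) pairs meets the cutoff.

    The 0.8 default is an arbitrary placeholder, not a recommended value.
    A window with no alignments never passes.
    """
    if not alignments:
        return False
    scores = [alignment_accuracy(cigar, nm) for cigar, nm in alignments]
    return sum(scores) / len(scores) >= min_accuracy
```

A window that fails the check would simply get no prediction. Because reference skips are ignored here, a filter meant to catch the big skip regions in the original report would probably also need to look at N operations or MAPQ.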