TOBIAS

Footprint looks smoothed with less Tn5 cutsites

Open annrosebright opened this issue 2 years ago • 3 comments

Hi,

In my TOBIAS analysis I am at the BINDetect step, and when I generate plots they look very smooth. As far as I know, Tn5 cut sites are usually sharp, and even when a TF does not bind strongly the cut-site pattern looks different from what I see below. The data are single-cell ATAC-seq, and the clusters represent my conditions. Have you seen plots like these before? Could it be that I need to treat the data differently? I checked a couple of TFs, and these are ones that should show strong differential binding. I described the steps I take to process the data in #137.

[Screenshot 2022-06-17 at 15 13 37] [Screenshot 2022-06-17 at 15 13 44]

Looking forward to your suggestion.

Thank you.

Best, Ann

annrosebright, Jun 17 '22 13:06

Hi Ann,

My suggestion would be to also plot the Tn5 insertion signals themselves (the _corrected.bw output of TOBIAS ATACorrect). I think your plots rightly show that the STAT3 footprint scores are higher in C5_E16 than in C7_E12, so maybe this is also reflected in the actual Tn5 insertions. The inverse footprint scores for GATA3 might be more difficult to explain, but it is possibly because GATA has a strong insertion peak in the middle of the motif, and so does not look like a true "footprint".

BR Mette

msbentsen, Jun 20 '22 09:06

Hi Mette,

Thank you for your reply. I am not sure I understand your suggestion, but if you mean the Tn5 cut sites generated while running ATACorrect, they look like the following: [filtered_C5_E16_atacorrect.pdf] [filtered_C7_E12_atacorrect.pdf]

What I don't understand is why I don't see the Tn5 cut sites in and around TF binding sites. That's what we generally expect, right? I would expect the conditions not to differ much in terms of footprints, but I would expect the TF to show a strong footprint, with a dip at the binding site and an increased cut signal around it. This is an example of what a highly variable factor footprint looks like: [Screenshot 2022-06-20 at 23 51 42]

Then, as a control, I plotted CTCF, and it looked like the following. CTCF is highly conserved and is known to have a strong footprint even if the sample is not sequenced deeply. Here the CTCF footprint looks inverted, with no cut sites. [Screenshot 2022-06-20 at 23 53 47]

Sorry for this long message. I feel there is something wrong. Could there be an internal normalization that removes the cut sites and inverts the figure? If I share the scripts I used, would that help to solve the issue?

Looking forward to your reply. Best, Ann

annrosebright, Jun 20 '22 21:06

Hi Ann,

Yes, exactly: you would expect the Tn5 insertion sites (meaning the ends of the reads) to be depleted around TF binding sites. These insertion site signals are the output of ATACorrect and are called "<prefix>_uncorrected.bw", "<prefix>_corrected.bw", "<prefix>_expected.bw" etc. From the corrected signal, you can then use ScoreBigwig to calculate the footprint scores, which represent the strength of the insertion site footprint.
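To illustrate why a footprint-score track can look like an inverted, smoothed version of the insertion track, here is a minimal toy sketch. It is not the ScoreBigwig implementation: the score below (mean of the flanking windows minus mean of the centre window), the window sizes, and the simulated signal are all assumptions chosen only to show the idea.

```python
def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def toy_footprint_scores(signal, center_half_width=2, flank_width=3):
    """Toy footprint score per position (an illustrative assumption,
    not the TOBIAS algorithm): mean of both flanks minus mean of the
    centre window. Positions too close to the edges keep a score of 0.0."""
    n = len(signal)
    scores = [0.0] * n
    for i in range(n):
        left_start = i - center_half_width - flank_width
        right_end = i + center_half_width + 1 + flank_width
        if left_start < 0 or right_end > n:
            continue  # window does not fit, leave score at 0.0
        center = signal[i - center_half_width : i + center_half_width + 1]
        left = signal[left_start : i - center_half_width]
        right = signal[i + center_half_width + 1 : right_end]
        scores[i] = mean(left + right) - mean(center)
    return scores


# Simulated insertion signal: flat background of 10 with a 5-bp
# depletion (the "footprint") at positions 8..12.
insertions = [10] * 8 + [2] * 5 + [10] * 8
scores = toy_footprint_scores(insertions)

dip_position = insertions.index(min(insertions)) + 2  # centre of the dip
peak_position = scores.index(max(scores))
print("centre of insertion dip:", dip_position)
print("position of score peak: ", peak_position)
```

In this toy case the depletion in insertions at the motif centre becomes a peak in the score, and the sliding windows smooth out the sharp edges of the raw signal. That is one reason an aggregate of score tracks does not show individual cut sites.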

To view the footprint, you need to give the "_corrected.bw" file from ATACorrect as input to PlotAggregate - not the "_footprint.bw" file from ScoreBigwig. That should give you a very clear footprint for CTCF, since it looks like the footprint scores are very high at the center of the motif.
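As a rough sketch of that step, the snippet below only assembles and prints a PlotAggregate command line; it does not run TOBIAS. The `--TFBS`, `--signals` and `--output` options are the ones shown in the TOBIAS README; the file names, the helper function, and the suffix check are hypothetical and should be replaced with your own ATACorrect and BINDetect outputs.

```python
import shlex


def plot_aggregate_cmd(tfbs_beds, signal_bigwigs, output_pdf):
    """Build a `TOBIAS PlotAggregate` argument list (hypothetical helper).

    Refuses any signal file that is not an ATACorrect "_corrected.bw"
    track, so footprint-score bigwigs are not plotted by mistake.
    """
    for path in signal_bigwigs:
        if not path.endswith("_corrected.bw"):
            raise ValueError(
                f"{path!r} is not a '_corrected.bw' file from ATACorrect"
            )
    return (
        ["TOBIAS", "PlotAggregate", "--TFBS"]
        + list(tfbs_beds)
        + ["--signals"]
        + list(signal_bigwigs)
        + ["--output", output_pdf]
    )


# Hypothetical file names, assumed for illustration only.
cmd = plot_aggregate_cmd(
    tfbs_beds=["CTCF_all.bed"],
    signal_bigwigs=[
        "filtered_C5_E16_corrected.bw",
        "filtered_C7_E12_corrected.bw",
    ],
    output_pdf="CTCF_aggregate.pdf",
)
print(shlex.join(cmd))
```

The printed command can be pasted into a shell once the paths point at real files. Passing a "_footprint.bw" file to this helper raises a `ValueError` instead of producing a smoothed, inverted-looking plot.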

Hope this helps,

BR Mette

msbentsen, Jun 21 '22 08:06