TOBIAS

Different locus numbers across conditions and Strategies for Multi-Tissue TF Footprint Score Comparison

Open Myrtle-bio opened this issue 8 months ago • 5 comments

Hello, I appreciate your continued assistance; it has been very useful to me!

I am working with data from multiple conditions, and I want to identify which TFs are important in each condition within certain BED regions. This is similar to your research.

image

I've observed significant variations in the locus numbers within BINDetect results across different conditions. For instance, in <condition1_bindetect>/<TF1>/TF1_overview.txt, there are over 20,000 rows, whereas in <condition2_bindetect>/<TF1>/TF1_overview.txt, there are over 30,000 rows.
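To show what I mean by differing locus numbers, here is a minimal sketch of how the site sets from two overview files could be compared. The two inline tables are toy stand-ins for the real TF1_overview.txt files, and the column names (`chrom`, `start`, `end`, `strand`, `score`) are assumptions made for this sketch, not the confirmed BINDetect header.

```python
import csv
import io

# Toy stand-ins for two TF1_overview.txt files (tab-separated).
# Column names are hypothetical and chosen only for illustration.
COND1 = (
    "chrom\tstart\tend\tstrand\tscore\n"
    "chr1\t100\t120\t+\t0.8\n"
    "chr1\t500\t520\t-\t0.0\n"
)
COND2 = (
    "chrom\tstart\tend\tstrand\tscore\n"
    "chr1\t100\t120\t+\t0.6\n"
    "chr1\t500\t520\t-\t0.4\n"
    "chr2\t900\t920\t+\t1.1\n"
)


def load_sites(text):
    """Map each site key (chrom, start, end, strand) to its score."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {
        (row["chrom"], int(row["start"]), int(row["end"]), row["strand"]): float(row["score"])
        for row in reader
    }


sites1 = load_sites(COND1)
sites2 = load_sites(COND2)

# Set operations on the keys show which loci are shared and which are unique.
shared = sites1.keys() & sites2.keys()
only1 = sites1.keys() - sites2.keys()
only2 = sites2.keys() - sites1.keys()

print(f"shared={len(shared)} only_condition1={len(only1)} only_condition2={len(only2)}")
```

With real files, the same comparison would tell me whether the extra rows in condition2 are additional loci or the same loci reported differently.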

image

Based on this description, I initially presumed that the locus numbers would be consistent across conditions.

image

Based on this, could the TFBS with no output be explained by F[i, i+Wf] < 0? I noticed that some sites have TFBS_footprints_condition_score=0 in the output.

Here are my questions:

  1. I want to compare the importance of TF1 in my region of interest across conditions, but the number of binding sites differs between conditions. For example, BINDetect reports 7 sites in condition1, but the same 7 sites plus 3 more in condition2. Should I use the maximum TF_condition score or the mean TF_condition score? I lean towards the mean from a biological perspective, but I'm unsure whether it is fair to divide the condition1 total by 7, since condition1 may have no binding at the three extra sites, unlike condition2. On the other hand, dividing by 10 assumes the footprint scores at those three sites are 0, which is not necessarily true: as I mentioned earlier, sites with TFBS_footprints_condition_score=0 do appear in the output.
  2. After obtaining the mean TF_condition score for each condition, you mentioned the following: image. So I think I don't need to perform additional normalization, right? However, I've noticed a clear bias in certain situations, and biologically it seems improbable that all TFs would exhibit this pattern. image Could I be overlooking something? To clarify, I use ATAC peaks from the entire genome as input. I then run bedtools intersect on the BINDetect results and the regions of interest to obtain the footprint scores within those specific regions. This differs from directly using only the peaks within my regions of interest as input.
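To make question 1 concrete, here is a rough sketch of the three summaries I am choosing between, using the 7-versus-10 site example. All scores are made-up numbers, the site names are hypothetical, and filling missing sites with 0 is exactly the assumption I am unsure about, not something I know BINDetect implies.

```python
from statistics import mean

# Made-up footprint scores for TF1 in the region of interest.
# condition2 reports 10 sites; condition1 reports only the first 7 of them.
COND2_SCORES = [1.2, 0.9, 1.5, 0.7, 1.1, 0.8, 1.0, 0.6, 0.5, 0.9]
COND1_SCORES = [1.0, 0.8, 1.4, 0.0, 0.9, 0.7, 1.2]

cond2 = {f"site{i}": s for i, s in enumerate(COND2_SCORES, start=1)}
cond1 = {f"site{i}": s for i, s in enumerate(COND1_SCORES, start=1)}

# Union of all sites reported in either condition, in a stable order.
all_sites = sorted(cond1.keys() | cond2.keys())


def summarize(scores, union_sites):
    """Return the candidate per-condition summaries for one TF."""
    own = list(scores.values())
    # ASSUMPTION: a site missing from this condition is scored as 0.
    zero_filled = [scores.get(site, 0.0) for site in union_sites]
    return {
        "n_reported": len(own),
        "max": max(own),
        "mean_own_sites": mean(own),              # divide by 7 for condition1
        "mean_union_zero_filled": mean(zero_filled),  # divide by 10 for condition1
    }


summary1 = summarize(cond1, all_sites)
summary2 = summarize(cond2, all_sites)

for name, summary in (("condition1", summary1), ("condition2", summary2)):
    print(name, {key: round(value, 3) for key, value in summary.items()})
```

For condition2 the two means are identical because it reports every site in the union; for condition1 they differ, which is the gap I am asking about.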

I apologize for the barrage of questions, and I hope you have a wonderful Halloween!

Myrtle-bio, Oct 25 '23 03:10