reg-gen

RGT-Hint footprinting and differential analysis output

Open • mdurante1 opened this issue 7 years ago • 7 comments

Hi,

I have successfully run the RGT-Hint pipeline following the tutorial on the website (https://www.regulatory-genomics.org/hint/tutorial/) to perform footprinting and differential analysis on ATAC-seq data, and I have obtained the plots generated by the "rgt-hint differential" command. Could you please explain what value is plotted on the y-axis and how it is obtained? Is there a way to obtain the raw data used to generate these plots, so that I can make different plots? I would like to see which specific genomic regions contribute most to the observed differences. When I inspect certain regions listed in the *_mpbs.bed file for a given TF, using the *.bw files in IGV, I don't see large differences, so I would like to quantify which regions show large differences in TF binding probability.

Also, is there a way to assess statistical significance, to see whether the plots depict a significant change? Some of the plots show differences in some areas and not in others, which makes it difficult to tell whether a TF has a significant difference in binding probability.

Thanks for all of your help; it is greatly appreciated.

Best, Michael

mdurante1 • Feb 13 '18

Hi @mdurante1,

The y-axis is the average ATAC-seq signal around the predicted transcription factor binding sites.
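
As a rough illustration of such an average profile, the sketch below averages a per-base signal over a fixed window centred on each predicted binding site. The inputs (one chromosome's signal as a plain list, site centres as 0-based positions) and the function name average_profile are hypothetical, and the sketch does not reproduce rgt-hint's own processing, such as bias correction or normalization.

```python
from typing import List, Sequence


def average_profile(signal: Sequence[float], centers: Sequence[int], half_window: int) -> List[float]:
    """Mean signal at each offset -half_window..+half_window around the site centres.

    Hypothetical helper for illustration only. Sites whose window would run
    past either end of `signal` are skipped.
    """
    width = 2 * half_window + 1
    totals = [0.0] * width
    used = 0
    for c in centers:
        start, end = c - half_window, c + half_window + 1
        if start < 0 or end > len(signal):
            continue  # incomplete window at a chromosome edge
        for i, value in enumerate(signal[start:end]):
            totals[i] += value
        used += 1
    if used == 0:
        raise ValueError("no site has a complete window")
    return [t / used for t in totals]


signal = [0, 1, 4, 1, 0, 0, 2, 6, 2, 0]
print(average_profile(signal, centers=[2, 7], half_window=1))  # [1.5, 5.0, 1.5]
```

Plotting the returned list against the offsets gives the kind of curve shown on the y-axis: one averaged value per position relative to the site centre.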

In our latest release, we added the option --output-profiles to write the footprint profiles to a text file, in which each row represents a specific instance of the given motif. In addition, we included a statistical test in the scatter plot to highlight the significant factors.
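
A minimal sketch of how per-instance profiles could be used to find the instances that differ most between two conditions. It assumes (this is not confirmed in the thread) that each condition's --output-profiles file holds one motif instance per row as whitespace-separated numbers, with rows in the same order in both files. The functions parse_profiles and rank_instances are hypothetical helpers, not part of rgt-hint.

```python
from typing import Iterable, List, Tuple


def parse_profiles(lines: Iterable[str]) -> List[List[float]]:
    """Parse rows of whitespace-separated numbers (assumed: one motif instance per row)."""
    return [[float(x) for x in line.split()] for line in lines if line.strip()]


def rank_instances(profiles_a: List[List[float]], profiles_b: List[List[float]],
                   top: int = 10) -> List[Tuple[int, float]]:
    """Rank instances by the absolute change in summed signal from condition A to B.

    Returns (row_index, sum(B) - sum(A)) pairs, largest absolute change first.
    """
    if len(profiles_a) != len(profiles_b):
        raise ValueError("both conditions must list the same instances")
    diffs = [(i, sum(b) - sum(a)) for i, (a, b) in enumerate(zip(profiles_a, profiles_b))]
    diffs.sort(key=lambda item: abs(item[1]), reverse=True)
    return diffs[:top]


# Toy rows standing in for the contents of two hypothetical profile files.
cond1 = parse_profiles(["1 2 1", "0 1 0", "2 3 2"])
cond2 = parse_profiles(["1 2 1", "3 5 3", "1 1 1"])
print(rank_instances(cond1, cond2, top=2))  # [(1, 10.0), (2, -4.0)]
```

Each returned index refers to a row of the profile files. If the real files include a header or coordinate columns, parse_profiles would need to skip them.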

Best, Li

lzj1769 • Jun 29 '18

Hi @lzj1769,

I used rgt-hint to do footprinting analysis.

The command I used was:

rgt-hint differential --organism hg38 --bc --nc 20 --standardize --mpbs-files condition1.bed,condition2.bed --reads-files condition1.bam,condition2.bam --conditions condition1,condition2 --output-prefix footprinting_differential --output-location=footprinting_standardize

With the default lfc value, many TFs in the output log2(fold change) plot lie far outside the plotted area. When I set lfc to 2 or 20, those TFs simply disappeared. Do you have any suggestions for solving this? The TFs far outside the plot should be the highly significant ones, which is why they lie so far out. Also, the dot colors in the plot differ from those in the legend.

Thank you very much. Looking forward to your reply.

Using -lfc 0.1 (many TFs lie far outside the plotted area): [plot image]

Using -lfc 2: [plot image]

Using -lfc 20: [plot image]

Yingzi

YingziZhang-github • Jan 18 '23

@minashaigan

Any ideas about this issue?

lzj1769 • Jan 18 '23

Dear Zhijian,

Thank you very much for the reply; I am looking forward to your feedback! If you recommend it, I can also customize the plot by redrawing it from the rgt output data. The output files I can see (with their default names) are differential_factor.txt and differential_statistics.txt. Could you please advise whether and how I can use the values in them to redraw the log2(fold change) plot and the activity statistics plot? Is the log2(Fold Change) discussed above equal to the log2 value of "TF_Activity" in differential_statistics.txt?

So far, rgt has given me many exciting results. It would be really nice if I could customize and polish the rgt output figures.

Thank you very much.

Yingzi

YingziZhang-github • Jan 19 '23

Hello Yingzi,

To get a symmetric figure, I set the x-axis limits of the plot from the rounded maximum of abs(log2(FoldChange)). All your fold changes are smaller than 0.5, so this maximum is rounded to 0. I will replace round with ceil.
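
The limit calculation described above can be sketched as follows. symmetric_xlim is a hypothetical function, not the code in DifferentialAnalysis.py. With round, a maximum below 0.5 collapses the axis to (0, 0), whereas ceil gives a limit of at least 1 whenever any value is non-zero.

```python
import math
from typing import Sequence, Tuple


def symmetric_xlim(log2_fold_changes: Sequence[float], use_ceil: bool = True) -> Tuple[int, int]:
    """Return (-lim, lim) so the plot is centred on log2(FC) = 0."""
    max_abs = max(abs(v) for v in log2_fold_changes)
    # Python's round() rounds half to even, so even 0.5 becomes 0;
    # math.ceil() rounds any positive value up to the next integer.
    lim = math.ceil(max_abs) if use_ceil else round(max_abs)
    return (-lim, lim)


lfc = [0.31, -0.45, 0.12]
print(symmetric_xlim(lfc, use_ceil=False))  # (0, 0): every point falls outside
print(symmetric_xlim(lfc))                  # (-1, 1)
```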

Yes, the log2(Fold Change) is equal to the difference between the log2 values of "TF_Activity" in differential_statistics.txt.
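
Because log2(a) - log2(b) = log2(a / b), the fold change can be recomputed directly from two activity values. This sketch assumes both TF_Activity values are positive and that the fold change is taken as condition 2 over condition 1; check the column names and the direction against the header of your own differential_statistics.txt. log2_fold_change is a hypothetical helper.

```python
import math


def log2_fold_change(activity_cond1: float, activity_cond2: float) -> float:
    """log2(cond2 / cond1), computed as a difference of log2 values.

    Assumes both activities are positive; math.log2 raises ValueError otherwise.
    """
    return math.log2(activity_cond2) - math.log2(activity_cond1)


print(log2_fold_change(2.0, 8.0))  # 2.0, the same as math.log2(8.0 / 2.0)
```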

Thanks for the feedback, Mina

minashaigan • Jan 19 '23

Hi @YingziZhang-github

The file differential_factor.txt contains the normalization factors that rgt-hint used to normalize the ATAC-seq signal between conditions, to account for different sequencing depths.
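
For intuition only, here is a generic library-size scaling, one common way to correct for sequencing depth. It is not necessarily how rgt-hint computes the values in differential_factor.txt, and the helpers depth_factors and normalize are hypothetical.

```python
from typing import List, Sequence


def depth_factors(read_counts: Sequence[int]) -> List[float]:
    """Per-condition scaling factors relative to the mean sequencing depth."""
    mean_depth = sum(read_counts) / len(read_counts)
    return [n / mean_depth for n in read_counts]


def normalize(signal: Sequence[float], factor: float) -> List[float]:
    """Divide a condition's signal by its factor, so deeper libraries are scaled down."""
    return [v / factor for v in signal]


factors = depth_factors([30_000_000, 10_000_000])
print(factors)                          # [1.5, 0.5]
print(normalize([3.0, 6.0], factors[0]))  # [2.0, 4.0]
```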

As @minashaigan pointed out, you can find the raw outputs in differential_statistics.txt and use them to customize the plot.

Best, Zhijian

lzj1769 • Jan 19 '23

Hi @minashaigan and @lzj1769,

Thank you for your answers. DifferentialAnalysis.py also helped a lot. My customized plot works very well.

Many thanks, Yingzi

YingziZhang-github • Jan 22 '23