MAESTRO icon indicating copy to clipboard operation
MAESTRO copied to clipboard

gene_peak matrix in RP_model

Open wxy-omg opened this issue 3 years ago • 3 comments

The value range of gene_peak matrix in RP_model should between 0 and 1. Why does "genes_peaks_score_csr" in scATAC_Genescore.py have a value greater than 1, with a maximum of 1600+.

wxy-omg avatar Dec 21 '20 05:12 wxy-omg

Hi, how did you get the gene activity score? Did you use the enhanced model or simple model? If you get the activity score from running the MAESTRO snakemake pipeline, the count matrix is binarized. The gene activity score is calculated by summing up the peaks (within 120kb) nearby a gene using an exponential decay model. details here https://github.com/liulab-dfci/MAESTRO/blob/master/example/Gene_activity_modelling/Gene_activity_modelling.md

if a peak is at the promoter, it will have a weight of 1, and then adding up other more distal peaks' regulatory potential, so it could be great than 1.

If you used the MAESTRO scatac-genescore command line tool, and feed it with the raw count matrix, it could have big gene activity scores because now you can have a big count in a peak. You can binarize that count matrix and rerun the same command.

I am doing a benchmark for those different ways to calculate gene activity score and will update here.

Thanks!

crazyhottommy avatar Dec 23 '20 03:12 crazyhottommy

thanks, but i mean the gene_peak without multiplication by count matrix, the element in this gene_peak matrix represent the peak regulatory the gene potential. Should max(gene_peak) is equal to 1? But i got some values greater than 1.

wxy-omg avatar Jan 23 '21 07:01 wxy-omg

how did you get the gene_peak matrix? you are right, for every peak, the max weight should be 1.

crazyhottommy avatar Feb 10 '21 14:02 crazyhottommy