cicero-release

Handle the case when a peak overlaps with the promoter of two or more genes

Open: yushengak47 opened this issue 5 years ago • 1 comment

Hi,

I found that when a peak overlaps the promoters of two or more genes, annotate_cds_by_site with its default settings records only one of them in the 'gene' column of fData(input_cds). As a result, some genes are missing from the gene activity matrix. I tried setting all = TRUE when running annotate_cds_by_site, and this does list multiple gene names in the 'gene' column. However, build_gene_activity_matrix doesn't seem to handle that properly: the generated matrix looks redundant and problematic. For example, it has rows named "HES2,HES2,HES2,HES2", "ESPN,ESPN,HES2", etc.

Any ideas on how to solve this?

Thanks

yushengak47, Jan 21 '21 08:01

Hmm, this is a case that would require some modifications to the code to fix. However, I will say that the gene activity scores for two genes with the same promoter peak will be identical, so if you have a list of the sets of genes that share a promoter, you can add the missing rows yourself by copying the row of the gene that was recorded.
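As a rough illustration of that workaround, here is a minimal base-R sketch. It is not part of Cicero: the helper add_shared_promoter_rows, the toy matrix gam, and the list shared_sets are all hypothetical names made up for the example. It assumes the matrix has gene names as row names (default annotate_cds_by_site settings, one gene per peak) and that every gene in a set really does share its promoter peak(s) with the others in that set.

```r
# Hypothetical helper, not part of Cicero: for each set of genes that share
# a promoter, copy the row of a gene that is already in the matrix to the
# genes from the same set that are absent.
add_shared_promoter_rows <- function(mat, shared_sets) {
  for (genes in shared_sets) {
    present <- intersect(genes, rownames(mat))
    absent  <- setdiff(genes, rownames(mat))
    # Nothing to do if no gene of the set has a row, or none is absent.
    if (length(present) == 0 || length(absent) == 0) next
    # Use the first gene that already has a row as the template.
    template <- mat[rep(present[1], length(absent)), , drop = FALSE]
    rownames(template) <- absent
    mat <- rbind(mat, template)
  }
  mat
}

# Toy data standing in for the output of build_gene_activity_matrix
# (a plain dense matrix here; the values are invented).
gam <- matrix(c(0.2, 0.0, 0.5,
                0.1, 0.3, 0.0),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("HES2", "GENE_X"),
                              c("cell1", "cell2", "cell3")))

# Toy list of gene sets that share a promoter peak (assumed to be known).
shared_sets <- list(c("HES2", "ESPN"))

gam_full <- add_shared_promoter_rows(gam, shared_sets)
```

The new rows are appended at the bottom of the matrix, so reorder them afterwards if row order matters downstream. The sketch has only been thought through for a plain matrix; a sparse matrix from the Matrix package should behave similarly with row-name indexing and rbind, but that is untested here.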

I will leave this open and hope to find time to work out a proper solution in the future.

hpliner, Feb 08 '21 20:02