How to add annotation to the input BED file that makes it through to the results file?

Open methornton opened this issue 2 months ago • 0 comments

Hello!

I am trying to learn how to use LOLA to find transcription factor binding sites for sets of differentially expressed (DE) genes. I am using the human GENCODE annotation and the Salmon aligner. I use the UCSC Genome Browser to get the transcription start sites for the DE transcripts, and then use bedtools slop to add 1 kb upstream to the DE BED file. I do the same thing with highly expressed genes whose logFC is close to 0 from the same measurement (I am able to avoid scaffolds), which gives me what I think is a reasonable universe. The BED files look like this:

chr1    192808038       192813275       ENST00000235382.7       0       +       RGS2
chr1    93447130        93555659        ENST00000260506.12      0       +       FNBP1L
chr1    150281553       150288093       ENST00000290363.6       0       +       CIART
chr1    236834615       236901418       ENST00000366576.3       0       +       MTR
chr1    155077836       155088538       ENST00000368408.4       0       +       EFNA3
chr1    153637083       153646470       ENST00000368687.1       0       +       CHTOP
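As a minimal sketch of the upstream-extension step described above (not the exact commands that produced these files), the following Python mimics what a strand-aware `bedtools slop -l 1000 -r 0 -s` does to a single BED record. The function name `extend_upstream` and the optional `chrom_size` argument are hypothetical, and the 7-column layout is assumed from the rows shown above.

```python
def extend_upstream(fields, upstream=1000, chrom_size=None):
    """Extend one BED record upstream of its 5' end, respecting strand.

    fields: list of BED columns (chrom, start, end, name, score, strand, ...).
    Extra columns such as the gene symbol are passed through unchanged.
    """
    chrom, start, end = fields[0], int(fields[1]), int(fields[2])
    strand = fields[5]
    if strand == "-":
        # On the minus strand, "upstream" lies past the end coordinate.
        end += upstream
        if chrom_size is not None:
            end = min(end, chrom_size)  # clamp to chromosome length if known
    else:
        # On the plus strand, "upstream" lies before the start coordinate.
        start = max(0, start - upstream)  # never go below position 0
    return [chrom, str(start), str(end)] + list(fields[3:])


# First row of the example BED file above.
line = "chr1\t192808038\t192813275\tENST00000235382.7\t0\t+\tRGS2"
extended = extend_upstream(line.split("\t"))
print("\t".join(extended))
```

The transcript ID (column 4) and gene symbol (column 7) survive the extension untouched, so the annotation is still present in the file that is handed to LOLA.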

I run this using LOLAcore and ensembl_tfbs, or using my own region database built from JASPAR. If I test one at a time, I get results that look like this:

> locResults
     userSet dbSet   collection pValueLog oddsRatio support rnkPV rnkOR rnkSup
       <int> <int>       <char>     <num>     <num>   <int> <int> <int>  <int>
  1:       1   239 homo_sapiens  2.087858  2.180343      15     1    22     95
  2:       1   176 homo_sapiens  2.022562  4.122688       5     2     8    166
  3:       1   145 homo_sapiens  2.010532  2.039858      17     3    27     82
  4:       1   101 homo_sapiens  1.761406  2.222335      11     4    21    119
  5:       1   293 homo_sapiens  1.754148  5.789684       3     5     4    184
 ---                                                                          
307:       1   292 homo_sapiens  0.000000  0.000000       0   253   253    253
308:       1   294 homo_sapiens  0.000000  0.000000       0   253   253    253
309:       1   295 homo_sapiens  0.000000  0.000000       0   253   253    253
310:       1   302 homo_sapiens  0.000000  0.000000       0   253   253    253
311:       1   309 homo_sapiens  0.000000  0.000000       0   253   253    253
     maxRnk meanRnk     b     c     d  description cellType tissue antibody
      <int>   <num> <int> <int> <int>       <char>   <char> <char>   <char>
  1:     95    39.3  1093    75 11917 homo_sapiens     <NA>   <NA>     <NA>
  2:    166    58.7   183    85 12827 homo_sapiens     <NA>   <NA>     <NA>
  3:     82    37.3  1333    73 11677 homo_sapiens     <NA>   <NA>     <NA>
  4:    119    48.0   767    79 12243 homo_sapiens     <NA>   <NA>     <NA>
  5:    184    64.3    77    87 12933 homo_sapiens     <NA>   <NA>     <NA>
 ---                                                                       
307:    253   253.0     5    90 13005 homo_sapiens     <NA>   <NA>     <NA>
308:    253   253.0    41    90 12969 homo_sapiens     <NA>   <NA>     <NA>
309:    253   253.0     4    90 13006 homo_sapiens     <NA>   <NA>     <NA>
310:    253   253.0    38    90 12972 homo_sapiens     <NA>   <NA>     <NA>
311:    253   253.0   113    90 12897 homo_sapiens     <NA>   <NA>     <NA>
     treatment dataSource     filename  size
        <char>     <char>       <char> <num>
  1:      <NA>       <NA> UN0327.2.bed  5322
  2:      <NA>       <NA> MA2121.1.bed   687
  3:      <NA>       <NA> MA1656.2.bed  7660
  4:      <NA>       <NA> MA1122.2.bed  2007
  5:      <NA>       <NA> UN0663.2.bed   224
 ---                                        
307:      <NA>       <NA> UN0662.2.bed    64
308:      <NA>       <NA> UN0664.2.bed   141
309:      <NA>       <NA> UN0665.2.bed    47
310:      <NA>       <NA> UN0805.1.bed  1210
311:      <NA>       <NA> UN0814.1.bed   791

I can stash identifiers for the regionDB, but how does one stash identifiers for the userSet? Is there another result table? How do people normally map the results table back to the input regions?
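To make the question concrete, here is a rough sketch of the kind of mapping being asked about: given user regions that carry a name and a set of database regions, list which named user regions overlap the database set. This is plain Python for illustration only, not LOLA's API. The function `overlapping_names` and the database interval are hypothetical; the two user regions are taken from the BED rows above.

```python
from collections import defaultdict


def overlapping_names(user_regions, db_regions):
    """Return names of user regions that overlap at least one db region.

    user_regions: iterable of (chrom, start, end, name) tuples.
    db_regions:   iterable of (chrom, start, end) tuples.
    Coordinates are treated as half-open BED intervals.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in db_regions:
        by_chrom[chrom].append((start, end))

    hits = []
    for chrom, start, end, name in user_regions:
        # Two half-open intervals overlap when each starts before the other ends.
        if any(s < end and start < e for s, e in by_chrom[chrom]):
            hits.append(name)
    return hits


user = [
    ("chr1", 192808038, 192813275, "RGS2"),
    ("chr1", 93447130, 93555659, "FNBP1L"),
]
db = [("chr1", 192808100, 192808120)]  # hypothetical binding-site interval

print(overlapping_names(user, db))
```

What I would like is an equivalent of this list for each row of the results table, so that a given dbSet can be traced back to the transcripts or genes that contributed to its support.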

I really appreciate any feedback or advice. TIA

Matt

Nov 04 '25 01:11 methornton