LOLA
How to add annotation to the input BED file that makes it through to the results file?
Hello!
I am trying to learn how to use LOLA to find transcription factor binding sites for sets of differentially expressed (DE) genes. I am using the human GENCODE annotation and the Salmon aligner. I use the UCSC Genome Browser to get the transcription start sites for the DE transcripts, and then use bedtools slop to add 1 kb upstream to the DE BED file. I do the same thing with highly expressed genes whose logFC is close to 0 from the same measurement (I am able to avoid scaffolds), which gives me what I think is a reasonable universe. The BED files look like this:
chr1 192808038 192813275 ENST00000235382.7 0 + RGS2
chr1 93447130 93555659 ENST00000260506.12 0 + FNBP1L
chr1 150281553 150288093 ENST00000290363.6 0 + CIART
chr1 236834615 236901418 ENST00000366576.3 0 + MTR
chr1 155077836 155088538 ENST00000368408.4 0 + EFNA3
chr1 153637083 153646470 ENST00000368687.1 0 + CHTOP
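For reference, the 1 kb upstream extension described above can be sketched in a few lines of Python. This is only an illustration of the strand-aware logic, not the exact bedtools command I ran. The function name `extend_upstream` is made up for this example, and it assumes whitespace-separated columns in the order chrom, start, end, name, score, strand, gene.

```python
def extend_upstream(fields, upstream=1000):
    """Return a copy of a BED record with `upstream` bp added 5' of the feature.

    Hypothetical helper for illustration; assumes columns are
    chrom, start, end, name, score, strand, followed by any extra columns.
    """
    chrom, start, end, name, score, strand, *extra = fields
    start, end = int(start), int(end)
    if strand == "-":
        # Upstream of a minus-strand feature lies at higher coordinates.
        # Note: no clamping to chromosome length, since no sizes are given here.
        end += upstream
    else:
        # Upstream of a plus-strand feature lies at lower coordinates;
        # clamp at 0 so the start never goes negative.
        start = max(0, start - upstream)
    return [chrom, str(start), str(end), name, score, strand, *extra]


record = "chr1 192808038 192813275 ENST00000235382.7 0 + RGS2".split()
extended = extend_upstream(record)
print("\t".join(extended))
```

The transcript ID and gene name travel along as extra columns here, which is the annotation I would like to see carried through to the LOLA results.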
I am using LOLAcore with ensembl_tfbs, or alternatively my own region database built from JASPAR. If I test one at a time, I get results that look like this:
> locResults
userSet dbSet collection pValueLog oddsRatio support rnkPV rnkOR rnkSup
<int> <int> <char> <num> <num> <int> <int> <int> <int>
1: 1 239 homo_sapiens 2.087858 2.180343 15 1 22 95
2: 1 176 homo_sapiens 2.022562 4.122688 5 2 8 166
3: 1 145 homo_sapiens 2.010532 2.039858 17 3 27 82
4: 1 101 homo_sapiens 1.761406 2.222335 11 4 21 119
5: 1 293 homo_sapiens 1.754148 5.789684 3 5 4 184
---
307: 1 292 homo_sapiens 0.000000 0.000000 0 253 253 253
308: 1 294 homo_sapiens 0.000000 0.000000 0 253 253 253
309: 1 295 homo_sapiens 0.000000 0.000000 0 253 253 253
310: 1 302 homo_sapiens 0.000000 0.000000 0 253 253 253
311: 1 309 homo_sapiens 0.000000 0.000000 0 253 253 253
maxRnk meanRnk b c d description cellType tissue antibody
<int> <num> <int> <int> <int> <char> <char> <char> <char>
1: 95 39.3 1093 75 11917 homo_sapiens <NA> <NA> <NA>
2: 166 58.7 183 85 12827 homo_sapiens <NA> <NA> <NA>
3: 82 37.3 1333 73 11677 homo_sapiens <NA> <NA> <NA>
4: 119 48.0 767 79 12243 homo_sapiens <NA> <NA> <NA>
5: 184 64.3 77 87 12933 homo_sapiens <NA> <NA> <NA>
---
307: 253 253.0 5 90 13005 homo_sapiens <NA> <NA> <NA>
308: 253 253.0 41 90 12969 homo_sapiens <NA> <NA> <NA>
309: 253 253.0 4 90 13006 homo_sapiens <NA> <NA> <NA>
310: 253 253.0 38 90 12972 homo_sapiens <NA> <NA> <NA>
311: 253 253.0 113 90 12897 homo_sapiens <NA> <NA> <NA>
treatment dataSource filename size
<char> <char> <char> <num>
1: <NA> <NA> UN0327.2.bed 5322
2: <NA> <NA> MA2121.1.bed 687
3: <NA> <NA> MA1656.2.bed 7660
4: <NA> <NA> MA1122.2.bed 2007
5: <NA> <NA> UN0663.2.bed 224
---
307: <NA> <NA> UN0662.2.bed 64
308: <NA> <NA> UN0664.2.bed 141
309: <NA> <NA> UN0665.2.bed 47
310: <NA> <NA> UN0805.1.bed 1210
311: <NA> <NA> UN0814.1.bed 791
I can stash identifiers for the regionDB, but how does one stash identifiers for the userSet? Is there another result table? How do people normally correlate the matrix to the input sequence?
I really appreciate any feedback or advice. TIA
Matt