Mandalorion-old

Interpreting output BED files

Open · malloryfreeberg opened this issue 7 years ago · 2 comments

Hi there. I've looked through the Mandalorion documentation here, but I still have a few clarifications regarding interpretation of some of the BED output files.

The following snippet is from the TESS.bed output file. Does Sl correspond to a TSS on the plus strand (l = left?), El = TES on the minus strand, Sr = TSS on minus strand, and Er = TES on plus strand? Is a similar scheme used for the SS.bed file (e.g. 5l = 5' splice site on plus strand, etc.)? What does the _A indicate at the end of the item in column 4? The range of these items is 20nt, so is the actual TSS/TES the midpoint of this range (e.g. in the first line, the TSS is at position 69660)?

chrII   69650   69670   Sl23_69650_69670_A      23
chrV    2164816 2164836 El35593_2164816_2164836_A       35593
chrV    9137708 9137728 Sr33582_9137708_9137728_A       33582
chrV    8349177 8349197 Er37846_8349177_8349197_A       37846

Also, is there any difference between SS.bed and SS_raw.bed? These files are identical for me.

TIA for any insights.

malloryfreeberg, Jul 20 '17 20:07

Hi Mallory,

Apologies for the late reply. I finally managed to take a vacation. The naming of features follows the scheme you have already more or less deciphered.

Sl -> transcription Start site identified using the Left end of reads. This would indicate a plus strand transcript

5r -> 5' splice site called using the right end of an alignment gap. This would indicate a minus strand transcript
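For anyone post-processing these files, here is a minimal Python sketch that decodes the column-4 names under the scheme above. The rules for `S` and `5` come from this reply. The rules for `E` and `3` (left end means minus strand, right end means plus strand) are an assumption extrapolated from the same left/right logic and are not confirmed here. Matching the support count to column 5 is based only on the example lines in the question, the trailing `_A` suffix is kept but not interpreted because its meaning was not explained, and the `midpoint` property is a hypothetical helper following the question's guess that the site lies at the centre of the 20 nt window. All names in the code (`parse_name`, `Feature`, etc.) are illustrative, not part of Mandalorion.

```python
import re
from dataclasses import dataclass

# Names look like "Sl23_69650_69670_A":
#   feature code (S, E, 5 or 3), read end used (l or r), support count,
#   window start, window end, and a suffix whose meaning is unexplained.
NAME_RE = re.compile(r"^([SE53])([lr])(\d+)_(\d+)_(\d+)_(\w+)$")

FEATURE = {"S": "TSS", "E": "TES", "5": "5' splice site", "3": "3' splice site"}

# True if a feature called from the LEFT end implies a plus-strand transcript.
# S and 5 follow the maintainer's description; E and 3 are an assumption.
LEFT_IS_PLUS = {"S": True, "5": True, "E": False, "3": False}


@dataclass
class Feature:
    kind: str
    strand: str
    support: int
    start: int  # BED start (0-based)
    end: int    # BED end (exclusive)
    suffix: str

    @property
    def midpoint(self) -> int:
        # Hypothetical: treats the site as the centre of the window.
        return (self.start + self.end) // 2


def parse_name(name: str) -> Feature:
    m = NAME_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognised feature name: {name!r}")
    code, read_end, support, start, end, suffix = m.groups()
    plus = LEFT_IS_PLUS[code] == (read_end == "l")
    return Feature(FEATURE[code], "+" if plus else "-",
                   int(support), int(start), int(end), suffix)
```

For example, `parse_name("Sl23_69650_69670_A")` yields a plus-strand TSS with support 23, and `parse_name("5r1_100_120_A").strand` is `"-"`, matching the description of `5r` above.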

Finally, SS_raw.bed and SS.bed differ if you used a genome annotation: SS.bed contains all splice sites, both annotated and inferred.
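To see what an annotation actually added, one option is to diff the two files by coordinates. The Python sketch below assumes, based on the reply above, that SS_raw.bed holds only the read-derived sites, so intervals present in SS.bed but missing from SS_raw.bed would be the annotation-only ones. It compares `(chrom, start, end)` only, because annotated entries may be named differently; a limitation is that two sites sharing coordinates but differing in name collapse into one. The function names and the command-line usage are illustrative.

```python
import sys


def load_sites(lines):
    """Return the set of (chrom, start, end) intervals from BED lines."""
    sites = set()
    for line in lines:
        # Skip blank lines and header/track lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        sites.add((fields[0], int(fields[1]), int(fields[2])))
    return sites


def annotation_only(ss_lines, raw_lines):
    """Sorted intervals present in SS.bed but absent from SS_raw.bed."""
    return sorted(load_sites(ss_lines) - load_sites(raw_lines))


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python diff_ss.py SS.bed SS_raw.bed
    with open(sys.argv[1]) as ss, open(sys.argv[2]) as raw:
        for chrom, start, end in annotation_only(ss, raw):
            print(chrom, start, end, sep="\t")
```

If no annotation was supplied, this should print nothing, which would be consistent with the two files being identical.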

Best, Chris

christopher-vollmers, Sep 11 '17 22:09

Thanks Chris.

I hope you enjoyed your vacation!

malloryfreeberg, Sep 19 '17 15:09