circtools
Investigate what 'unknown breakpoint' events are
Those events can be found in the data files and are plotted, but what do they mean? The BSJ is covered neither by one mate nor by two, so how exactly is it covered?
After looking into the source code of FUCHS, I am able to reconstruct what has to happen in order to produce an undefined event.
First of all, I assessed the ratio of undefined events across all samples: 2.2%
Secondly, the check
if len(mates[mate][strand]['start']) == 1 and len(mates[mate][strand]['end']) == 1:
has to fail for both the forward and the reverse strand of a specific mate. Here I assume that mate is actually a read name and not a mate "pair". In other words, on neither strand do the start and end lists each hold exactly one position. The non-matching entries are saved into another variable, fragments, which is not used afterwards:
mates, fragments = self.get_reads_from_bamfile('%s/%s' % (self.bamfolder, f), circle_coordinates)
The whole misclassification happens only if a read neither starts nor ends exactly on the circle coordinate specified via the circle BAM file name.
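To make the condition above concrete, here is a minimal sketch of how a read would end up as undefined. It is not the FUCHS implementation: the helper names has_single_span and is_undefined, the strand keys "forward" and "reverse", and the toy data are all assumptions made for illustration. Only the len(...) == 1 check itself is taken from the source.

```python
def has_single_span(strand_entry):
    """True if exactly one start and one end were recorded for this strand."""
    return len(strand_entry["start"]) == 1 and len(strand_entry["end"]) == 1


def is_undefined(mates, read_name, strands=("forward", "reverse")):
    """True if the single-start/single-end check fails on every strand.

    The strand keys are hypothetical; FUCHS may label strands differently.
    """
    return not any(
        has_single_span(mates[read_name][strand])
        for strand in strands
        if strand in mates[read_name]
    )


# Toy data, not taken from a real BAM file.
toy_mates = {
    # One clean span on the forward strand: the check passes.
    "read_a": {
        "forward": {"start": [100], "end": [150]},
        "reverse": {"start": [], "end": []},
    },
    # Neither strand has exactly one start and one end: the check fails twice.
    "read_b": {
        "forward": {"start": [100, 120], "end": [150]},
        "reverse": {"start": [], "end": [200]},
    },
}

undefined_reads = [name for name in toy_mates if is_undefined(toy_mates, name)]
print(undefined_reads)
```

Under these assumptions only read_b is reported, because neither of its strands has exactly one start and one end position.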
Right now it's hard to tell whether we can fix this (maybe it's just a problem with the mapping?). Anyway, the impact does not seem to be too significant.