
addUTR splice junction evidence stringency

Open • cooketho opened this issue 4 years ago • 1 comment

I'm a new user of BRAKER, and overall I'm very impressed with the results, so great work!

I have a brief suggestion for a future update: I think the --addUTR feature could benefit from increased stringency in what GUSHR considers to be splice evidence for a UTR call. In my data I'm seeing some cases like the one in the attached picture, where GUSHR extends the 5' UTR far to the left even though the splice junction track makes it pretty clear that the extension is supported by only one or two reads (out of thousands). The result is a spuriously long 5' UTR call. Is there some way to increase the threshold of evidence required to make such a call?

For the time being, I can probably just do some post-processing to handle these errors, but it would be convenient if GUSHR could implement this.

[Image: gushr_igv]
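A minimal sketch of the kind of post-processing mentioned above, assuming the splice junctions at one locus have already been parsed into records with a read count. The `Junction` type, the `weak_junctions` helper, and both thresholds are hypothetical names chosen for illustration; none of them are BRAKER or GUSHR options.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Junction:
    """One splice junction with its supporting read count (hypothetical record)."""
    chrom: str
    start: int
    end: int
    strand: str
    reads: int


def weak_junctions(junctions, min_reads=3, min_fraction=0.01):
    """Return junctions with too little support to justify a UTR extension.

    A junction is flagged when it has fewer than `min_reads` reads, or fewer
    than `min_fraction` of the reads of the best-supported junction on the
    same chromosome and strand. Both thresholds are illustrative assumptions.
    """
    # Best read support seen per (chromosome, strand) in the supplied set.
    best = {}
    for j in junctions:
        key = (j.chrom, j.strand)
        best[key] = max(best.get(key, 0), j.reads)

    weak = []
    for j in junctions:
        top = best[(j.chrom, j.strand)]
        if j.reads < min_reads or j.reads < min_fraction * top:
            weak.append(j)
    return weak


# Example: one junction backed by 2 reads next to junctions backed by thousands.
locus = [
    Junction("chr1", 100, 200, "+", 2),
    Junction("chr1", 300, 400, "+", 3000),
    Junction("chr1", 500, 600, "+", 2500),
]
flagged = weak_junctions(locus)
```

UTR models that depend on a flagged junction could then be trimmed or reviewed by hand. The comparison is only meaningful if the input is restricted to junctions from a single locus, since expression levels differ widely between genes.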

cooketho, Dec 06 '20 21:12

As a follow-up, I was wondering about the following point in the documentation:

"For running BRAKER without UTR parameters, it is not very important whether RNA-Seq data was generated by a stranded protocol (because spliced alignments are ’artificially stranded’ by checking the splice site pattern)."

At the same locus as above, a short distance away, I see the following problem (see attached picture). The gene on the left is transcribed leftwards, but the first exon (as called by BRAKER) is not supported by the read data: the reads there clearly have the wrong strandedness to be part of that gene (they are colored red instead of blue). The correct first exon is what BRAKER has called the second exon. Could this be improved by taking strandedness into account, rather than relying on "artificially stranded" data?

[Image: braker_igv]
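As a rough illustration of the check described above, the sketch below computes what fraction of the reads overlapping an exon agree with the strand of the gene it was assigned to. It assumes the reads have already been counted per transcript strand using whatever rule matches the library protocol; `strand_support`, `is_strand_consistent`, and the 0.5 cutoff are hypothetical and are not part of BRAKER.

```python
def strand_support(gene_strand, plus_reads, minus_reads):
    """Fraction of overlapping reads that agree with the gene's strand.

    `plus_reads` and `minus_reads` are counts of reads already assigned to
    the '+' and '-' transcript strand. Returns None when there are no reads.
    """
    if gene_strand not in ("+", "-"):
        raise ValueError("gene_strand must be '+' or '-'")
    total = plus_reads + minus_reads
    if total == 0:
        return None
    agreeing = plus_reads if gene_strand == "+" else minus_reads
    return agreeing / total


def is_strand_consistent(gene_strand, plus_reads, minus_reads, min_fraction=0.5):
    """True when at least `min_fraction` of the reads match the gene's strand.

    Exons without any overlapping reads are treated as unsupported.
    """
    support = strand_support(gene_strand, plus_reads, minus_reads)
    return support is not None and support >= min_fraction


# Example: an exon called on the '-' strand where most reads are '+' stranded.
suspicious = not is_strand_consistent("-", plus_reads=90, minus_reads=10)
```

An exon flagged this way is only a candidate for review; unstranded libraries carry no such signal, so the check applies to stranded data only.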

cooketho, Dec 08 '20 16:12