
Merging read classes across samples results in fewer novel transcripts

Open apsteinberg opened this issue 5 months ago • 14 comments

Hi Andre and the Bambu team,

I am encountering an issue related to issue #444. I merged read classes across 44 samples, as you had suggested, and found about 1000 novel transcript isoforms. Originally, I had analyzed 9 of these samples using the same read classes and found 1300 novel transcript isoforms. In both cases, I fixed NDR = 0.1.

Why am I finding fewer novel transcript isoforms with the 44 samples than with the 9 samples? From my reading of your user manual on GitHub, I thought that in multi-sample mode the samples were still analyzed independently and that the merge simply unified novel transcript IDs across samples. Is it possible that some of the putative novel transcripts ended up merged with canonical transcripts that were detected in the larger analysis? Further, how is this merge performed, and how does it differ from StringTie's merge method? I know you mentioned that using the latter would increase my false positive rate, but it is unclear to me how.

Thanks for your time and help.

Best, Asher

apsteinberg • Sep 19 '24 15:09