Pair consensus decoding for sequences with the same UMI

Open cnk113 opened this issue 3 years ago • 3 comments

Hello,

I'll soon be getting RNA-seq data with UMIs, and I'm interested in running pair consensus decoding on it. I noticed that the paper's and bonito's implementations take the reverse complement, since the two reads are assumed to come from the same molecule. In my case the reads are just PCR duplicates, so would it be possible to add an option to skip the reverse complement? I'm assuming I can comment out line #149 in pair.py, or is there more to it? Also, since I'm detecting more than two copies for some UMIs, could the current model take the consensus of multiple reads during decoding? I'm assuming the runtime will be quadratic in the number of reads, but the reads are shorter since these are transcripts. Otherwise I was planning to run pair decoding on random pairs within each UMI pool and then take the POA of the decoded pairs.

Thanks, Chang

cnk113 avatar Oct 06 '20 22:10 cnk113

Hey @cnk113

Yes, we take the reverse complement by default but as you have found it's an easy modification if you want to try the approach with multiple copies of the same strand. Only pairs are supported right now so I think your POA strategy is a good one.
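For readers following along, the pairing step of that strategy could be sketched roughly as below. This is only an illustration under stated assumptions: the read ids, UMIs, and the helper names `group_by_umi` and `random_pairs` are hypothetical and not part of bonito, and the pair decoding and POA steps themselves are not shown.

```python
import random
from collections import defaultdict


def group_by_umi(read_umis):
    """Map each UMI to the list of read ids that carry it."""
    groups = defaultdict(list)
    for read_id, umi in read_umis:
        groups[umi].append(read_id)
    return dict(groups)


def random_pairs(read_ids, rng):
    """Shuffle the reads of one UMI and pair them off.

    Returns (pairs, leftover); leftover is the unpaired read id when the
    group has an odd number of reads, otherwise None.
    """
    shuffled = list(read_ids)
    rng.shuffle(shuffled)
    pairs = [(shuffled[i], shuffled[i + 1]) for i in range(0, len(shuffled) - 1, 2)]
    leftover = shuffled[-1] if len(shuffled) % 2 else None
    return pairs, leftover


# Hypothetical (read id, UMI) assignments, e.g. from a UMI clustering step.
read_umis = [
    ("r1", "AAGT"), ("r2", "AAGT"), ("r3", "AAGT"),
    ("r4", "CCTA"), ("r5", "CCTA"),
]

groups = group_by_umi(read_umis)
rng = random.Random(0)  # fixed seed so the pairing is reproducible
pairing = {umi: random_pairs(ids, rng) for umi, ids in groups.items()}
```

Each pair would then go through pair decoding, and the decoded sequences for one UMI (plus any leftover single read, if desired) would be combined with a POA tool of choice.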

HTH,

Chris.

iiSeymour avatar Oct 13 '20 22:10 iiSeymour

Hi

We are also very interested in this and have multiple reads with the same UMI, where some are forward and some are reverse. I guess the strand info could be integrated into the pairs.csv, which would be more convenient than having two versions of pair.py. At the NCM they were talking about Q30 for three reads. Do you have an idea of when a Bonito update that accepts more than two reads for a UMI will be available?
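One way the strand idea might look, purely as a sketch: the column names `read_1`, `read_2`, and `same_strand` below are assumptions made for illustration and are not the format bonito currently reads. The reverse complement is also shown in base space only to make the logic concrete; how the flip is applied inside pair.py is not covered here.

```python
import csv
import io

# Translation table for complementing DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def orient_second_read(seq, same_strand):
    """Leave the second read as-is for same-strand pairs, otherwise flip it."""
    return seq if same_strand else reverse_complement(seq)


def read_pairs(handle):
    """Yield (read_1, read_2, same_strand) from a CSV with a header row.

    The `same_strand` column is hypothetical: "1" means both reads come
    from the same strand (e.g. PCR duplicates), "0" means opposite strands.
    """
    for row in csv.DictReader(handle):
        yield row["read_1"], row["read_2"], row["same_strand"].strip() == "1"


# In-memory stand-in for a pairs file with the extra column.
example = io.StringIO("read_1,read_2,same_strand\nr1,r2,1\nr3,r4,0\n")
pairs = list(read_pairs(example))
```

With a flag like this, a single code path could handle both forward/reverse pairs and same-strand duplicates instead of maintaining two copies of the script.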

Thanks,

Rainer

RainerWaldmann avatar Dec 04 '20 12:12 RainerWaldmann

The multi-dimensional calling work is at an earlier stage and needs improvements to usability and performance before it is ready for release; be assured that we're really keen to get this method released. We were excited by the early results and wanted to share them; we expect further accuracy gains once the work is integrated into our base callers.

Best wishes,

TimM.

tmassingham-ont avatar Dec 04 '20 13:12 tmassingham-ont