
Can't generate waterfall plot

Open • alanmejiamaza opened this issue 3 years ago • 1 comment

Hi, thank you for making this tool available, it's really nice. I am studying a transposon containing hexameric repeats that is inserted in a gene on chromosome X. As far as I know, this insertion is present only in patients with the disease and is absent from the reference genome. I mapped the CCS reads (from PacBio sequencing) to GRCh37 and GRCh38. The alignments look good and fall where they are supposed to be, but the transposon itself only shows up as soft-clipped sequence.

Then I ran your pipeline, and it seems to work fine except for the waterfall plot. The output I obtained is [ - No records in]

I am not sure what this means. Were no regions extracted? I checked the orientation (sense) of the repeats, so we can rule that out.

I tested on Python 3.8 and 3.7 with pbcore 2.1.2, in both cases under Conda. Last year I used this pipeline with my own set of C9orf72 repeat samples and it worked fine.

Could you please give me any insight? Best regards,

alanmejiamaza • Jun 05 '21 00:06

Hi,

Sorry for the slow response! When you mention soft-clipped reads, my first suspicion is that you might have some palindromic reads in your data. This can happen when there is a barcode on just one side of the SMRTbell template, but the template still circularizes and sequences because the other end can fold back on itself. The result after CCS is a read with a "ghost" adapter between two stretches of sequence that are identical but in opposite orientations.
If that is the case, then there are potentially two copies of your target region in any one CCS read. The extraction script will throw out these double-aligned reads as artifacts.

If you make a dot plot (self vs. self) of one of the full-length CCS reads (before extraction) that you expect to contain the expansion sequence, does it look palindromic?

jrharting • Jun 24 '21 16:06
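
For a quick check without plotting, the self-vs-self dot plot suggested above can be approximated in plain Python. This is only a minimal sketch and not part of the pipeline discussed in this thread: the function names and the k-mer size of 12 are illustrative assumptions, and any cutoff on the resulting fraction would need to be chosen from your own data. It indexes the k-mers of a read, then looks for positions whose reverse complement also occurs in the same read. In a palindromic ("fold-back") read, most positions have such a partner, which is what shows up as a long anti-diagonal in a dot plot.

```python
# Hypothetical helper names; nothing here comes from the pipeline in this thread.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def self_dotplot_points(seq, k=12):
    """Compare a read against itself using exact k-mer matches.

    Returns two lists of (i, j) start positions:
      forward: same-strand matches off the main diagonal (tandem repeats)
      reverse: reverse-complement matches (the palindromic signal)
    """
    index = {}
    for i in range(len(seq) - k + 1):
        index.setdefault(seq[i:i + k], []).append(i)

    forward, reverse = [], []
    for j in range(len(seq) - k + 1):
        kmer = seq[j:j + k]
        forward.extend((i, j) for i in index.get(kmer, []) if i != j)
        reverse.extend((i, j) for i in index.get(reverse_complement(kmer), []))
    return forward, reverse


def palindromic_fraction(seq, k=12):
    """Fraction of k-mer positions whose reverse complement is also in the read."""
    n = len(seq) - k + 1
    if n <= 0:
        return 0.0
    kmers = {seq[i:i + k] for i in range(n)}
    hits = sum(1 for i in range(n) if reverse_complement(seq[i:i + k]) in kmers)
    return hits / n
```

A fraction close to 1 for a full-length CCS read would be consistent with the palindromic artifact described above, while an ordinary read should score near 0. Note that a highly repetitive expansion can produce many forward matches, so the `reverse` points are the ones to look at; they can also be passed to any scatter-plot tool to draw the dot plot itself.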