straglr
Genotyping amplicons?
Hi,
Thanks for a great tool. I am experimenting with genotyping amplicon data from Nanopore sequencing. Straglr calls some STRs but not others, and I wonder whether I need to do something differently. I have tried changing the motifs, the positions of the sequence, etc., but have not yet been successful. Do you have any suggestions, please?
I've attached an example file aligned to GRCh38, and I'm running Straglr (latest version) as follows:
straglr.py barcode32.new.sorted.bam genome.fa batch1 --genotype_in_size --min_support 1 --loci strtest.bed --max_str_len 1000 --max_num_clusters 2 --nprocs 8
And I get:
#chrom start end repeat_unit allele1:size allele1:copy_number allele1:support allele2:size allele2:copy_number allele2:support
chr1 204156332 204156364 ACAG 31.0 7.8 8 - - -
chr11 2171086 2171116 TGAA 32.1 8.0 13 - - -
chr5 150076322 150076397 CTAT 68.4 17.1 139 - - -
chrX 134481492 134481561 TCTA 72.5 18.1 2 - - -
chrX 67545317 67545419 GCA 94.4 31.5 7 - - -
str.tar.gz
Thanks for trying Straglr. This is my first time seeing amplicon data, and the main issue is that each read can cover more than one locus. This violates an assumption I make when checking the alignment CIGAR string: that each read (or at least the majority of it) covers only one locus. A lot of noise would creep in if this screen on alignments were not applied. A separate targeted amplicon mode will need to be implemented to handle this data type.
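To make the kind of CIGAR-based screen described above more concrete, here is a minimal sketch in Python. It is not Straglr's actual code: the function names `aligned_fraction` and `covers_single_locus` and the 0.5 threshold are assumptions chosen for illustration. The idea is that a read spanning several loci tends to yield alignments in which most of the read is soft- or hard-clipped, so the fraction of read bases that a single alignment accounts for drops below one half.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM spec).
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def aligned_fraction(cigar: str) -> float:
    """Fraction of the original read that this alignment accounts for.

    M, I, = and X consume read bases that belong to the alignment.
    S and H are clipped read bases that fall outside of it.
    D, N and P do not consume read bases and are ignored here.
    """
    aligned = 0
    total = 0
    for count, op in _CIGAR_OP.findall(cigar):
        n = int(count)
        if op in "MI=X":
            aligned += n
            total += n
        elif op in "SH":
            total += n
    if total == 0:
        raise ValueError(f"no read-consuming operations in CIGAR: {cigar!r}")
    return aligned / total


def covers_single_locus(cigar: str, min_fraction: float = 0.5) -> bool:
    """Hypothetical screen: keep the alignment only if it accounts for
    the majority of the read (threshold is an assumption)."""
    return aligned_fraction(cigar) > min_fraction


if __name__ == "__main__":
    # A read aligned mostly end to end, then a read where one alignment
    # covers only a small part of a long, multi-locus read.
    for cigar in ("50S300M20S", "400S150M10I140M300H"):
        print(cigar, round(aligned_fraction(cigar), 2), covers_single_locus(cigar))
```

Under these assumptions, amplicon reads that each span several loci would mostly fail such a screen, which would be consistent with some STRs being called and others not.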