
Bug when dealing with insertions and deletions in assemblies

Open pirale opened this issue 2 years ago • 0 comments

Hi there,

Hope you're well.

I noticed a bug in ariba that occurs when the length of an insertion or deletion is a multiple of three but the indel does not start at the first position of a codon. The bug seems to originate from the way insertions and deletions are handled in the source file assembly_variants.py. Such an indel keeps the reading frame intact, but because it does not start at the first position of a codon, it affects both the reference codon it starts in and the following reference codon.
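To make the two-codon effect concrete, here is a minimal Python sketch. It is not taken from ariba or assembly_variants.py: the toy sequence and the helpers `translate`, `codons_touched` and `apply_deletion` are hypothetical and only illustrate the situation described above. A 3 bp deletion that starts at the third position of a codon spans two reference codons, yet the resulting protein only loses one amino acid. Translating the deleted bases on their own, as if they were a codon, gives a stop, which may be how the truncation call arises (an assumption, not something I have confirmed in the code).

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(seq):
    """Translate a nucleotide sequence in frame 0, ignoring trailing bases."""
    seq = seq.upper()
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def codons_touched(start, length):
    """0-based reference codon indices overlapped by a deletion.

    `start` is the 0-based nucleotide position of the first deleted base.
    """
    return list(range(start // 3, (start + length - 1) // 3 + 1))


def apply_deletion(seq, start, length):
    """Return seq with `length` bases removed, beginning at `start`."""
    return seq[:start] + seq[start + length:]


# Toy reference (hypothetical): ATG GAT AAC AAA TGA -> M D N K *
ref = "ATGGATAACAAATGA"
start, length = 5, 3              # third position of the second codon
deleted = ref[start:start + length]
mut = apply_deletion(ref, start, length)

print(deleted)                        # the removed bases
print(codons_touched(start, length))  # reference codons overlapped
print(translate(ref), translate(mut))
# Reading the deleted bases as a codon is misleading here:
print(translate(deleted))
```

In this toy case the deleted bases are TAA and they overlap codons 1 and 2, but comparing the two translations shows a single amino-acid deletion with no premature stop. A deletion of the same length starting at position 3 (the first position of the second codon) would overlap only one codon.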

I noticed the bug with a single amino-acid deletion that ariba incorrectly marked as a truncation. You can see it for yourself by downloading the read set SRR850776 from NCBI SRA (corresponding assembly NZ_KI973283.1) and searching for the gene in the FASTA record pasted here:

>Enterobacter-NL68__wzy
atgaatgataagagtttaaaaaataaccacttcaaaataagtgcgcatttagcgtttata
tatttcctgcttacttcttctttattattgatttttttaacggaatcggcaagtgctaca
ctttatggaactgtagaggatatttttgcggttttttgtgccattatattgtttggtgag
atgatttacttctatatgcatagagtgaagtttatctcgttgcaattaatgtttgctttt
gtattttctttaattataggtattccttctttttatttgtatttctttaaaaaagcttct
gatggctttgaattgacttgtatatggggtatgttaataaatatcatactctatcttaca
gctatcaaaaatgttcataggcaacaagcaaaaagtataaataatctatttaagattata
ttttccattgttggtgtttgtcagttaattaaaattgttttttatctgaaatttatttta
tcatcaggcttagggcatttagctatttatactgatagtgaagaattactttcaagtatt
ccttttgctgtccgtgctattagtggcttttcttctataatggctttggcagtcttttat
tataaatcatcgaaaaaatataagatgctagcatttattttgctcgcatctgaccttgtt
attgggataagaaataaattcttttttgcttttatatgcattattattctctcgttatat
tcaaatagaaagaaaataatagcaatattcgctagaatatccaaagtacactatttatta
attggcttcgtcggtttttcaatgatttcatatcttcgtgaaggatatgaaatcaatttt
attaattatcttggcgttgtacttgactctctgtcgtctacgcttgcaggtttacaagat
ttatactatttgcccgatgaaaatggttgggcgttactaaacccccttacgatattatcg
caagtgttgccgctcagtggttttggcttcataagcgatgcacaaattgctcatgaatat
tcaacaattgtgcttggcagcgtgtctaatgggatagcgttgtcatcttctggtcttctt
gaagcaagtataataagtttgcatttcaatttatttatttatcttgcctatctgttaatt
atgatctcgataattcaaaaaggtttgaatagtaattatgttatttttaacttttttgcc
ctggctatgatgactggtttcttctattctgttcgtggagaattaattttgccatttgct
tatgttttaaaatcgtttccaataataataattgcaaatctattgactcaacaaaaaagt
agaaattga

The mutation TAA1289. starts at the 3rd position of a codon and results in the deletion of a single amino acid (in this specific case, the amino-acid sequence further downstream is not affected). Ariba instead translates the deleted TAA as a stop codon and erroneously reports a truncation.

Thank you for providing the software. Happy to answer any questions that you may have.

pirale, May 25 '22 00:05