ariba
Bug when dealing with insertions and deletions in assemblies
Hi there,
Hope you're well.
I noticed a bug in ariba that occurs when the length of an insertion or deletion is a multiple of three but the change does not start at the first position of a codon. The bug seems to originate from the way insertions and deletions are handled in the source file assembly_variants.py. An indel like this keeps the reading frame, but it affects both the codon it starts in and the following codon in the reference.
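To make the codon arithmetic concrete, here is a minimal sketch of which reference codons a deletion overlaps. It is not taken from assembly_variants.py: the function name `affected_codons` is hypothetical, and it assumes 0-based nucleotide coordinates counted from the start of the coding sequence. It only covers deletions, since an insertion does not remove reference bases.

```python
def affected_codons(start, length):
    """Return the 0-based indices of the reference codons a deletion overlaps.

    start  -- 0-based position of the first deleted base (assumed convention)
    length -- number of deleted reference bases
    """
    if start < 0 or length < 1:
        raise ValueError("start must be >= 0 and length must be >= 1")
    first = start // 3                 # codon containing the first deleted base
    last = (start + length - 1) // 3   # codon containing the last deleted base
    return list(range(first, last + 1))


# A 3-base deletion starting at the first position of a codon touches one codon.
print(affected_codons(1287, 3))  # [429]

# The same deletion shifted to the third codon position touches two codons.
print(affected_codons(1289, 3))  # [429, 430]
```

Under these assumptions, an in-frame deletion is only a clean single-codon change when `start % 3 == 0`. In every other case the two partially deleted codons have to be merged before translating.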
I noticed the bug with a single amino-acid deletion that ariba incorrectly marked as a truncation. You can see it for yourself by downloading the read set SRR850776 from NCBI SRA (corresponding assembly NZ_KI973283.1) and searching for the gene in the FASTA below.
>Enterobacter-NL68__wzy
atgaatgataagagtttaaaaaataaccacttcaaaataagtgcgcatttagcgtttata
tatttcctgcttacttcttctttattattgatttttttaacggaatcggcaagtgctaca
ctttatggaactgtagaggatatttttgcggttttttgtgccattatattgtttggtgag
atgatttacttctatatgcatagagtgaagtttatctcgttgcaattaatgtttgctttt
gtattttctttaattataggtattccttctttttatttgtatttctttaaaaaagcttct
gatggctttgaattgacttgtatatggggtatgttaataaatatcatactctatcttaca
gctatcaaaaatgttcataggcaacaagcaaaaagtataaataatctatttaagattata
ttttccattgttggtgtttgtcagttaattaaaattgttttttatctgaaatttatttta
tcatcaggcttagggcatttagctatttatactgatagtgaagaattactttcaagtatt
ccttttgctgtccgtgctattagtggcttttcttctataatggctttggcagtcttttat
tataaatcatcgaaaaaatataagatgctagcatttattttgctcgcatctgaccttgtt
attgggataagaaataaattcttttttgcttttatatgcattattattctctcgttatat
tcaaatagaaagaaaataatagcaatattcgctagaatatccaaagtacactatttatta
attggcttcgtcggtttttcaatgatttcatatcttcgtgaaggatatgaaatcaatttt
attaattatcttggcgttgtacttgactctctgtcgtctacgcttgcaggtttacaagat
ttatactatttgcccgatgaaaatggttgggcgttactaaacccccttacgatattatcg
caagtgttgccgctcagtggttttggcttcataagcgatgcacaaattgctcatgaatat
tcaacaattgtgcttggcagcgtgtctaatgggatagcgttgtcatcttctggtcttctt
gaagcaagtataataagtttgcatttcaatttatttatttatcttgcctatctgttaatt
atgatctcgataattcaaaaaggtttgaatagtaattatgttatttttaacttttttgcc
ctggctatgatgactggtttcttctattctgttcgtggagaattaattttgccatttgct
tatgttttaaaatcgtttccaataataataattgcaaatctattgactcaacaaaaaagt
agaaattga
The mutation TAA1289. starts at the third position of a codon and results in the deletion of a single amino acid (in this specific case it does not change the amino-acid sequence any further). Ariba instead translates the deleted TAA as a stop codon, which is why it reports a truncation.
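The same effect can be reproduced on a toy sequence. The sketch below is only an illustration, not ariba's code. The 15-base reference, the helper names `translate` and `delete_bases`, and the 0-based coordinates are all made up for the example. The codon table is the standard genetic code. The deleted bases read "TAA" but straddle two codons, so the correct result is the loss of one amino acid, while translating the deleted bases in isolation gives a stop.

```python
# Standard genetic code, built from the usual TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq):
    """Translate a nucleotide sequence codon by codon, ignoring trailing bases."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


def delete_bases(seq, start, length):
    """Remove `length` bases beginning at 0-based position `start`."""
    return seq[:start] + seq[start + length:]


# Toy reference (hypothetical): ATG GCT AAG AAA TGA
reference = "ATGGCTAAGAAATGA"
start, length = 5, 3  # third position of codon GCT, spans into codon AAG

deleted = reference[start:start + length]
mutated = delete_bases(reference, start, length)

print(deleted)               # TAA
print(translate(reference))  # MAKK*
print(translate(mutated))    # MAK*  (one lysine lost, no premature stop)
print(translate(deleted))    # *     (misleading when read as a codon)
```

Translating the full mutated sequence and comparing it with the reference protein gives the expected single amino-acid deletion, whereas treating the deleted bases as a codon of their own suggests a truncation that is not there.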
Thank you for providing the software. Happy to answer any questions that you may have.