Segmentation fault when reference genome contains '.'

Open bricoletc opened this issue 2 years ago • 1 comment

Describe the bug

Thanks for the great tool! If the reference FASTA contains one or more '.' characters in place of a nucleotide, octopus fails with a segfault. I can run octopus successfully on the same FASTA + BAM once each '.' is replaced by a valid base, e.g. 'a'.

If '.' characters shouldn't be supported, it would be useful to report an error when one is encountered, rather than just segfaulting. Alternatively, octopus could refuse to call or report any variants at positions with '.' in the reference.

I can provide the FASTA and BAM I used to reproduce this if you'd like.
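Until such a check exists in octopus itself, the reference can be scanned before a run. The following is a minimal sketch, not part of octopus: the function name `find_invalid_bases` is hypothetical, and it assumes the IUPAC nucleotide codes are the accepted alphabet, which this thread does not confirm.

```python
from typing import Iterable, Iterator, Optional, Tuple

# Assumption: the accepted alphabet is the IUPAC nucleotide codes
# (case-insensitive). The thread does not say which characters
# octopus itself accepts.
VALID_BASES = frozenset("ACGTURYSWKMBDHVN")


def find_invalid_bases(
    lines: Iterable[str],
) -> Iterator[Tuple[Optional[str], int, str]]:
    """Yield (contig, 1-based position, character) for each invalid character.

    `lines` is any iterable of FASTA lines, e.g. an open file handle.
    """
    contig = None
    position = 0
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            # Header line: the contig name is the first whitespace-delimited token.
            fields = line[1:].split()
            contig = fields[0] if fields else ""
            position = 0
            continue
        for char in line:
            position += 1
            if char.upper() not in VALID_BASES:
                yield contig, position, char
```

Passing an open file handle, e.g. `list(find_invalid_bases(open("induced_ref.fa")))`, returns the offending positions, so a pipeline can stop with a clear message before octopus is invoked.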

$ octopus --version
octopus version 0.7.4
Target: x86_64 Linux 5.4.0-72-generic
SIMD extension: AVX2
Compiler: GNU 9.3.0
Boost: 1_74

Command line to run octopus:

$ octopus -I induced_ref_mapped.bam -R induced_ref.fa --organism-ploidy 1 --threads 4

bricoletc, Jun 20 '22 15:06

Thanks for the bug report. Though there is no official FASTA/Q specification, I don't believe '.' would be considered a valid base by any well-established tool. However, I agree Octopus should validate its input better, and I'll look into reporting a more helpful error message.

dancooke, Jul 25 '22 10:07