Options to favor detection of several small deletions instead of single large ones
unimap does a great job aligning complex regions (e.g. satellites) even using reads. We are using it to detect some variants at telomeric regions, using PacBio HiFi sequencing data.
We have evidence from one of these regions that there should be 2 separate deletions, corresponding to two separate groups of monomers from a satellite, each of them of different size. However, unimap finds one large deletion instead, close together to several variants that suggest mismapping. I assume this behavior comes from the default settings, that may favour large deletions instead of many small deletions. The program was run with unimap -a -x asm5 -x hifi --cs. I attach snapshot below for this region.

I have been trying to tune settings to penalize nucleotide mismatchings and favour more than 1 deletion if convenient. However, I find some problems including:
-
-B 500 -O 1 -E 2: increased mismatch penalty and lowered cost for gaps. This results in a "Segmentation fault" error. -
-B 500: Does not seem to solve the problem, the alignment remains the same. -
-O 1 -E 2: This makes some improvements, and for some reads we observe that mismatches disappear in favour of deletions (snapshot below), but not for the majority of reads. Cannot make progress from here however, since using "-B > 6" will cause the segfault, lower values of "-B" won't change anything, and -O and -E are near their minimum values (cannot take values of zero).

Wonder if there are other options that are worth exploring here. Any hints on how to deal with this situation? Thank you
Minimap2/unimap won't work with a very large -B because internally every score must be within [-128,127]. I should have added a test for that. Nonetheless, the failure of -O1 -E2 -B7 is caused by something else that I have overlooked. Note that under this scoring, the optimal alignment will have no mismatches. You can turn any mismatch to two gaps for a better score:
ATCATCA ATC-ATCA
ATCtTCA ATCt-TCA
This is probably violating the SW formulation unimap/minimap2 is using. You need to make sure ({-O}+{-E})*2>{-B} at least.
In your case, you can try -A1 -B4 -O6 -E2, or -A1 -B4 -O4 -E1.
Unfortunately, these options does not solve the issue, and still the majority of reads show the large deletion + mismatches.
From what I understand, it may be possible to try something like -B100 -O1 -E50, only that this would need to be in a future release to allow B > 7? We are aware that this might return unrealistic alignments for most of the genome, but for this particular region allowing multiple deletions could be more appropriate.
Thank you for the hints and feedback,
The requirement is ({-O}+{-E})*2>{-B}; otherwise the alignment largely doesn't make sense. B<=7 is not a requirement. asm5 uses -A1 -B19 -O39,81 -E2,1. If you have the sequences, I can give a try.
Thank you for the clarifications, and for offering giving it a look. I have been trying different combinations of parameters following the above rule, but with no success.
Shall I send you the files to an email address? apparently I cannot attach them here.
Hello,
@elcortegano I would like to ask, how did you obtain such a nice view of the alignment? I guess using some software (?) and fetching it bam/sam format coming out of the unimap? Thank you for your response.
Hi @ilikvlad , this is with IGV (https://software.broadinstitute.org/software/igv/download).