affinegap
Weird Behavior Caused by abbreviation_scale
Hi,
I don't know whether this is a bug or intentional, but here is an example of some odd behavior:
>>> affinegap.affineGapDistance("TED A", "TD A",
...     matchWeight = 0, spaceWeight = 2, gapWeight = 10,
...     abbreviation_scale = 0.125)
12.0 # Correct. Open a gap and insert a space for "E".
>>> affinegap.affineGapDistance("TESD A", "TD A",
...     matchWeight = 0, spaceWeight = 2, gapWeight = 10,
...     abbreviation_scale = 0.125)
14.0 # Correct. Continue the gap and insert a space for "S".
>>> affinegap.affineGapDistance("TESTD A", "TD A",
...     matchWeight = 0, spaceWeight = 2, gapWeight = 10,
...     abbreviation_scale = 0.125)
16.0 # Correct. Continue the gap and insert a space for "T".
>>> affinegap.affineGapDistance("TESTED A", "TD A",
...     matchWeight = 0, spaceWeight = 2, gapWeight = 10,
...     abbreviation_scale = 0.125)
16.25 # Weird.
# This is because the second "E" is at position 5 (1-indexed), which is
# greater than the length of the second string (4 characters).
# So the cost of the additional space for "E" is scaled by abbreviation_scale.
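To make the arithmetic explicit, here is a minimal sketch that checks the four reported distances against a simple cost model. The model is an assumption inferred from the numbers above, not taken from the library source: one `gapWeight` to open the gap plus one `spaceWeight` per inserted character, with matches costing 0. The helper name `unscaled_gap_cost` is hypothetical.

```python
# Weights used in the calls above.
gap_weight = 10
space_weight = 2
abbreviation_scale = 0.125

# Distances reported above, keyed by the number of characters
# inserted between "T" and "D" in the first string.
observed = {1: 12.0, 2: 14.0, 3: 16.0, 4: 16.25}


def unscaled_gap_cost(n_chars):
    """Assumed model: open one gap, then pay space_weight per character."""
    return gap_weight + space_weight * n_chars


for n_chars, distance in observed.items():
    expected = unscaled_gap_cost(n_chars)
    print(n_chars, distance, expected, distance - expected)

# Hypothesis for the last case: the first three characters are charged
# normally and only the fourth space is multiplied by abbreviation_scale.
scaled_guess = unscaled_gap_cost(3) + space_weight * abbreviation_scale
print(scaled_guess)
```

Under this assumed model the first three results match exactly, while the fourth falls short of the unscaled value by 1.75 and equals the "scaled fourth space" guess.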
I believe this is triggered, unintentionally, by https://github.com/dedupeio/affinegap/blob/853f3d3d02d9a9adc1ec92dd9448949f51748e87/affinegap/affinegap.pyx#L79.
It would be great if you could take a look. Thanks.
I'm not super familiar with the algorithm, sorry. Can you explain a little more? What is the expected match pattern?
1:
TESTED A
TxxxxD A
or 2:
TESTED A
yyyTzD A
and can you explain how you think it's calculating that 16.25?
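One way to compare the two candidate patterns is to cost each of them by hand. This is only a sketch under an assumed model (each separate gap pays `gapWeight` once plus `spaceWeight` per skipped character, matches cost 0, and no abbreviation scaling is applied). The function `alignment_cost` is a hypothetical helper for illustration, not part of affinegap.

```python
def alignment_cost(gap_lengths, gap_weight=10, space_weight=2):
    """Sum the assumed affine cost of each gap in an alignment.

    gap_lengths lists the length of every separate gap; each gap pays
    gap_weight once to open plus space_weight per character in it.
    """
    return sum(gap_weight + space_weight * n for n in gap_lengths)


# Pattern 1: TxxxxD A -> one gap covering "ESTE" (4 characters).
pattern_1 = alignment_cost([4])

# Pattern 2: yyyTzD A -> a gap covering "TES" (3 characters),
# then a second gap covering "E" (1 character).
pattern_2 = alignment_cost([3, 1])

print(pattern_1, pattern_2)
```

Under these assumptions pattern 1 is the cheaper alignment and is the one that continues the 12, 14, 16 sequence from the shorter examples, so it would be the expected pattern; the reported 16.25 is lower than either hand-computed cost.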