stringdist
Question about stringdist()
Hi Mark,
I hope you're doing well. My name is Ruohan, and I'm a second-year PhD student at UCL. I'm currently using the stringdist package to measure linguistic distance, and I’ve found it incredibly useful. However, I’ve encountered a few issues that I’m hoping you can help clarify.
I’ve been working through the R manual for stringdist (https://cran.r-project.org/web/packages/stringdist/stringdist.pdf), which discusses how different edits (deletion, insertion, substitution, transposition) can be weighted (on page 20). For example, in the case stringdist('ab', 'ba', weight=c(1,1,1,0.5)), the output is "0.5," suggesting that a transposition was performed.
Building on this example, I tried the following cases:
1. stringdist('ab', 'a', weight=c(0.5, 1, 1, 1))
   - I expected an output of "0.5" due to a weighted deletion, but the output was "1."
2. stringdist('ab', 'a', weight=c(1, 0.5, 1, 1))
   - This returned "0.5," which seems to indicate an insertion rather than a deletion.
3. stringdist('a', 'ab', weight=c(0.5, 1, 1, 1))
   - Here, I received the "0.5" output, indicating a weighted deletion.
Given these results, I’m wondering if I might have misunderstood the string distance calculation. Specifically, I assumed that stringdist('ab', 'a') would attempt to match 'ab' to 'a' by deleting a character, while stringdist('a', 'ab') would result in an insertion. Could you clarify how the algorithm determines whether to apply an insertion or deletion in these cases?
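To make that assumption concrete, here is a minimal sketch of a weighted optimal-string-alignment distance in which the edits turn the first string into the second. It is written in Python so that it is clearly independent of the package; it is not the stringdist implementation. The function name `weighted_osa` and the parameter names are hypothetical, and the edit direction is exactly the assumption in question.

```python
def weighted_osa(a, b, w_del=1.0, w_ins=1.0, w_sub=1.0, w_tra=1.0):
    """Cost of turning string a into string b (assumed direction: a -> b)."""
    n, m = len(a), len(b)
    # d[i][j] = cheapest way to turn a[:i] into b[:j]
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * w_del          # delete every character of a[:i]
    for j in range(1, m + 1):
        d[0][j] = j * w_ins          # insert every character of b[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0.0 if a[i - 1] == b[j - 1] else w_sub
            best = min(
                d[i - 1][j] + w_del,      # delete a[i-1]
                d[i][j - 1] + w_ins,      # insert b[j-1]
                d[i - 1][j - 1] + sub,    # match or substitute
            )
            # transposition of two adjacent characters
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                best = min(best, d[i - 2][j - 2] + w_tra)
            d[i][j] = best
    return d[n][m]


print(weighted_osa("ab", "a", w_del=0.5))   # deletion is the cheap edit here
print(weighted_osa("a", "ab", w_del=0.5))   # only an insertion can help here
print(weighted_osa("ab", "ba", w_tra=0.5))  # a single transposition
```

Under this convention the sketch gives 0.5 for ('ab', 'a') and 1 for ('a', 'ab') when only the deletion weight is lowered, which is the opposite of the outputs listed above. Those outputs match the sketch only when the two arguments are swapped, which would be consistent with the weights describing edits that turn the second string into the first. That is an inference from the three reported numbers, not something the manual confirms.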
Additionally, when I tried stringdist('abc', 'ca', method = "dl", weight = c(1, 0.1, 0.01, 0.001)), I received an output of "0.002," which suggests that two transpositions were performed to match "abc" to "ca." Shouldn’t this also involve a deletion or insertion?
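The expectation that a change in length must cost at least one deletion or insertion can be checked against a small reference implementation. The sketch below uses the textbook unrestricted Damerau-Levenshtein recurrence with a separate weight per edit type. It is written in Python and is not the stringdist code; the name `weighted_dl`, the parameter names, and the convention that edits turn the first string into the second are all assumptions made for illustration.

```python
def weighted_dl(a, b, w_del=1.0, w_ins=1.0, w_sub=1.0, w_tra=1.0):
    """Weighted Damerau-Levenshtein cost of turning a into b (assumed direction)."""
    n, m = len(a), len(b)
    inf = float("inf")
    # Table shifted by one: d[i + 1][j + 1] is the cost of a[:i] -> b[:j].
    # Row 0 and column 0 are an infinite border used by the transposition term.
    d = [[inf] * (m + 2) for _ in range(n + 2)]
    for i in range(n + 1):
        d[i + 1][1] = i * w_del
    for j in range(m + 1):
        d[1][j + 1] = j * w_ins
    last_row = {}  # for each character, the last (1-based) position seen in a
    for i in range(1, n + 1):
        last_col = 0  # last (1-based) position in b that matched a[i-1]
        for j in range(1, m + 1):
            i1 = last_row.get(b[j - 1], 0)
            j1 = last_col
            if a[i - 1] == b[j - 1]:
                sub = 0.0
                last_col = j
            else:
                sub = w_sub
            d[i + 1][j + 1] = min(
                d[i][j] + sub,          # match or substitute
                d[i][j + 1] + w_del,    # delete a[i-1]
                d[i + 1][j] + w_ins,    # insert b[j-1]
                # transposition, paying separately for the characters
                # deleted and inserted between the two swapped positions
                d[i1][j1] + (i - i1 - 1) * w_del + w_tra + (j - j1 - 1) * w_ins,
            )
        last_row[a[i - 1]] = i
    return d[n + 1][m + 1]


weights = dict(w_del=1.0, w_ins=0.1, w_sub=0.01, w_tra=0.001)
print(weighted_dl("abc", "ca", **weights))
print(weighted_dl("ca", "abc", **weights))
```

With these weights the sketch gives about 1.001 for "abc" to "ca" (one deletion plus one transposition) and about 0.101 for "ca" to "abc" (one transposition plus one insertion). Neither matches the reported 0.002, which equals 2 × 0.001, so in this sketch a length change is never paid for at the transposition weight alone. The textbook recurrence is not guaranteed to find the minimum for every combination of weights, so these two values were also checked by hand for this example only.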
I look forward to your insights. Thank you very much for your time.
Best wishes, Ruohan
Thank you for reporting this. Unfortunately I am currently not in the position to look into this.
Hi Mark,
Thank you for letting me know. Have a nice day!
Best wishes, Ruohan