editdistance
Accurate benchmarks?
Hey, I ran a quick benchmark of my own, using:
- macOS 10.14.5 (Mojave)
- Python 3.7.3
- python-Levenshtein 0.12.0 (PyPI)
- editdistance 0.5.3 (PyPI)
In my tests python-Levenshtein is about 10x faster. Perhaps it's the macOS binaries? Or maybe your tests are outdated?
```python
import editdistance
from Levenshtein._levenshtein import *
from timeit import timeit

# a few string pairs of different lengths and similarity
s00 = "akjsdjkahsdhjkashd"
s01 = "akjsdjkahsdhj"
s10 = 'xyzz'
s11 = 'xab'
s20 = 'aaaaaaaaaaaaaaaaaaaaaaa'
s21 = 'i'

def a1():
    editdistance.eval(s00, s01)
    editdistance.eval(s10, s11)
    editdistance.eval(s20, s21)

def a2():
    distance(s00, s01)
    distance(s10, s11)
    distance(s20, s21)

print("editdistance")
print(timeit(a1, number=100000))
print("\npython-Levenshtein")
print(timeit(a2, number=100000))
```
Prints:

```
editdistance
0.330241583

python-Levenshtein
0.03681695899999998
```
I just compared those two in a real-life application and editdistance is about 30% faster.
At least in my benchmarks this depends largely on the length of the input strings. Here is a comparison of different libraries across different string lengths. Both edlib and editdistance appear to have a lot of overhead for short strings.
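A length sweep like that can be reproduced with a small self-contained script. The sketch below times a plain Wagner-Fischer dynamic-programming implementation (used as a stand-in so it runs without third-party packages); to benchmark the actual libraries, swap `dp_distance` for `editdistance.eval`, `Levenshtein.distance`, etc.:

```python
import random
import string
from timeit import timeit

def dp_distance(a, b):
    # textbook O(len(a) * len(b)) Wagner-Fischer dynamic programming
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

random.seed(0)
for n in (4, 16, 64, 128):
    a = "".join(random.choices(string.ascii_lowercase, k=n))
    b = "".join(random.choices(string.ascii_lowercase, k=n))
    t = timeit(lambda: dp_distance(a, b), number=50)
    print(f"len={n:4d}  {t:.4f}s")
```

For the pure-Python stand-in the time grows roughly quadratically with `n`; the compiled libraries start from a much lower per-call constant, which is exactly where the short-string overhead differences show up.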
Only python-Levenshtein uses a quadratic-time implementation, while all the others use the Myers/Hyyrö bit-parallel implementation.
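For reference, the core of Myers' bit-parallel algorithm fits in a few lines. This is an illustrative pure-Python sketch for patterns that fit in a single bit-vector word; the real libraries implement it in C/C++ and handle longer patterns by blocking:

```python
def myers_distance(a, b):
    """Levenshtein distance via Myers' bit-parallel algorithm.

    Illustrative sketch only: Python's big integers stand in for
    machine words, so the word-size limit is handled with an explicit
    mask rather than by blocking as the compiled libraries do.
    """
    m = len(a)
    if m == 0:
        return len(b)
    # peq[c]: bitmask of positions where character c occurs in a
    peq = {}
    for i, c in enumerate(a):
        peq[c] = peq.get(c, 0) | (1 << i)
    mask = (1 << m) - 1
    high = 1 << (m - 1)
    vp, vn, score = mask, 0, m          # vertical +1 / -1 delta vectors
    for c in b:
        eq = peq.get(c, 0)
        xv = eq | vn
        xh = (((eq & vp) + vp) ^ vp) | eq
        hp = (vn | ~(xh | vp)) & mask   # horizontal +1 deltas
        hn = vp & xh                    # horizontal -1 deltas
        if hp & high:                   # track score in the last row
            score += 1
        if hn & high:
            score -= 1
        hp = ((hp << 1) | 1) & mask
        hn = (hn << 1) & mask
        vp = (hn | ~(xv | hp)) & mask
        vn = hp & xv
    return score

print(myers_distance("kitten", "sitting"))  # 3
```

Each text character costs a constant number of word operations regardless of the pattern length (up to the word size), which is why the bit-parallel libraries pull ahead on longer strings.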
@maxbachmann Great chart. It shows the choice of implementation really depends on the application.
- One application would be finding the closest single words, e.g., in spelling correction.
- The other would be measuring edit distance for long string sequences, e.g., comparing the layout of documents as in "Comparison and Classification of Documents Based on Layout Similarity". I regret I didn't do an evaluation like yours back when I was implementing document clustering...
For the latter, rapidfuzz seems like a good choice.