Helmut Wollmersdorfer

62 comments by Helmut Wollmersdorfer

@Martinsos In language processing we usually have short strings (the average word is 6-8 characters long) and larger alphabets. I implemented the Hyyrö optimisation for LLCS, LCS, Levenshtein distance & alignment,...
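
For readers unfamiliar with the bit-parallel approach, here is a minimal sketch of an LLCS (length of the longest common subsequence) routine. It uses the well-known bit-vector recurrence V' = (V + (V & M)) | (V & ~M), where M is the match mask of the current character. It assumes 8-bit characters and a first string of at most 64 characters. This is not the author's code, and the name `llcs_bv64` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Sketch: LLCS of a (length m) and b (length n), bit-parallel over a.
   Assumes 8-bit characters and m <= 64 (hypothetical helper). */
static size_t llcs_bv64(const char *a, size_t m, const char *b, size_t n)
{
    if (m > 64) return (size_t)-1;    /* out of scope for this sketch */

    uint64_t peq[256] = {0};          /* bit i of peq[c] is set iff a[i] == c */
    for (size_t i = 0; i < m; i++)
        peq[(unsigned char)a[i]] |= 1ULL << i;

    uint64_t v = ~0ULL;               /* all ones: nothing matched yet */
    for (size_t j = 0; j < n; j++) {
        uint64_t u = v & peq[(unsigned char)b[j]];
        v = (v + u) | (v - u);        /* v - u equals v & ~M because u is a subset of v */
    }

    /* Each zero bit of v within the low m bits counts one LCS character. */
    uint64_t mask = (m == 64) ? ~0ULL : ((1ULL << m) - 1);
    uint64_t zeros = ~v & mask;
    size_t len = 0;
    while (zeros) { zeros &= zeros - 1; len++; }
    return len;
}
```

The whole column of the DP matrix is updated with one addition, a few bitwise operations and one subtraction per character of `b`. That is where the speed-up over a cell-by-cell DP comes from on short strings.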

I have now benchmarked edlib in a simple loop with two short strings. It needs 11 seconds for 1 million iterations, a rate of about 0.1 M/sec. That's 700 times slower than Hyyrö in...

@ekg Yes, it's about edit distance only. That's exactly what the original poster of this issue meant and compared. Calculating the alignment is ~50% slower in my Perl implementations, because...

@Martinsos Sure, I could send a pull request with improvements for edlib, but my time is also limited. I can share my solutions which would be the same for the basic,...

@Martinsos My code (ASCII, m < 64, distance only) is on GitHub: https://github.com/wollmers/Text-Levenshtein-BVXS/blob/master/levbv.c
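
To make the scope "ASCII, m < 64, distance only" concrete, here is a minimal sketch of such a routine. It uses the Myers/Hyyrö bit-vector recurrence for Levenshtein distance, with a 256-entry match table indexed by byte value, and it accepts a first string of up to 64 bytes. It is not the code in levbv.c, and the name `lev_bv64` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Sketch: Levenshtein distance of a (length m) and b (length n),
   bit-parallel over a. Assumes 8-bit characters and m <= 64. */
static size_t lev_bv64(const char *a, size_t m, const char *b, size_t n)
{
    if (m == 0) return n;
    if (m > 64) return (size_t)-1;    /* out of scope for this sketch */

    uint64_t peq[256] = {0};          /* 256-entry LUT: match bits per byte */
    for (size_t i = 0; i < m; i++)
        peq[(unsigned char)a[i]] |= 1ULL << i;

    uint64_t vp = ~0ULL, vn = 0;      /* column 0: every vertical delta is +1 */
    uint64_t last = 1ULL << (m - 1);  /* bit of the last row */
    size_t dist = m;                  /* D[m][0] */

    for (size_t j = 0; j < n; j++) {
        uint64_t x  = peq[(unsigned char)b[j]] | vn;
        uint64_t d0 = (((x & vp) + vp) ^ vp) | x;   /* diagonal deltas of 0 */
        uint64_t hp = vn | ~(d0 | vp);              /* horizontal deltas of +1 */
        uint64_t hn = vp & d0;                      /* horizontal deltas of -1 */
        if (hp & last) dist++;
        if (hn & last) dist--;
        hp = (hp << 1) | 1;           /* row 0 of D grows by 1 per column */
        hn <<= 1;
        vp = hn | ~(d0 | hp);
        vn = hp & d0;
    }
    return dist;
}
```

Only the last-row bit is inspected to track the running distance, so the loop stays branch-light. Any bits above row m in the 64-bit word never influence lower rows.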

@lesshaste No, I didn't compare it to https://codegolf.stackexchange.com/questions/200921/calculate-the-average-longest-common-substring-exactly for two reasons:
- the codegolf task isn't Levenshtein distance; it's a loosely defined LCS
- the code on codegolf...

> Oops, sorry. I meant https://codegolf.stackexchange.com/questions/197565/can-you-calculate-the-average-levenshtein-distance-exactly

It has the same second problem:
- the code is mangled with the optimal handling of permutations needed to win the codegolf race.

A perfect benchmark would...

> > At the moment I'm struggling with strings longer than 64 in the Levenshtein implementation, because Hyyrö doesn't describe how to take care of the carry bit. Myers does. That's why...
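
One way to handle the carry for first strings longer than 64 is sketched below. It is not necessarily the scheme from Myers's paper. Each bit vector becomes an array of 64-bit words, and the carry of the addition and the bits shifted out of each word are passed explicitly to the next word. The result is then identical to running the single-word recurrence on one m-bit integer. The function name `lev_bv_multi` is hypothetical, and 8-bit characters are assumed.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Sketch: Levenshtein distance for any m, using ceil(m/64) words per
   vector. Returns (size_t)-1 on allocation failure. */
static size_t lev_bv_multi(const char *a, size_t m, const char *b, size_t n)
{
    if (m == 0) return n;
    size_t words = (m + 63) / 64;
    uint64_t *peq = calloc(256 * words, sizeof *peq);
    uint64_t *vp  = malloc(words * sizeof *vp);
    uint64_t *vn  = calloc(words, sizeof *vn);
    if (!peq || !vp || !vn) { free(peq); free(vp); free(vn); return (size_t)-1; }

    for (size_t i = 0; i < m; i++)
        peq[(unsigned char)a[i] * words + i / 64] |= 1ULL << (i % 64);
    for (size_t w = 0; w < words; w++) vp[w] = ~0ULL;

    size_t last_w = (m - 1) / 64;
    uint64_t last = 1ULL << ((m - 1) % 64);
    size_t dist = m;

    for (size_t j = 0; j < n; j++) {
        const uint64_t *eq = peq + (unsigned char)b[j] * words;
        uint64_t add_carry = 0;       /* carry of (x & vp) + vp between words */
        uint64_t hp_carry = 1;        /* row 0: horizontal delta is +1 */
        uint64_t hn_carry = 0;
        for (size_t w = 0; w < words; w++) {
            uint64_t x = eq[w] | vn[w];
            uint64_t s = x & vp[w];
            uint64_t sum = s + vp[w];
            uint64_t c1 = sum < s;            /* overflow of s + vp */
            sum += add_carry;
            uint64_t c2 = sum < add_carry;    /* overflow of adding carry-in */
            add_carry = c1 | c2;
            uint64_t d0 = (sum ^ vp[w]) | x;
            uint64_t hp = vn[w] | ~(d0 | vp[w]);
            uint64_t hn = d0 & vp[w];
            if (w == last_w) {
                if (hp & last) dist++;
                if (hn & last) dist--;
            }
            uint64_t hp_out = hp >> 63, hn_out = hn >> 63;  /* bits leaving this word */
            hp = (hp << 1) | hp_carry;
            hn = (hn << 1) | hn_carry;
            hp_carry = hp_out;
            hn_carry = hn_out;
            vp[w] = hn | ~(d0 | hp);
            vn[w] = hp & d0;
        }
    }
    free(peq); free(vp); free(vn);
    return dist;
}
```

Three values cross word boundaries: the addition carry and the top bits of the two horizontal vectors before shifting. Everything else is purely bitwise and stays within its own word.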

> In addition, similar to your implementation, I provide an overload for characters below 256 (128 in your case). There is a reason for that. A LUT (lookup table) with 256 entries...
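
The idea of a special case for characters below 256 can be illustrated with a hybrid match table. Code points below 256 are looked up directly in a 256-entry array of 64-bit masks (2 KiB). Any other code point falls back to a small linear search over at most 64 distinct keys. This is a sketch under the assumption of m ≤ 64; the type `peq_table` and the functions `peq_build` and `peq_get` are hypothetical and not taken from either implementation.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical match table for a pattern of at most 64 code points. */
typedef struct {
    uint64_t bytes[256];   /* direct LUT for code points < 256 */
    uint32_t key[64];      /* other code points (at most m distinct) */
    uint64_t val[64];      /* their match masks */
    size_t   nkeys;
} peq_table;

/* Returns 0 on success, -1 if the pattern is longer than 64. */
static int peq_build(peq_table *t, const uint32_t *a, size_t m)
{
    if (m > 64) return -1;
    memset(t, 0, sizeof *t);
    for (size_t i = 0; i < m; i++) {
        uint32_t c = a[i];
        uint64_t bit = 1ULL << i;
        if (c < 256) { t->bytes[c] |= bit; continue; }
        size_t k = 0;
        while (k < t->nkeys && t->key[k] != c) k++;   /* linear search */
        if (k == t->nkeys) { t->key[k] = c; t->val[k] = 0; t->nkeys++; }
        t->val[k] |= bit;
    }
    return 0;
}

static uint64_t peq_get(const peq_table *t, uint32_t c)
{
    if (c < 256) return t->bytes[c];     /* fast path: one array access */
    for (size_t k = 0; k < t->nkeys; k++)
        if (t->key[k] == c) return t->val[k];
    return 0;                            /* character not in the pattern */
}
```

The fast path costs a single indexed load and no comparison. That matters because the lookup runs once per character of the second string in the inner loop.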

@Martinsos Sure, I can do this. But it would be better to update the Perl binding to the latest version of edlib. The [BioPerl](https://bioperl.org/) community is still large. I must contact the...