Max Bachmann comments

Results: 300 comments of Max Bachmann


Encode horizontal deltas using +1/-1 indicators for up to 20% speed gain

I was wondering why my own implementation (https://github.com/maxbachmann/RapidFuzz) was faster for very dissimilar sequences at some point. And indeed, similar to this PR, I store the deltas separately, so I...
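For readers unfamiliar with the idea of storing deltas as +1/-1 indicators, here is a minimal sketch of a bit-parallel Levenshtein distance in the style of Hyyrö's formulation. It is not the code from the PR or from RapidFuzz; the function name and variable names (`hp`, `hn` for the horizontal +1/-1 bit vectors, `vp`, `vn` for the vertical ones) are assumptions chosen for illustration.

```python
def levenshtein_bitparallel(s1: str, s2: str) -> int:
    """Levenshtein distance using +1/-1 delta bit vectors (illustrative sketch)."""
    m = len(s1)
    if m == 0:
        return len(s2)

    # One bit mask per character of s1: bit i is set where s1[i] == ch.
    peq = {}
    for i, ch in enumerate(s1):
        peq[ch] = peq.get(ch, 0) | (1 << i)

    mask = (1 << m) - 1      # keep Python's unbounded ints to m bits
    last = 1 << (m - 1)      # bit for the last row of the DP matrix
    vp, vn = mask, 0         # vertical deltas: +1 everywhere in column 0
    dist = m

    for ch in s2:
        pm = peq.get(ch, 0)
        d0 = ((((pm & vp) + vp) ^ vp) | pm | vn) & mask
        hp = (vn | ~(d0 | vp)) & mask   # horizontal delta is +1
        hn = d0 & vp                    # horizontal delta is -1
        if hp & last:
            dist += 1
        if hn & last:
            dist -= 1
        hp = ((hp << 1) | 1) & mask     # row 0 always has horizontal delta +1
        hn = (hn << 1) & mask
        vp = (hn | ~(d0 | hp)) & mask
        vn = hp & d0
    return dist
```

Only the last-row bits of `hp` and `hn` are needed to update the running distance; the rest of each vector feeds the next column.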

Encode horizontal deltas using +1/-1 indicators for up to 20% speed gain

> Yup, for smaller sequences the housekeeping becomes too much. I think the best approach is to use a different algorithm (like Landau-Vishkin) for smaller sequences. Idea was to put this...

Add BK Tree implementation

A common use case for things like search engines is to generate the tree once and then use it for searches. For this reason this should support some kind of...
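The build-once, search-many pattern described above can be sketched as follows. This is a minimal BK tree, not the implementation proposed in the PR; the class `BKTree`, its methods, and the helper `levenshtein` are hypothetical names introduced for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Plain dynamic-programming Levenshtein distance (row by row)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


class BKTree:
    """Each node is a (word, children) pair; children are keyed by distance."""

    def __init__(self, distance=levenshtein):
        self.distance = distance
        self.root = None

    def add(self, word: str) -> None:
        if self.root is None:
            self.root = (word, {})
            return
        node = self.root
        while True:
            d = self.distance(word, node[0])
            if d == 0:
                return  # already present
            child = node[1].get(d)
            if child is None:
                node[1][d] = (word, {})
                return
            node = child

    def search(self, query: str, max_dist: int) -> list:
        results = []
        stack = [self.root] if self.root is not None else []
        while stack:
            word, children = stack.pop()
            d = self.distance(query, word)
            if d <= max_dist:
                results.append((d, word))
            # Triangle inequality: only subtrees within [d - k, d + k] can match.
            for k, child in children.items():
                if d - max_dist <= k <= d + max_dist:
                    stack.append(child)
        return sorted(results)


# Build the tree once, then reuse it for any number of queries.
tree = BKTree()
for w in ["book", "books", "cake", "boo", "cape", "cart", "boon", "cook"]:
    tree.add(w)
matches = tree.search("bok", 1)
```

The tree only pays off when it is reused, since every insertion costs several distance computations up front.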

Add BK Tree implementation

In the same context Levenshtein automatons could be interesting as well: https://dmice.ohsu.edu/bedricks/courses/cs655/pdf/readings/2002_Schulz.pdf

add SIMD support for long sequences

This has a couple of problems: 1) depending on the metric it can be hard to implement, since the algorithm's behavior depends on the individual string lengths 2) many of...

add support for romanisation

Note that I am unsure how simple/hard romanisation is depending on the language, since I have zero experience with languages that need this sort of preprocessing. So any solution making...

add support for romanisation

I think a documentation section on options for romanisation for different languages would make sense. It is a fairly common thing people run into when matching non-Roman languages and so...

Benchmark

Glad to hear that the library proves to be useful :) Do I understand it correctly, that you have essentially two columns `A` and `B`, where in each row you...

Benchmark

Not in depth. However I just had a quick look at their two benchmarks: ## Long sequences This benchmark is mentioned in the readme and mentions them being significantly faster...

Feature request: Time limit on long-running functions

I am really still not sure what to do in these cases. One basic issue right now is that when processing very long sequences there is not really any way...
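One way such a time limit could work is a cooperative deadline check inside the main loop. The sketch below is plain Python and not part of any library's API; the function name `levenshtein_with_deadline` and the `timeout_s` parameter are assumptions made for illustration.

```python
import time


def levenshtein_with_deadline(a: str, b: str, timeout_s: float) -> int:
    """Levenshtein distance that raises TimeoutError once the budget is used up.

    The deadline is checked once per row, so the overshoot is bounded by the
    time needed to process a single row.
    """
    deadline = time.monotonic() + timeout_s
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        if time.monotonic() > deadline:
            raise TimeoutError(f"gave up after {i - 1} of {len(a)} rows")
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]
```

Checking per row rather than per cell keeps the overhead of the clock call small relative to the work done between checks.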