LibLangly
The combined Langly runtime
[Damerau-Levenshtein](https://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance) extends the Levenshtein edit-distance algorithm to additionally count the transposition of two adjacent characters as a single edit. Since swapped characters are a common kind of typo, it gives noticeably better results for fuzzy matching, and should be implemented and preferentially used.
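To make the difference concrete, here is a minimal Python sketch of the restricted variant, usually called optimal string alignment (OSA). It is not LibLangly code, and the name `osa_distance` is purely illustrative. OSA is simpler than the unrestricted Damerau-Levenshtein distance and can give a larger result in rare cases, because it never edits a substring more than once.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance (restricted Damerau-Levenshtein).

    Counts substitutions, insertions, deletions, and transpositions of
    two adjacent characters. Illustrative sketch, not the library's API.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # d[i][j] is the distance between a[:i] and b[:j].
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete every character of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert every character of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Transposition of two adjacent characters.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]
```

With this sketch, `osa_distance("ab", "ba")` is 1, where plain Levenshtein would report 2 for the same pair.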
[Raita](https://en.wikipedia.org/wiki/Raita_algorithm) is another alleged optimization of the Boyer-Moore algorithm. I'd like to see how much, and what the overall performance curve looks like. If it is, it should preferentially be...
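For reference while benchmarking, a rough Python sketch of Raita's comparison order: it reuses the Horspool bad-character shift table, but checks the last, first, and middle characters of the window before comparing the rest. The function name `raita_search` is an assumption for illustration and says nothing about how LibLangly would expose it.

```python
def raita_search(text: str, pattern: str) -> int:
    """Return the index of the first match of pattern in text, or -1.

    Illustrative sketch of Raita's algorithm, not LibLangly code.
    """
    m, n = len(pattern), len(text)
    if m == 0:
        return 0
    if m > n:
        return -1
    # Horspool bad-character table: distance from the rightmost occurrence
    # of each character (excluding the final position) to the pattern's end.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}
    first, middle, last = pattern[0], pattern[m // 2], pattern[-1]
    pos = 0
    while pos <= n - m:
        tail = text[pos + m - 1]
        # Raita's order: last, then first, then middle, then the interior.
        if (tail == last
                and text[pos] == first
                and text[pos + m // 2] == middle
                and text[pos + 1:pos + m - 1] == pattern[1:-1]):
            return pos
        # Characters absent from the table allow a full-length shift.
        pos += shift.get(tail, m)
    return -1
```

Whether the reordered comparisons pay off depends on the text and the alphabet, which is exactly what a performance curve would show.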
The [Apostolico-Giancarlo](https://en.wikipedia.org/wiki/Apostolico%E2%80%93Giancarlo_algorithm) algorithm is allegedly an optimization of the Boyer-Moore algorithm. I'd like to see how much, and what the overall curve looks like. If it is, it should certainly...
Currently, the simplified Boyer-Moore-Horspool algorithm is implemented. The more complex [Boyer-Moore](https://en.wikipedia.org/wiki/Boyer%E2%80%93Moore_string-search_algorithm) should be supported as well. Conveniently, the table type for the Horspool variant can be inherited, for the creation...
Currently, `FuzzyEquals()` makes use of the Levenshtein edit-distance algorithm. This counts substitutions, insertions, and deletions, but not transpositions. `FuzzyEquals()` should take all four edits into consideration.
Methods like [`Words()`](https://stringier.github.io/docs/api/Stringier.StringierExtensions.html#Stringier_StringierExtensions_Words_String_) are supposed to be splitting... words. But they don't. They split on spaces, which aren't necessarily the only boundary. Also, [`Words()`](https://stringier.github.io/docs/api/Stringier.StringierExtensions.html#Stringier_StringierExtensions_Words_String_) should be removing non-word components,...
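As a rough illustration of the intended behavior, the Python sketch below extracts runs of word characters instead of splitting on spaces. It only approximates real word segmentation (Unicode's rules are more involved), the pattern is my own assumption, and `words` here is a stand-in rather than the Stringier method.

```python
import re

# Assumed approximation of a "word": a run of word characters, optionally
# joined by an internal apostrophe (as in "It's"). Note that \w also matches
# digits and underscores, which a real implementation may want to exclude.
_WORD = re.compile(r"\w+(?:['\u2019]\w+)*")

def words(text: str) -> list:
    """Return the words in text, dropping punctuation and whitespace."""
    return _WORD.findall(text)
```

For `"Hello, world! It's a test-case."` this returns `['Hello', 'world', "It's", 'a', 'test', 'case']`, whereas splitting on spaces would keep `'Hello,'` and `'test-case.'` with their punctuation attached.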
As [Theo Verweij](https://twitter.com/theo_verweij) brought up [here](https://twitter.com/theo_verweij/status/1305525930169978881?s=20), there are some algorithms that are suboptimal because they need to reverse a glyph sequence and then iterate through it. This poses additional work, and...
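One way to picture the cost being described: instead of building a reversed copy and then walking it, walk the original sequence backwards. The Python sketch below does this for a common-suffix check. It works on code points as a stand-in for glyphs, and `common_suffix_length` is a hypothetical helper, not something from the library or the linked thread.

```python
def common_suffix_length(a: str, b: str) -> int:
    """Count how many trailing characters a and b share.

    reversed() yields items from the end lazily, so no reversed copy of
    either string is allocated (unlike slicing with a[::-1]).
    """
    count = 0
    for x, y in zip(reversed(a), reversed(b)):
        if x != y:
            break
        count += 1
    return count
```

For example, `common_suffix_length("testing", "ring")` is 3.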
[`Pow()`](https://docs.microsoft.com/en-us/dotnet/api/system.math.pow) is only defined for [`Double`](https://docs.microsoft.com/en-us/dotnet/api/system.double), which is naive. The geometric mean, which is used in numerous financial calculations, and I'm sure other algorithms, makes use of exponentiation of decimals....
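To show the kind of calculation at stake, here is a small Python sketch of a geometric mean carried out entirely in decimal arithmetic. Python's `decimal` module stands in for .NET's decimal type, and `geometric_mean` is an illustrative name; none of this reflects an existing LibLangly API.

```python
from decimal import Decimal

def geometric_mean(values):
    """Geometric mean of positive Decimal values: (v1 * ... * vn) ** (1/n).

    Stays in decimal arithmetic throughout; the fractional power is
    rounded to the current decimal context's precision.
    """
    values = list(values)
    if not values:
        raise ValueError("geometric mean of an empty sequence is undefined")
    if any(v <= 0 for v in values):
        raise ValueError("all values must be positive")
    product = Decimal(1)
    for v in values:
        product *= v
    return product ** (Decimal(1) / Decimal(len(values)))
```

Calling `geometric_mean([Decimal(4), Decimal(9)])` gives a `Decimal` equal to 6 up to the context precision, without a round trip through binary floating point.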
[Aho-Corasick](https://en.wikipedia.org/wiki/Aho%E2%80%93Corasick_algorithm) is an algorithm capable of efficiently searching for multiple patterns in a single pass over a text, rather than scanning the text once per pattern. This is incredibly useful whenever many keywords have to be found at once, and should be supported.
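For a sense of what supporting it involves, here is a compact Python sketch: a trie of the patterns, failure links built breadth-first, and a single scan of the text. The names `build_automaton` and `find_all` are assumptions made for illustration, not a proposed API.

```python
from collections import deque

def build_automaton(patterns):
    """Build the trie (goto), failure links, and per-state outputs."""
    goto, fail, output = [{}], [0], [[]]
    for pattern in patterns:
        if not pattern:
            continue  # empty patterns are skipped in this sketch
        state = 0
        for ch in pattern:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                output.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        output[state].append(pattern)
    # Breadth-first traversal, so shallower failure targets are finished first.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit the patterns that end at the failure target.
            output[nxt] = output[nxt] + output[fail[nxt]]
    return goto, fail, output

def find_all(text, patterns):
    """Return (start_index, pattern) for every occurrence, in one pass."""
    goto, fail, output = build_automaton(patterns)
    state, matches = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pattern in output[state]:
            matches.append((i - len(pattern) + 1, pattern))
    return matches
```

Searching `"ushers"` for `["he", "she", "his", "hers"]` reports `she` at index 1, and both `he` and `hers` at index 2.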
[GB-18030](https://en.wikipedia.org/wiki/GB_18030), despite being obviated by UTF-8/16, is still actively in use in the People's Republic of China. It should be supported.
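As a quick reminder of what the encoding looks like in practice, the Python sketch below uses the standard library's built-in `gb18030` codec to show its variable width: one byte for ASCII, two for common Han characters, and four for code points outside the Basic Multilingual Plane. The helper name `gb18030_width` is made up for this example.

```python
def gb18030_width(ch: str) -> int:
    """Number of bytes a single character occupies in GB 18030."""
    return len(ch.encode("gb18030"))

def round_trips(text: str) -> bool:
    """Check that text survives encoding to GB 18030 and back."""
    return text.encode("gb18030").decode("gb18030") == text
```

Here `gb18030_width("A")` is 1, `gb18030_width("中")` is 2, and `gb18030_width("😀")` is 4. Like UTF-8, the encoding covers all of Unicode, so any string should round-trip.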