LibLangly issues

Damerau-Levenshtein

1

[Damerau-Levenshtein](https://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance) is an extension of the Levenshtein edit-distance algorithm that additionally counts the transposition of two adjacent characters as a single edit. Swapped characters are a common typing error, so it tends to give noticeably better results for fuzzy matching, and should be implemented and preferentially used.
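To make the difference from plain Levenshtein concrete, here is a minimal Python sketch of the restricted variant (often called optimal string alignment). The function name `osa_distance` is hypothetical and not part of the library; the unrestricted Damerau-Levenshtein distance needs extra bookkeeping that this sketch leaves out.

```python
def osa_distance(a: str, b: str) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance."""
    rows, cols = len(a) + 1, len(b) + 1
    # d[i][j] holds the distance between a[:i] and b[:j].
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete every character of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert every character of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            # The fourth edit: two adjacent characters swapped.
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]
```

With this sketch, `osa_distance("ab", "ba")` is 1, whereas plain Levenshtein would count the swap as 2 edits.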

Entomy

🛠 Enhancement

🆘 Help Wanted

👨🏻‍🎓 Good First Issue

Raita

[Raita](https://en.wikipedia.org/wiki/Raita_algorithm) is another alleged optimization of the Boyer-Moore algorithm. I'd like to see how much of an improvement it is, and what the overall performance curve looks like. If it is, it should preferentially be...
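For anyone picking this up, the idea can be sketched in a few lines of Python. This is an illustration only, with a hypothetical function name. It reuses the Horspool bad-character shift and changes the comparison order: last character of the window first, then the first, then the middle, and only then the rest.

```python
def raita_search(text: str, pattern: str) -> int:
    """Return the index of the first match of pattern in text, or -1."""
    m, n = len(pattern), len(text)
    if m == 0:
        return 0
    if m > n:
        return -1
    # Horspool bad-character table: distance from the rightmost occurrence
    # of each character (ignoring the last position) to the pattern's end.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}
    first, middle, last = pattern[0], pattern[m // 2], pattern[-1]
    pos = 0
    while pos <= n - m:
        window_last = text[pos + m - 1]
        # Cheap checks first; the full comparison only runs if they pass.
        if (window_last == last
                and text[pos] == first
                and text[pos + m // 2] == middle
                and text[pos + 1:pos + m - 1] == pattern[1:-1]):
            return pos
        pos += shift.get(window_last, m)
    return -1
```

Whether the reordered checks pay off depends on the text and pattern, which is exactly what the benchmark asked for here would show.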

Entomy

🛠 Enhancement

🆘 Help Wanted

👨🏻‍🎓 Good First Issue

Apostolico-Giancarlo

The [Apostolico-Giancarlo](https://en.wikipedia.org/wiki/Apostolico%E2%80%93Giancarlo_algorithm) algorithm is allegedly an optimization of the Boyer-Moore algorithm. I'd like to see how much, and what the overall curve looks like. If it is, it should certainly...

Entomy

🛠 Enhancement

🆘 Help Wanted

👨🏻‍🎓 Good First Issue

Boyer-Moore

Currently, the simplified Boyer-Moore-Horspool algorithm is implemented. The more complex [Boyer-Moore](https://en.wikipedia.org/wiki/Boyer%E2%80%93Moore_string-search_algorithm) should be supported as well. Conveniently, the table type for the Horspool variant can be inherited, for the creation...
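A rough Python sketch of how the table inheritance could look. The class and function names are hypothetical and do not reflect the library's actual types. It assumes the common formulation in which full Boyer-Moore uses the same bad-character table as Horspool and adds a good-suffix table on top.

```python
class HorspoolTable:
    """Bad-character shifts, as used by Boyer-Moore-Horspool."""

    def __init__(self, pattern: str) -> None:
        self.pattern = pattern
        m = len(pattern)
        # Distance from the rightmost occurrence of each character
        # (ignoring the last position) to the end of the pattern.
        self.bad_char = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}


class BoyerMooreTable(HorspoolTable):
    """Inherits the bad-character shifts and adds good-suffix shifts."""

    def __init__(self, pattern: str) -> None:
        super().__init__(pattern)
        m = len(pattern)
        # suff[i]: length of the longest suffix of pattern[:i + 1] that is
        # also a suffix of the whole pattern (naive O(m^2) computation).
        suff = [0] * m
        for i in range(m):
            k = 0
            while k <= i and pattern[i - k] == pattern[m - 1 - k]:
                k += 1
            suff[i] = k
        self.good_suffix = [m] * m
        j = 0
        # Case 1: a prefix of the pattern matches the end of the suffix.
        for i in range(m - 1, -1, -1):
            if suff[i] == i + 1:
                while j < m - 1 - i:
                    if self.good_suffix[j] == m:
                        self.good_suffix[j] = m - 1 - i
                    j += 1
        # Case 2: the matched suffix occurs again inside the pattern.
        for i in range(m - 1):
            self.good_suffix[m - 1 - suff[i]] = m - 1 - i


def boyer_moore_search(text: str, pattern: str) -> int:
    """Return the index of the first match of pattern in text, or -1."""
    m, n = len(pattern), len(text)
    if m == 0:
        return 0
    table = BoyerMooreTable(pattern)
    pos = 0
    while pos <= n - m:
        i = m - 1
        while i >= 0 and pattern[i] == text[pos + i]:
            i -= 1  # compare right to left
        if i < 0:
            return pos
        # Bad-character shift relative to the mismatch position i.
        bad = table.bad_char.get(text[pos + i], m) - (m - 1 - i)
        pos += max(table.good_suffix[i], bad)
    return -1
```

The good-suffix shift is always at least 1, so the search makes progress even when the bad-character shift comes out zero or negative.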

Entomy

🛠 Enhancement

🆘 Help Wanted

👨🏻‍🎓 Good First Issue

FuzzyEquals should be using a better edit-distance algorithm

4

Currently, `FuzzyEquals()` makes use of the Levenshtein edit-distance algorithm. This counts substitutions, insertions, and deletions, but not transpositions. `FuzzyEquals()` should take all four edits into consideration.

Entomy

🛠 Enhancement

📊 Data

Word Boundary detection

2

Methods like [`Words()`](https://stringier.github.io/docs/api/Stringier.StringierExtensions.html#Stringier_StringierExtensions_Words_String_) are supposed to be splitting... words. But they don't. They split on spaces, which aren't necessarily the only boundary. Also, [`Words()`](https://stringier.github.io/docs/api/Stringier.StringierExtensions.html#Stringier_StringierExtensions_Words_String_) should be removing non-word components,...
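As a rough illustration of the gap, the Python sketch below contrasts splitting on whitespace with a pattern that keeps runs of letters and digits, optionally joined by an apostrophe or hyphen. The `words` function and its rules are assumptions made for this example. It is not how `Words()` is implemented, and a full solution would more likely follow the Unicode text segmentation rules.

```python
import re

# A word: letters/digits, optionally joined by an apostrophe or a hyphen.
# [^\W_] means "a word character other than underscore".
_WORD = re.compile(r"[^\W_]+(?:['\u2019-][^\W_]+)*")


def words(text: str) -> list[str]:
    """Return the words in text, dropping punctuation and whitespace."""
    return _WORD.findall(text)


sample = "Hello, world! It's a well-known fact."
naive = sample.split(" ")   # keeps "Hello," and "world!" with punctuation
better = words(sample)      # punctuation removed, contractions kept whole
```

Here `naive` still contains `"Hello,"` and `"fact."`, while `better` holds only the words themselves.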

Entomy

🛠 Enhancement

🆘 Help Wanted

👨🏻‍🎓 Good First Issue

Glyph Operations in Reverse

2

As [Theo Verweij](https://twitter.com/theo_verweij) brought up [here](https://twitter.com/theo_verweij/status/1305525930169978881?s=20), there are some algorithms that are suboptimal because they need to reverse a glyph sequence and then iterate through it. This poses additional work, and...

Entomy

🛠 Enhancement

🆘 Help Wanted

Pow() for Decimal

1

[`Pow()`](https://docs.microsoft.com/en-us/dotnet/api/system.math.pow) is only defined for [`Double`](https://docs.microsoft.com/en-us/dotnet/api/system.double), which is naive. The geometric mean, which is used for numerous financial calculations and, I'm sure, other algorithms, makes use of exponentiation of decimals....
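One possible approach, sketched in Python with its `decimal` module standing in for a decimal type: integer powers by repeated squaring, and n-th roots by Newton's method, which together are enough for a geometric mean. All three function names are hypothetical, and the sketch assumes positive inputs.

```python
from decimal import Decimal


def int_pow(base: Decimal, exponent: int) -> Decimal:
    """Exponentiation by squaring for a non-negative integer exponent."""
    result = Decimal(1)
    while exponent > 0:
        if exponent & 1:
            result *= base
        base *= base
        exponent >>= 1
    return result


def nth_root(value: Decimal, n: int) -> Decimal:
    """Positive n-th root of a positive value, via Newton's method."""
    # Start at or above the root so the iterates decrease monotonically.
    x = value if value > 1 else Decimal(1)
    while True:
        nxt = ((n - 1) * x + value / int_pow(x, n - 1)) / n
        if nxt >= x:  # no further progress at the current precision
            return x
        x = nxt


def geometric_mean(values: list[Decimal]) -> Decimal:
    """n-th root of the product of n positive values."""
    product = Decimal(1)
    for v in values:
        product *= v
    return nth_root(product, len(values))
```

The whole computation stays in decimal arithmetic, so no round trip through a binary floating-point type is needed. Arbitrary fractional exponents would need more than this, for example an exp/ln pair.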

Entomy

🛠 Enhancement

🆘 Help Wanted

👨🏻‍🎓 Good First Issue

Aho-Corasick

[Aho-Corasick](https://en.wikipedia.org/wiki/Aho%E2%80%93Corasick_algorithm) is an algorithm capable of efficiently searching for multiple patterns within a single text. This is incredibly useful for various reasons, and should be supported.
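A compact Python sketch of the technique, for orientation only. The function names are hypothetical and the structure is the textbook one: a trie of the patterns, failure links computed breadth-first, then a single pass over the text.

```python
from collections import deque


def build_automaton(patterns):
    """Build goto, failure, and output tables for the given patterns."""
    goto, fail, output = [{}], [0], [[]]
    for pattern in patterns:
        if not pattern:
            continue  # empty patterns are ignored in this sketch
        state = 0
        for ch in pattern:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                output.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        output[state].append(pattern)
    # Breadth-first pass: depth-1 states fail to the root (state 0).
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit matches that end at the failure state.
            output[nxt] = output[nxt] + output[fail[nxt]]
    return goto, fail, output


def find_all(text, patterns):
    """Return (start_index, pattern) for every match, in one text pass."""
    goto, fail, output = build_automaton(patterns)
    state, matches = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for p in output[state]:
            matches.append((i - len(p) + 1, p))
    return matches
```

For example, `find_all("ushers", ["he", "she", "his", "hers"])` returns `[(1, "she"), (2, "he"), (2, "hers")]`, including the overlapping matches.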

Entomy

🛠 Enhancement

🆘 Help Wanted

👨🏻‍🎓 Good First Issue

Implement GB-18030

[GB-18030](https://en.wikipedia.org/wiki/GB_18030), despite being obviated by UTF-8/16, is still actively in use in the People's Republic of China. It should be supported.
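For a feel of what an implementation has to handle, this short Python sketch uses the standard library's `gb18030` codec to show that the encoding is variable-width, with one-, two-, and four-byte forms. It only demonstrates the encoding's behavior and says nothing about how support would be structured in this library.

```python
samples = {
    "ascii": "A",            # ASCII stays a single byte
    "cjk": "\u4e2d",         # a common CJK character, two bytes
    "emoji": "\U0001F600",   # outside the BMP, four bytes
}

# Byte length of each sample once encoded as GB 18030.
widths = {name: len(ch.encode("gb18030")) for name, ch in samples.items()}

# Every Unicode code point is representable, so the round trip is lossless.
round_trip_ok = all(
    ch.encode("gb18030").decode("gb18030") == ch for ch in samples.values()
)
```

After running this, `widths` maps `"ascii"` to 1, `"cjk"` to 2, and `"emoji"` to 4.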

Entomy

🛠 Enhancement

🆘 Help Wanted

👨🏻‍🎓 Good First Issue
