go-edlib
Go-edlib: Edit distance and string comparison library
A Go library of string comparison and edit distance algorithms, featuring Levenshtein, LCS, Hamming, Damerau-Levenshtein (OSA and adjacent transpositions variants), Jaro-Winkler, Cosine, and more.
Table of Contents
- Requirements
- Introduction
- Features
- Installation
- Benchmarks
- Documentation
- Examples
- Author
- Contributing
- License
Requirements
- Go (v1.13+)
Introduction
An open-source Go library that includes most (and soon all) edit distance and string comparison algorithms, plus some extras!
Designed to be fully compatible with Unicode characters!
This library has 100% test coverage.
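To illustrate why Unicode compatibility matters in Go, here is a minimal sketch that compares strings rune by rune instead of byte by byte. It uses only the standard library; `hammingRunes` is a hypothetical helper written for this example and is not the library's implementation.

```go
package main

import (
	"errors"
	"fmt"
)

// hammingRunes is a hypothetical helper, not part of go-edlib.
// It counts the positions at which two strings differ, comparing
// Unicode code points (runes) rather than raw bytes.
func hammingRunes(a, b string) (int, error) {
	ra, rb := []rune(a), []rune(b)
	if len(ra) != len(rb) {
		return 0, errors.New("strings must have the same number of runes")
	}
	dist := 0
	for i := range ra {
		if ra[i] != rb[i] {
			dist++
		}
	}
	return dist, nil
}

func main() {
	// U+00E9 takes two bytes in UTF-8 but is a single rune.
	fmt.Println(len("caf\u00e9"), len([]rune("caf\u00e9"))) // 5 4

	// A byte-wise comparison would reject these strings as having
	// different lengths; the rune-wise one sees a single difference.
	d, err := hammingRunes("caf\u00e9", "cafe")
	fmt.Println(d, err) // 1 <nil>
}
```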
Features
- Levenshtein
- LCS (Longest common subsequence) with edit distance, backtrack and diff functions
- Hamming
- Damerau-Levenshtein, with the following variants:
  - OSA (Optimal string alignment)
  - Adjacent transpositions
- Jaro & Jaro-Winkler similarity algorithms
- Cosine Similarity
- Jaccard Index
- QGram
- Sorensen-Dice
- Similarity percentage functions based on all edit distance algorithms available in this library
- Fuzzy search functions based on edit distance, returning either a single string or multiple strings
- Unicode compatibility
Benchmarks
An interactive Google chart with a few benchmark cases for all similarity algorithms in this library, run through the StringsSimilarity function, is available here.
If you want or need more details, you can also view the raw benchmark output here, which includes memory allocations and test case output (similarity results and errors).
If you are on Linux and want to run the benchmarks on your own setup, run the ./tests/benchmark.sh script.
Installation
Open a terminal in your project folder and run:
go get github.com/hbollon/go-edlib
And import it into your project:
import (
"github.com/hbollon/go-edlib"
)
Run tests
If you are on Linux and want to run all unit tests, run the ./tests/tests.sh script.
On Windows, you can run:
go test ./... # Add desired parameters to this command if you want
Documentation
You can find the full documentation here: Documentation
Examples
Calculate the similarity index between two strings
You can use the StringsSimilarity(str1, str2, algorithm) function.
The algorithm parameter must be one of the following constants:
// Algorithm identifiers
const (
	Levenshtein Algorithm = iota
	DamerauLevenshtein
	OSADamerauLevenshtein
	Lcs
	Hamming
	Jaro
	JaroWinkler
	Cosine
)
Example with Levenshtein:
res, err := edlib.StringsSimilarity("string1", "string2", edlib.Levenshtein)
if err != nil {
	fmt.Println(err)
} else {
	fmt.Printf("Similarity: %f", res)
}
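The README does not spell out how an edit distance becomes a similarity score. One common normalization, assumed here and not confirmed as the library's own, divides the Levenshtein distance by the rune count of the longer string and subtracts the result from 1. `levenshtein` and `similarity` below are hypothetical helpers that use only the standard library.

```go
package main

import "fmt"

// min3 returns the smallest of three ints (kept explicit so the
// sketch also builds on Go versions without the built-in min).
func min3(a, b, c int) int {
	if b < a {
		a = b
	}
	if c < a {
		a = c
	}
	return a
}

// levenshtein returns the number of single-rune insertions, deletions
// and substitutions needed to turn a into b. Illustrative only.
func levenshtein(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1)
	curr := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j // distance from the empty prefix of a
	}
	for i := 1; i <= len(ra); i++ {
		curr[0] = i
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			curr[j] = min3(prev[j]+1, curr[j-1]+1, prev[j-1]+cost)
		}
		prev, curr = curr, prev // the finished row becomes the previous row
	}
	return prev[len(rb)]
}

// similarity maps the distance to [0, 1] under the assumption stated
// above: 1 - distance / length of the longer string (in runes).
func similarity(a, b string) float64 {
	longest := len([]rune(a))
	if l := len([]rune(b)); l > longest {
		longest = l
	}
	if longest == 0 {
		return 1 // two empty strings are identical
	}
	return 1 - float64(levenshtein(a, b))/float64(longest)
}

func main() {
	fmt.Println(levenshtein("kitten", "sitting"))         // 3
	fmt.Printf("%.3f\n", similarity("kitten", "sitting")) // 0.571
}
```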
Run a fuzzy search based on a string similarity algorithm
1. Single best match without threshold
You can use the FuzzySearch(str, strList, algorithm) function.
strList := []string{"test", "tester", "tests", "testers", "testing", "tsting", "sting"}
res, err := edlib.FuzzySearch("testnig", strList, edlib.Levenshtein)
if err != nil {
	fmt.Println(err)
} else {
	fmt.Printf("Result: %s", res)
}
Result: testing
2. Single best match with threshold
You can use the FuzzySearchThreshold(str, strList, minSimilarity, algorithm) function.
strList := []string{"test", "tester", "tests", "testers", "testing", "tsting", "sting"}
res, err := edlib.FuzzySearchThreshold("testnig", strList, 0.7, edlib.Levenshtein)
if err != nil {
	fmt.Println(err)
} else {
	fmt.Printf("Result for 'testnig': %s\n", res)
}
res, err = edlib.FuzzySearchThreshold("hello", strList, 0.7, edlib.Levenshtein)
if err != nil {
	fmt.Println(err)
} else {
	fmt.Printf("Result for 'hello': %s\n", res)
}
Result for 'testnig': testing
Result for 'hello':
3. Set of best matches without threshold
You can use the FuzzySearchSet(str, strList, resultQuantity, algorithm) function.
strList := []string{"test", "tester", "tests", "testers", "testing", "tsting", "sting"}
res, err := edlib.FuzzySearchSet("testnig", strList, 3, edlib.Levenshtein)
if err != nil {
	fmt.Println(err)
} else {
	fmt.Printf("Results: %s", strings.Join(res, ", "))
}
Results: testing, test, tester
4. Set of best matches with threshold
You can use the FuzzySearchSetThreshold(str, strList, resultQuantity, minSimilarity, algorithm) function.
strList := []string{"test", "tester", "tests", "testers", "testing", "tsting", "sting"}
res, err := edlib.FuzzySearchSetThreshold("testnig", strList, 3, 0.5, edlib.Levenshtein)
if err != nil {
	fmt.Println(err)
} else {
	fmt.Printf("Result for 'testnig' with '0.5' threshold: %s\n", strings.Join(res, " "))
}
res, err = edlib.FuzzySearchSetThreshold("testnig", strList, 3, 0.7, edlib.Levenshtein)
if err != nil {
	fmt.Println(err)
} else {
	fmt.Printf("Result for 'testnig' with '0.7' threshold: %s\n", strings.Join(res, " "))
}
Result for 'testnig' with '0.5' threshold: testing test tester
Result for 'testnig' with '0.7' threshold: testing
Get raw edit distance (Levenshtein, LCS, Damerau-Levenshtein, Hamming)
You can use one of the following functions to get the edit distance between two strings:
- LevenshteinDistance(str1, str2)
- DamerauLevenshteinDistance(str1, str2)
- OSADamerauLevenshteinDistance(str1, str2)
- LCSEditDistance(str1, str2)
- HammingDistance(str1, str2)
Example with Levenshtein distance:
res := edlib.LevenshteinDistance("kitten", "sitting")
fmt.Printf("Result: %d", res)
Result: 3
LCS, LCS Backtrack and LCS Diff
1. Compute the LCS (Longest Common Subsequence) between two strings
You can use the LCS(str1, str2) function.
lcs := edlib.LCS("ABCD", "ACBAD")
fmt.Printf("Length of their LCS: %d", lcs)
Length of their LCS: 3
2. Backtrack their LCS
You can use the LCSBacktrack(str1, str2) function.
res, err := edlib.LCSBacktrack("ABCD", "ACBAD")
if err != nil {
	fmt.Println(err)
} else {
	fmt.Printf("LCS: %s", res)
}
LCS: ABD
3. Backtrack all their LCS
You can use the LCSBacktrackAll(str1, str2) function.
res, err := edlib.LCSBacktrackAll("ABCD", "ACBAD")
if err != nil {
	fmt.Println(err)
} else {
	fmt.Printf("LCS: %s", strings.Join(res, ", "))
}
LCS: ABD, ACD
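For readers curious how the length and a backtracked subsequence relate, here is a minimal sketch of the classic dynamic-programming approach. It is an illustration under the usual textbook formulation, not the library's code; `lcsTable` and `lcsBacktrack` are hypothetical names.

```go
package main

import "fmt"

// lcsTable fills the table where t[i][j] is the LCS length of the
// first i runes of a and the first j runes of b.
func lcsTable(ra, rb []rune) [][]int {
	t := make([][]int, len(ra)+1)
	for i := range t {
		t[i] = make([]int, len(rb)+1)
	}
	for i := 1; i <= len(ra); i++ {
		for j := 1; j <= len(rb); j++ {
			switch {
			case ra[i-1] == rb[j-1]:
				t[i][j] = t[i-1][j-1] + 1
			case t[i-1][j] >= t[i][j-1]:
				t[i][j] = t[i-1][j]
			default:
				t[i][j] = t[i][j-1]
			}
		}
	}
	return t
}

// lcsBacktrack walks the table from the bottom-right corner back to
// the origin and rebuilds one longest common subsequence.
func lcsBacktrack(a, b string) string {
	ra, rb := []rune(a), []rune(b)
	t := lcsTable(ra, rb)
	out := make([]rune, t[len(ra)][len(rb)])
	i, j, k := len(ra), len(rb), len(out)-1
	for i > 0 && j > 0 {
		switch {
		case ra[i-1] == rb[j-1]:
			out[k] = ra[i-1] // shared rune: part of the LCS
			k--
			i--
			j--
		case t[i-1][j] >= t[i][j-1]:
			i-- // skip a rune of a
		default:
			j-- // skip a rune of b
		}
	}
	return string(out)
}

func main() {
	ra, rb := []rune("ABCD"), []rune("ACBAD")
	fmt.Println(lcsTable(ra, rb)[len(ra)][len(rb)]) // 3
	fmt.Println(lcsBacktrack("ABCD", "ACBAD"))      // ABD
}
```

Which subsequence comes out when several are equally long depends on the tie-breaking rule in the walk; enumerating all of them requires exploring both branches on ties.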
4. Get the LCS Diff between two strings
You can use the LCSDiff(str1, str2) function.
res, err := edlib.LCSDiff("computer", "houseboat")
if err != nil {
	fmt.Println(err)
} else {
	fmt.Printf("LCS Diff: \n%s\n%s", res[0], res[1])
}
LCS Diff:
 h c o m p u s e b o a t e r
 + -   - -   + + + + +   - -
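In the output above, a + marks a character present only in the second string, a - marks a character present only in the first, and a blank marks a character that belongs to the common subsequence. A diff of this kind can be derived from an LCS table, as in the sketch below. `lcsDiff` is a hypothetical helper, and its tie-breaking may order the columns differently from the library's output.

```go
package main

import (
	"fmt"
	"strings"
)

// lcsDiff returns two aligned lines: the characters of the merged
// sequence and a marker under each one ('+', '-' or ' ').
func lcsDiff(a, b string) (string, string) {
	ra, rb := []rune(a), []rune(b)
	// t[i][j] is the LCS length of the suffixes ra[i:] and rb[j:],
	// which lets the walk below run from left to right.
	t := make([][]int, len(ra)+1)
	for i := range t {
		t[i] = make([]int, len(rb)+1)
	}
	for i := len(ra) - 1; i >= 0; i-- {
		for j := len(rb) - 1; j >= 0; j-- {
			switch {
			case ra[i] == rb[j]:
				t[i][j] = t[i+1][j+1] + 1
			case t[i+1][j] >= t[i][j+1]:
				t[i][j] = t[i+1][j]
			default:
				t[i][j] = t[i][j+1]
			}
		}
	}
	var chars, marks strings.Builder
	emit := func(r rune, m byte) {
		chars.WriteByte(' ')
		chars.WriteRune(r)
		marks.WriteByte(' ')
		marks.WriteByte(m)
	}
	i, j := 0, 0
	for i < len(ra) && j < len(rb) {
		switch {
		case ra[i] == rb[j]:
			emit(ra[i], ' ') // common to both strings
			i++
			j++
		case t[i+1][j] >= t[i][j+1]:
			emit(ra[i], '-') // only in the first string
			i++
		default:
			emit(rb[j], '+') // only in the second string
			j++
		}
	}
	for ; i < len(ra); i++ {
		emit(ra[i], '-')
	}
	for ; j < len(rb); j++ {
		emit(rb[j], '+')
	}
	return chars.String(), marks.String()
}

func main() {
	chars, marks := lcsDiff("computer", "houseboat")
	fmt.Println("LCS Diff:")
	fmt.Println(chars)
	fmt.Println(marks)
}
```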
Author
Hugo Bollon
- Github: @hbollon
- LinkedIn: @Hugo Bollon
- Portfolio: hugobollon.me
Contributing
Contributions, issues and feature requests are welcome!
Feel free to check issues page.
Show your support
Give a ⭐ if this project helped you!
License
Copyright © 2020 Hugo Bollon.
This project is licensed under the MIT License.