SuffixTree
SuffixTree copied to clipboard
Optimized implementation of suffix tree in python using Ukkonen's algorithm.
This module is an optimized implementation of Ukkonen's suffix tree algorithm in python which will be having most of the important text processing functionalities such as:
Search for strings:
✓ Check if a string P of length m is a substring in O(m) time.
✓ Find the first occurrence of the patterns P1,... ,Pq of total length m as substrings in O(m) time.
✓ Find all z occurrences of the patterns P1,... ,Pq of total length m as substrings in O(m+z) time.
- Search for a regular expression P in time expected sublinear in n
- Find for each suffix of a pattern P the length of the longest match between a prefix of P[i... m] and a substring in D in
time. This is termed the matching statistics for P
Find properties of the strings:
- Find the longest common substrings of the string Si and Sj in
time. - Find all maximal pairs, maximal repeats or supermaximal repeats in
time. - Find the Lempel–Ziv decomposition in
time.[10] - Find the longest repeated substrings in
time. - Find the most frequently occurring substrings of a minimum length in
time. - Find the shortest strings from
that do not occur in D in O(n+z) time, if there are z such strings. - Find the shortest substrings occurring only once in
time. - Find, for each i the shortest substrings of Si not occurring elsewhere in D in
time.
sources:
- http://web.stanford.edu/~mjkay/gusfield.pdf
- On–line construction of suffix trees. Esko Ukkonen
- http://www.geeksforgeeks.org/ukkonens-suffix-tree-construction-part-6/