python-alignment
Long sequences run into RecursionError
Since `backtraceFrom` is implemented recursively rather than iteratively, calling the aligner on "long" sequences (more than 1000 items) raises a RecursionError under Python's default recursion limit. Raising that limit is not a safe workaround: sufficiently deep recursion can overflow the C stack and crash the interpreter.
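To illustrate the failure mode, here is a minimal sketch that does not use this repo's code. The functions `backtrace_recursive` and `backtrace_iterative` and the `prev` list of predecessor links are hypothetical stand-ins for a traceback through an alignment table. The recursive walk needs one stack frame per step, while the loop does not:

```python
import sys


def backtrace_recursive(prev, i):
    """Follow predecessor links recursively (one stack frame per step)."""
    if i < 0:
        return []
    return backtrace_recursive(prev, prev[i]) + [i]


def backtrace_iterative(prev, i):
    """Same walk with an explicit loop, so depth is not bounded by the stack."""
    path = []
    while i >= 0:
        path.append(i)
        i = prev[i]
    path.reverse()
    return path


# Hypothetical traceback: cell k points back to k - 1, cell 0 points to -1.
n = sys.getrecursionlimit() + 500
prev = list(range(-1, n - 1))

try:
    backtrace_recursive(prev, n - 1)
    recursive_failed = False
except RecursionError:
    recursive_failed = True

path = backtrace_iterative(prev, n - 1)
print(recursive_failed, len(path))
```

Rewriting the traceback along these lines should remove the dependency on the recursion limit without touching the scoring logic.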
Same here
@bertsky could you recommend a similar but better package?
@hadaev8 I'll try. Having tested several libraries available on PyPI (when searching with align or edit distance keywords) I finally reverted to the standard difflib.SequenceMatcher (with isjunk=None, autojunk=False) – although it is not optimising general global alignment (Needleman-Wunsch) but minimal visual difference (Ratcliff-Obershelp) – for the following reasons:
- python-alignment (this repo): issues #9, #10 and #11
- edlib: only ASCII
- edit_distance: heap overflow
- weighted-levenshtein: only ASCII
- python-Levenshtein: I don't remember, sorry!
- others: not for general strings (only for DNA sequences, scalars, etc.)
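For reference, a minimal sketch of the difflib fallback with the settings mentioned above. The wrapper name `align` is only for illustration; `SequenceMatcher` and `get_opcodes` are from the standard library:

```python
from difflib import SequenceMatcher


def align(a, b):
    """Return edit opcodes between two sequences, with junk heuristics disabled."""
    matcher = SequenceMatcher(isjunk=None, a=a, b=b, autojunk=False)
    # Each opcode is (tag, i1, i2, j1, j2), where tag is one of
    # 'equal', 'replace', 'delete' or 'insert'.
    return matcher.get_opcodes()


for tag, i1, i2, j1, j2 in align("kitten", "sitting"):
    print(tag, repr("kitten"[i1:i2]), repr("sitting"[j1:j2]))
```

It works on any sequences of hashable items, so non-ASCII strings and token lists are fine. Keep in mind that the result follows the matching-blocks heuristic, so it is not guaranteed to be a minimum-cost alignment.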
Generally, you want more than just correctness:
- robustness:
  - possibility for bailout before entering extremely costly computations (memory or time)
  - heap and stack restrictions
- efficiency: general complexity is O(n*m) (or even cubic when weighted), but:
  - there is a large difference in the linear factor (esp. between pure Python and good C implementations),
  - optimisation for benign cases is possible and makes a huge difference for average performance
  - you don't always need to enumerate all possible alignments, only one of minimal distance (but possibly with assumptions on ordering)
- weighting etc
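A rough sketch of what some of these points could look like in pure Python, not taken from any of the libraries above. The function name `edit_distance` and the `max_cells` budget are assumptions for illustration. It bails out before a costly computation, short-circuits the benign case of identical input, and keeps only two table rows because it returns the distance alone, not the alignments:

```python
def edit_distance(a, b, max_cells=10_000_000):
    """Levenshtein distance with a size guard and two-row memory use."""
    if a == b:
        return 0  # benign case: identical input needs no table at all
    if len(a) * len(b) > max_cells:
        # Bail out before allocating or iterating over a huge table.
        raise ValueError("input too large for the configured budget")
    if len(a) < len(b):
        a, b = b, a  # keep the shorter sequence in the inner loop
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,             # deletion
                current[j - 1] + 1,          # insertion
                previous[j - 1] + (x != y),  # substitution or match
            ))
        previous = current
    return previous[-1]


print(edit_distance("kitten", "sitting"))
```

This is still O(n*m) in time, so the remaining gap to a good C implementation is the per-cell cost, which is exactly where pure Python loses.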
@bertsky Thanks