python-alignment icon indicating copy to clipboard operation
python-alignment copied to clipboard

Long sequences run into RecursionError

Open bertsky opened this issue 7 years ago • 4 comments

Since backtraceFrom is implemented by recursion (instead of iteration), calling the aligner on "long" sequences (more than 1000 items) results in a RecursionError with Python defaults. Extending stack depth limit may cause other serious issues.

bertsky avatar Oct 20 '18 14:10 bertsky

Same here

hadaev8 avatar Sep 25 '19 19:09 hadaev8

@bertsky may you advise similar but better package?

hadaev8 avatar Oct 13 '19 18:10 hadaev8

@hadaev8 I'll try. Having tested several libraries available on PyPI (when searching with align or edit distance keywords) I finally reverted to the standard difflib.SequenceMatcher (with isjunk=None, autojunk=False) – although it is not optimising general global alignment (Needleman-Wunsch) but minimal visual difference (Ratcliff-Obershelp) – for the following reasons:

Generally, you want more than just correctness:

  • robustness:
    • possibility for bailout before entering extremely costly computations (memory or time)
    • heap and stack restrictions
  • efficiency: general complexity is O(n*m) (or even cubic when weighted), but:
    • there is a large difference in the linar factor (esp. between pure Python and good C implementations),
    • optimisation for benign cases is possible and makes a huge difference for average performance
    • you don't always need to enumerate all possible alignments, only one of minimal distance (but possibly with assumptions on ordering)
  • weighting etc

bertsky avatar Oct 14 '19 19:10 bertsky

@bertsky Thanks

hadaev8 avatar Oct 25 '19 19:10 hadaev8