Speed up DSAlign

Open galv opened this issue 3 years ago • 1 comments

Add a unit test (dsalign_lib_test.py) to check that these speedups actually work.

I added Cython as a dependency in order to make the smithwaterman function faster. This makes "sw_align" approximately 10x faster. "sw_align_old" is retained in case we ever want to check that the new function exactly matches the old output (I already checked that it does with the unit test).
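As a rough illustration of that kind of old-versus-new check, here is a minimal pure-Python sketch. It is not the Cython code from this PR: `sw_score_reference` and `sw_score_fast` are hypothetical stand-ins for "sw_align_old" and "sw_align", the scoring parameters are assumed, and both functions return only the best local alignment score rather than whatever the real functions return.

```python
import random


def sw_score_reference(a, b, match=2, mismatch=-1, gap=-1):
    """Full-matrix Smith-Waterman local alignment score (slow but easy to audit).

    Hypothetical stand-in for the retained "old" implementation.
    """
    rows, cols = len(a) + 1, len(b) + 1
    h = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            h[i][j] = max(0,
                          h[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                          h[i - 1][j] + gap,      # gap in b
                          h[i][j - 1] + gap)      # gap in a
            best = max(best, h[i][j])
    return best


def sw_score_fast(a, b, match=2, mismatch=-1, gap=-1):
    """Same recurrence, keeping only the previous row in memory.

    Hypothetical stand-in for the optimized implementation.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for ch in a:
        cur = [0]
        for j, other in enumerate(b, start=1):
            sub = match if ch == other else mismatch
            score = max(0, prev[j - 1] + sub, prev[j] + gap, cur[j - 1] + gap)
            cur.append(score)
            if score > best:
                best = score
        prev = cur
    return best


def check_equivalence(trials=200, seed=0):
    """Compare both implementations on random short strings; return trials run."""
    rng = random.Random(seed)
    alphabet = "abcd "
    for _ in range(trials):
        a = "".join(rng.choice(alphabet) for _ in range(rng.randint(0, 30)))
        b = "".join(rng.choice(alphabet) for _ in range(rng.randint(0, 30)))
        assert sw_score_reference(a, b) == sw_score_fast(a, b), (a, b)
    return trials
```

Using a fixed seed keeps the comparison reproducible, so a mismatch can be replayed from the failing input pair reported by the assertion.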

These are the current results from profiling dsalign_lib_test.py. Previously we took over 200 seconds to align this segment, but now we are at around 120 seconds.

Ordered by: internal time

         ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
          52003   48.730    0.001  111.226    0.002  text.py:184(similarity)
       69031918   39.739    0.000   57.229    0.000  utils.py:105(enweight)
       69215860   12.950    0.000   12.993    0.000  text.py:152(ngrams)
            670   10.514    0.016   10.523    0.016  search.py:49(sw_align)
       68719564    4.510    0.000    4.510    0.000  {built-in method builtins.abs}
       13218994    2.369    0.000    2.369    0.000  {built-in method builtins.min}
       30927766    2.250    0.000    2.250    0.000  __init__.py:570(__missing__)
            601    1.939    0.003   12.488    0.021  search.py:107(find_best)
          52003    0.422    0.000  111.648    0.002  dsalign_lib.py:196()
          52576    0.206    0.000    0.206    0.000  {built-in method builtins.sum}
         104678    0.187    0.000    0.269    0.000  __init__.py:550(__init__)
              1    0.170    0.170    0.238    0.238  text.py:63(add_original_text)
1278000/1277989    0.150    0.000    0.150    0.000  {built-in method builtins.len}
         312018    0.139    0.000    0.139    0.000  text.py:168(weighted_ngrams)
          52003    0.103    0.000  111.786    0.002  dsalign_lib.py:215()
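For anyone who wants to produce a comparable report, the sketch below shows one way to get output sorted by internal time using the standard-library `cProfile` and `pstats` modules. How the numbers in this PR were actually collected is not stated, so treat this as an assumption; `profile_by_internal_time` and `toy_workload` are hypothetical names used only for illustration.

```python
import cProfile
import io
import pstats


def profile_by_internal_time(func, *args, limit=15, **kwargs):
    """Run func under cProfile and return (result, report text).

    The report is sorted by tottime, which pstats labels "internal time".
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats(pstats.SortKey.TIME).print_stats(limit)
    return result, buffer.getvalue()


def toy_workload(n):
    """Hypothetical workload standing in for the alignment test."""
    return sum(abs(i - n // 2) for i in range(n))


if __name__ == "__main__":
    _, report = profile_by_internal_time(toy_workload, 100_000)
    print(report)
```

Sorting by `tottime` surfaces functions that are expensive in their own body, while the `cumtime` column includes time spent in callees.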

galv, Jul 06 '21 21:07

MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅

github-actions[bot], Jul 06 '21 21:07