peoples-speech
Speed up DSAlign
Add a unit test (dsalign_lib_test.py) to check that these speedups actually work.
I added Cython as a dependency in order to make the Smith-Waterman function faster. This makes "sw_align" run approximately 10x faster. "sw_align_old" is retained in case we ever want to check that the new function exactly matches the old output (I already checked with the unit test that it does).
These are the current results from profiling dsalign_lib_test.py. Previously we took over 200 seconds to align this segment, but now we are at around 120 seconds.
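As a rough illustration of the "new function must match the old output" check, here is a minimal sketch in plain Python. It does not reproduce the real "sw_align" or "sw_align_old" signatures. The names sw_score_reference, sw_score_fast, and check_equivalence and the scoring parameters are assumptions made up for this example, and the "fast" variant is only a two-row stand-in for an optimized implementation, not the Cython code.

```python
import random


def sw_score_reference(a, b, match=2, mismatch=-1, gap=-1):
    """Full-matrix Smith-Waterman local alignment score (slow, easy to audit)."""
    rows, cols = len(a) + 1, len(b) + 1
    h = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            h[i][j] = max(0,
                          h[i - 1][j - 1] + sub,  # match or mismatch
                          h[i - 1][j] + gap,      # gap in b
                          h[i][j - 1] + gap)      # gap in a
            best = max(best, h[i][j])
    return best


def sw_score_fast(a, b, match=2, mismatch=-1, gap=-1):
    """Same recurrence, keeping only two rows (hypothetical optimized variant)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, start=1):
            sub = match if ca == cb else mismatch
            score = max(0, prev[j - 1] + sub, prev[j] + gap, cur[j - 1] + gap)
            cur.append(score)
            if score > best:
                best = score
        prev = cur
    return best


def check_equivalence(trials=200, seed=0):
    """Compare both implementations on random short strings."""
    rng = random.Random(seed)
    for _ in range(trials):
        a = "".join(rng.choice("abcd") for _ in range(rng.randint(0, 20)))
        b = "".join(rng.choice("abcd") for _ in range(rng.randint(0, 20)))
        assert sw_score_reference(a, b) == sw_score_fast(a, b), (a, b)
    return True
```

Keeping the slow version around as an oracle makes this kind of randomized comparison cheap to rerun whenever the optimized code changes.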
Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
    52003   48.730    0.001  111.226    0.002  text.py:184(similarity)
 69031918   39.739    0.000   57.229    0.000  utils.py:105(enweight)
 69215860   12.950    0.000   12.993    0.000  text.py:152(ngrams)
      670   10.514    0.016   10.523    0.016  search.py:49(sw_align)
 68719564    4.510    0.000    4.510    0.000  {built-in method builtins.abs}
 13218994    2.369    0.000    2.369    0.000  {built-in method builtins.min}
 30927766    2.250    0.000    2.250    0.000  init.py:570(missing)
      601    1.939    0.003   12.488    0.021  search.py:107(find_best)
    52003    0.422    0.000  111.648    0.002  dsalign_lib.py:196(
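For anyone who wants to produce a table in the same shape, a listing ordered by internal time can be generated with the standard library's cProfile and pstats modules. The sketch below is only an assumption about how such a report could be collected, not necessarily how the numbers above were obtained. profile_by_internal_time and busy_work are hypothetical helpers for illustration.

```python
import cProfile
import io
import pstats


def profile_by_internal_time(func, *args, top=10, **kwargs):
    """Run func under cProfile and return (result, report sorted by tottime)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # "tottime" is time spent inside the function itself, excluding callees.
    stats.sort_stats("tottime").print_stats(top)
    return result, stream.getvalue()


def busy_work(n):
    """Hypothetical workload standing in for the alignment test."""
    return sum(abs(i - n // 2) for i in range(n))


if __name__ == "__main__":
    _, report = profile_by_internal_time(busy_work, 100_000)
    print(report)
```

Sorting by tottime rather than cumtime is what surfaces hot leaf functions such as "similarity" and "enweight" in the listing above.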
MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅