gensim
gensim copied to clipboard
Using efficient np.linalg.multi_dot
It is more succinct and effective to refactor code to use np.linalg.multi_dot
rather than numerous np.dot
routines.
What do you think about this change that has practical value?
Can you quantify what the benefit might be, perhaps with a demo showing the improvement?
Please check the NumPy API documentation here (https://numpy.org/doc/stable/reference/generated/numpy.linalg.multi_dot.html) If you do not trust this doc, I can run a small performance check for you. Thanks for the comment.
Where does Gensim multiply more than 2 matrices (so that multidot has any effect)?
Like @gojomo, a practical demonstration of the benefits would be nice.
@piskvorky In the PR, there were two calls to np.dot
, which we can replace with one call to np.linalg.multi_dot
, Invoking several np.dot
invocations is less efficient, according to the API documentation, than using np.linalg.multi_dot
.
Please refer the blow demo (I am not sure this is the exact demo that you are looking for)
from datetime import datetime
A = np.random.random((10000, 100))
B = np.random.random((100, 1000))
C = np.random.random((1000, 5))
D = np.random.random((5, 333))
multi_dot_start = datetime.now()
_ = np.linalg.multi_dot([A, B, C, D])
multi_dot_end = datetime.now()
print("multi_dot execution time",(multi_dot_end-multi_dot_start).microseconds)
dot_start = datetime.now()
_ = np.dot(np.dot(np.dot(A, B), C), D)
dot_end = datetime.now()
print("dot execution time",(dot_end - dot_start).microseconds)
Output
multi_dot execution time 17929
dot execution time 85056
multi_dot
is 4.7 times faster, and it also succinct and clean which we should not forget.
But where would the proposed replacement make any noticeable difference to a Gensim user?
Is the print_error()
function that's changed in the PR a major contributor to the runtime of the svd_error.py
script? How much would a representative usage of that script be sped up by this change?
I proposed this patch as an external contributor to make the code more concise and efficient. But how frequently do you use the script svd error.py
and what effect does that have? I believe the core contributors should make that decision. But rather than debating about the effects, I think the code has to be improved and adjusted. The impact is always subjective.
Sorry, without a clear demonstration of its benefit, I'm -1 on introducing changes.
If there's a compelling use-case, such as a speed-up when training a new model, or calculating document similarities, please post it here.
Speeding up an internal test script by (milli? nano?) seconds is not it. We'd need something that impacts users noticeably.
The PR submitted does not make the source code more 'succinct' or 'concise' – it's 7 characters longer than the original code.
The claim that it is more efficient, in this usage, remains an untested theory. (For example, it's possible that in this case multi_dot
may wind up choosing the same order-of-operations as the prior code did – while adding overhead to make that decision. Usually, actual improvements can be objectively proven, not just asserted.)
As a matter of style, I suppose one could try to make a case that it'd be better to always prefer multi_dot
when there are any chained dot
operations like this, just in case its optimal reordering offers a big win.
However, more developers who will ever be reading or modifying this code are likely to know, offfhand, what np.dot
is than what np.linalg.multi_dot
is. Seeing multi_dot
in the code might require looking up docs, where dot
would not. And, it turns out there are even some (corner, perhaps contrived) cases where multi_dot
is slower. So even a "good style just in case it helps" case isn't a slam dunk, on grounds of either readability or performance.
I would be favorable to the idea of preferring multi_dot
in any cases where it has a chance of making a user-noticeable performance difference, though.
Thank you for your effort @maldil . I see this change has pros and cons.
Pros: maybe more efficient Cons: more obscure
Given that efficiency is not a prime concern of the affected code, I think at this stage it's best not to merge this change, because the pros outweigh the cons.