
Thoughts on making the numpy dependency optional?

Open janeyx99 opened this issue 2 years ago • 6 comments

Proposal

Make the numpy dependency optional, if possible.

Why?

Minimizing dependencies is a general goal as it allows a bigger audience to reap the benefits of this library. More specifically, some of us are interested in making opt_einsum a hard dependency for torch, but we would like to keep numpy unrequired. (If you're curious why torch does not have a hard dependency on numpy, see https://github.com/pytorch/pytorch/issues/60081, tl;dr being the last comment.)

A hard dependency would mean all torch users would get the benefits of opt_einsum right away without thinking too hard/needing to manually install opt_einsum themselves.

Alternatives

We could also have torch vendor opt_einsum, but that means increased complexity/maintenance, and we would like to automatically pick up improvements in opt_einsum!

janeyx99 avatar Nov 01 '22 20:11 janeyx99

I think this is entirely possible, as we have a fairly weak dependence on NumPy beyond testing. Feel free to take a crack at it; I can look into removing NumPy this weekend.

dgasmith avatar Nov 02 '22 04:11 dgasmith

Yes, I agree this would be a nice thing to do. From what I can tell, the minor problem points where one can't just import/mock numpy lazily are:

  • ssa_to_linear - this function, which is used in the greedy path optimization and is not related to backends etc., is actually $\mathcal{O}(N^2)$ and uses numpy to speed it up. I could see whether a pure Python version can be made fast enough
  • some type hints use np.ndarray - not sure what the solution is for referencing types from libraries that are not installed.
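For the type-hint point, one possible direction is a structural type via typing.Protocol, so annotations need no numpy import at all. A minimal sketch; the names here (ArrayLike, ndim) are illustrative, not opt_einsum's actual API:

```python
from typing import Protocol, Tuple, runtime_checkable

@runtime_checkable
class ArrayLike(Protocol):
    """Anything exposing a .shape tuple: numpy, torch, jax arrays, etc."""
    @property
    def shape(self) -> Tuple[int, ...]: ...

def ndim(x: ArrayLike) -> int:
    # Works for any object with .shape; no numpy import required
    return len(x.shape)
```

Since the protocol is structural, numpy/torch arrays satisfy it automatically without inheriting from it.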

jcmgray avatar Nov 02 '22 19:11 jcmgray

Here's an alternative pure python implementation of ssa_to_linear:

def ssa_to_linear_A(ssa_path):
    # Number of original inputs: each contraction consumes len(scon)
    # tensors and produces one, leaving a single output at the end.
    N = sum(map(len, ssa_path)) - len(ssa_path) + 1
    ids = list(range(N))  # SSA ids of the tensors currently "live"
    path = []
    ssa = N  # next SSA id, assigned to each contraction's output
    for scon in ssa_path:
        # Map SSA ids to positions in the live list (linear indices)
        con = sorted(map(ids.index, scon))
        for j in reversed(con):
            ids.pop(j)  # remove contracted tensors, highest index first
        ids.append(ssa)  # the new intermediate gets the next SSA id
        path.append(con)
        ssa += 1
    return path
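To make the conversion concrete, here is a hand-trace of the same steps on a hypothetical three-input path (the path itself is made up for illustration):

```python
# Inputs start with SSA ids 0, 1, 2; the first contraction's output gets id 3.
ids = [0, 1, 2]                               # SSA ids of live tensors
# First contraction combines SSA ids (1, 2): positions 1 and 2 in the list.
step1 = sorted(ids.index(s) for s in (1, 2))  # [1, 2]
for j in reversed(step1):
    ids.pop(j)
ids.append(3)                                 # live list is now [0, 3]
# Second contraction combines SSA ids (0, 3): positions 0 and 1 now.
step2 = sorted(ids.index(s) for s in (0, 3))  # [0, 1]
print([step1, step2])                         # [[1, 2], [0, 1]]
```

Note how the same SSA id 0 maps to linear index 0 both times, while id 3 maps to index 1 because the list has shrunk.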

It's actually faster until one gets to quite large numbers of inputs (x-axis): (benchmark plot of ssa_to_linear runtime vs. number of inputs)

At the largest size here the main greedy algorithm itself takes ~7 s, so the extra slowdown is not ideal but still only a small part of the whole path finding. Maybe the implementation can be improved too.

jcmgray avatar Nov 02 '22 20:11 jcmgray

Nice check!

Since n is unlikely to go over 1000 except in extreme cases, we could also keep two algorithms and switch between them:

if n > 1000 and has_numpy:
    return _numpy_impl(*args)
else:
    return _python_impl(*args)
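The has_numpy flag could be computed once at import time; a common sketch of this pattern (not necessarily how opt_einsum ended up doing it):

```python
# Probe for numpy once; downstream code branches on the resulting flag
# instead of importing numpy at call time.
try:
    import numpy  # noqa: F401
    has_numpy = True
except ImportError:
    has_numpy = False

print(has_numpy)
```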

The library isn't strongly type hinted yet (there's still plenty of Any running around~). I would vote we replace np.ndarray with a library-agnostic Protocol type.

dgasmith avatar Nov 02 '22 23:11 dgasmith

I actually realized you can implement ssa_to_linear in $\mathcal{O}(n \log n)$ time:

import bisect

def ssa_to_linear_bis(ssa_path, N=None):
    if N is None:
        N = sum(map(len, ssa_path)) - len(ssa_path) + 1
    ids = list(range(N))  # stays sorted: appended SSA ids only ever grow
    path = []
    ssa = N
    for scon in ssa_path:
        # ids is sorted, so a binary search replaces the O(N) list.index scan
        con = sorted([bisect.bisect_left(ids, s) for s in scon])
        for j in reversed(con):
            ids.pop(j)
        ids.append(ssa)
        path.append(con)
        ssa += 1
    return path
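The correctness of the bisect trick rests on one invariant: the live id list stays sorted, because each newly appended SSA id exceeds everything already in it, so bisect_left agrees with list.index. A quick randomized sanity check of that invariant (the random pairwise paths are illustrative only):

```python
import bisect
import random

random.seed(0)
for _ in range(100):
    N = random.randint(2, 12)
    ids = list(range(N))
    ssa = N
    while len(ids) > 1:
        scon = random.sample(ids, 2)       # contract two random live tensors
        assert ids == sorted(ids)          # invariant: list is always sorted
        for s in scon:
            # binary search and linear scan must find the same position
            assert bisect.bisect_left(ids, s) == ids.index(s)
        for j in sorted((ids.index(s) for s in scon), reverse=True):
            ids.pop(j)
        ids.append(ssa)                    # new id > all existing ids
        ssa += 1
print("invariant holds")
```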

This is significantly faster than the numpy version throughout the range.

jcmgray avatar Sep 02 '23 17:09 jcmgray

@janeyx99 Can you test torch against current main?

@jcmgray This code snippet was awesome; we used it to replace the previous numpy version.

dgasmith avatar Jul 09 '24 23:07 dgasmith