
Optimal path less efficient than greedy path

fishjojo opened this issue on Feb 24, 2025 · 4 comments

Script to reproduce the problem:

import numpy as np
import opt_einsum
print("opt_einsum version: ", opt_einsum.__version__)

no = 300
nv = 1200
naux = 50
nx = 2000

M = np.empty((nx,nx))
eta = np.empty((no,nx))
eta1 = np.empty((naux,nx))
theta = np.empty((naux,no))

subscripts = "iP,sP,PQ,jQ,tQ,ti,sj->"

print("\nOptimal path:")
path_info = opt_einsum.contract_path(
    subscripts,
    eta,eta1,M,eta,eta1,theta,theta,
    optimize='optimal'
)
print(path_info[1])

print("\nGreedy path:")
path_info = opt_einsum.contract_path(
    subscripts,
    eta,eta1,M,eta,eta1,theta,theta,
    optimize='greedy'
)
print(path_info[1])

Output:

opt_einsum version:  3.4.0

Optimal path:
  Complete contraction:  iP,sP,PQ,jQ,tQ,ti,sj->
         Naive scaling:  6
     Optimized scaling:  4
      Naive FLOP count:  6.300e+15
  Optimized FLOP count:  1.827e+11
   Theoretical speedup:  3.449e+4
  Largest intermediate:  6.000e+10 elements
--------------------------------------------------------------------------------
scaling        BLAS                current                             remaining
--------------------------------------------------------------------------------
   3              0             PQ,sP->PQs                  iP,jQ,tQ,ti,sj,PQs->
   4              0           PQs,jQ->PQsj                    iP,tQ,ti,sj,PQsj->
   4           GEMM            PQsj,sj->PQ                         iP,tQ,ti,PQ->
   3           GEMM              PQ,iP->Qi                            tQ,ti,Qi->
   3           GEMM              Qi,ti->Qt                               tQ,Qt->
   2     DOT/EINSUM                Qt,tQ->                                    ->

Greedy path:
  Complete contraction:  iP,sP,PQ,jQ,tQ,ti,sj->
         Naive scaling:  6
     Optimized scaling:  3
      Naive FLOP count:  6.300e+15
  Optimized FLOP count:  9.242e+8
   Theoretical speedup:  6.817e+6
  Largest intermediate:  4.000e+6 elements
--------------------------------------------------------------------------------
scaling        BLAS                current                             remaining
--------------------------------------------------------------------------------
   3           GEMM              ti,iP->tP                   sP,PQ,jQ,tQ,sj,tP->
   3           GEMM              sj,jQ->sQ                      sP,PQ,tQ,tP,sQ->
   3           GEMM              tP,tQ->PQ                         sP,PQ,sQ,PQ->
   2              0              PQ,PQ->PQ                            sP,sQ,PQ->
   3           GEMM              PQ,sP->Qs                               sQ,Qs->
   2     DOT/EINSUM                Qs,sQ->                                    ->

fishjojo · Feb 24 '25

Thank you for the report; this is quite odd behavior and definitely a bug. Even setting memory_limit=-1 does not find the correct path. For now I would recommend the dp algorithm, which provides functionally the same results and is generally proving to be more robust and faster.
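
As a quick illustration (ours, not part of the original comment), the reproduction script above can request the dynamic-programming path finder by passing optimize='dp' to contract_path:

import numpy as np
import opt_einsum

# same shapes as the reproduction script above
no, naux, nx = 300, 50, 2000
M = np.empty((nx, nx))
eta = np.empty((no, nx))
eta1 = np.empty((naux, nx))
theta = np.empty((naux, no))

# dynamic-programming path finder instead of 'optimal' or 'greedy'
path, info = opt_einsum.contract_path(
    "iP,sP,PQ,jQ,tQ,ti,sj->",
    eta, eta1, M, eta, eta1, theta, theta,
    optimize="dp",
)
print(info)  # reports the optimized FLOP count and largest intermediate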

@jcmgray we may wish to consider deprecating the naive optimal algorithm and replacing it with dp (or just aliasing optimal to dp).

dgasmith · Mar 08 '25

Yes I do think that makes sense. I might open another issue about a more major refactor of the optimal and greedy pathfinders that might be worth doing at the same time.

jcmgray · Mar 11 '25

Hi devs!

I'm currently translating opt_einsum.contract_path to Rust (RESTGroup/opt-einsum-path, mostly for a native Rust implementation rather than for Rust's efficiency). I noticed several issues that relate to the optimal path not actually being optimal (also summarized at https://github.com/dgasmith/opt_einsum/issues/249).

I believe these issues are caused by the use of the result cache (as also noted at https://github.com/dgasmith/opt_einsum/issues/99#issuecomment-523877046 by @jcmgray), and I have validated this in our own project.

https://github.com/dgasmith/opt_einsum/blob/70031398125a9d5c6de87e038c7c6a50a11dac79/opt_einsum/paths.py#L248-L251

Perhaps changing these lines to the following will resolve the issue:

k12, flops12 = calc_k12_flops(inputs, output_set, remaining, i, j, size_dict)

But that will increase the time needed to optimize the contraction path when many terms are being contracted.
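
For context, a rough paraphrase of the cached lookup being replaced (our sketch, not the verbatim paths.py code; the exact cache key may differ). The point is that the cache does not account for the current remaining set, so a cache hit can return a value computed in a different branch of the search:

# paraphrased sketch of the cached lookup (not verbatim paths.py):
key = (inputs[i], inputs[j])          # cache key does not include `remaining`
if key not in result_cache:
    result_cache[key] = calc_k12_flops(inputs, output_set, remaining, i, j, size_dict)
k12, flops12 = result_cache[key]      # may have been computed under a different `remaining`

# proposed fix: always recompute
k12, flops12 = calc_k12_flops(inputs, output_set, remaining, i, j, size_dict)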

Since the current version of optimal will (probably) be deprecated in favor of dp, the path-optimization speed of the deprecated optimal is not that important going forward. And since disabling result_cache makes opt_einsum.contract_path correct in more cases, I think disabling it is more acceptable now than it was in 2019 (https://github.com/dgasmith/opt_einsum/issues/99), provided dp is aliased to optimal and the current optimal is deprecated.

ajz34 · Aug 15 '25

Yes, the blocker on changing the default optimal implementation (other than time) is what to do with the memory_limit kwarg. Probably the easiest thing is to maintain this legacy version of optimal with the cache disabled (can we infer automatically when enabling it is going to cause issues?) for use when memory_limit is supplied, and otherwise default to the new implementation.
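
A minimal sketch of that dispatch (hypothetical helper names, not existing opt_einsum API):

# hypothetical dispatch, assuming two path finders exist:
#   legacy_optimal_no_cache - the current exhaustive search with result_cache disabled
#   dp_optimal              - the dynamic-programming replacement
def resolve_optimal(memory_limit=None):
    if memory_limit is not None:
        # only the legacy search understands memory_limit, so keep it for this case
        return legacy_optimal_no_cache
    # otherwise default to the faster, correct dp implementation
    return dp_optimal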

jcmgray · Aug 15 '25