
Feature Request: Memory-Bound Contraction / Cache-Limited Execution Option for @tensor

Open Yonatan-Nissan opened this issue 1 month ago • 4 comments

Hi, and thank you for maintaining this excellent package!

I would like to request (or discuss) a feature that would allow running tensor contractions with an explicit memory limit on the temporary workspace/cache used during the contraction.

Motivation

When contracting large tensors, the temporary intermediates allocated internally during the contraction can exceed the available RAM on HPC clusters or in systemd-bounded jobs. In my use case (large 2D tensor network contractions for condensed-matter simulations), even a single contraction such as

    @tensor A[...] = B[...] * C[...]

may allocate temporary intermediates that are larger than the input tensors themselves. I would like to constrain the contraction planner so that it only considers contraction paths whose peak memory fits within a user-specified limit (e.g. 40 GB), even if this results in a slower contraction.

Requested Feature

Add an option such as

    @tensor memory_limit=40_000_000_000 A[...] = B[...] * C[...]

or a global setting like

    TensorOperations.set_memory_limit!(4e10)

This would instruct TensorOperations to choose contraction strategies (path + temporary intermediates) under the constraint that peak temporary memory ≤ memory_limit.

Why this matters

  • Many HPC jobs have strict memory limits (e.g. via systemd cgroups or schedulers).
  • Current behavior sometimes causes unexpected OOM during contraction planning or execution.
  • For my application (HOTRG/TNRG), even a single contraction can exceed node memory unless the algorithm is forced to use a less memory-hungry contraction order.
  • Having memory-aware contraction options — even approximate or heuristic — would significantly improve usability for large-scale tensor network simulations.

Best regards, Yonatan

Yonatan-Nissan avatar Nov 18 '25 21:11 Yonatan-Nissan

Hi Yonatan,

I'm the author of a Julia package called TNRKit.jl that implements a number of TNR methods, such as TRG, HOTRG (both 2D and 3D), and LoopTNR.

The package is based upon TensorKit, which uses TensorOperations and its @tensor syntax.

You could quickly check if your issues persist by running:

using Pkg
Pkg.add("TNRKit")
using TNRKit, TensorKit

run!(HOTRG(T), truncdim(D), maxiter(N))

where T is your own TensorKit tensor, D is the maximal allowed bond dimension, and N is the number of iterations of the HOTRG algorithm.

I'm quite curious to find out why you're experiencing memory issues. Can you give some more insight into what calculations you're running?

VictorVanthilt avatar Nov 18 '25 21:11 VictorVanthilt

Thank you very much for the thoughtful response. I’ll take a closer look at your library and the tools it provides. I’ve already noticed that the HOTRG contraction performed in the library is quite similar to mine:

    @tensor T[-1 -2; -3 -4] :=
        conj(U[1 2; -1]) * U[3 4; -4] * A2[1 5; -3 3] * A1[2 -2; 5 4]
    return T

On HPC systems where jobs run inside strict memory-constrained cgroups, these uncontrolled peaks create substantial memory pressure. They can push the job right up to the memory limit and either throttle performance or trigger out-of-memory termination. To better control the memory, I have split the contraction into substeps, and the most memory- and time-consuming step in the HOTRG algorithm is contracting two rank-5 tensors:

@tensor A_next[i,j,k,l] := AUl[i,a,b,c,l] * AUl[k,c,b,a,j]
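As a rough illustration (this is not how TensorOperations implements it internally), a pairwise contraction like this can be lowered by hand to a permute–reshape–GEMM sequence, which makes the peak temporary memory explicit: two permuted copies plus the result matrix, and nothing else. A minimal Base-Julia sketch with a hypothetical function name:

```julia
# Contract A_next[i,j,k,l] = Σ_{a,b,c} AUl[i,a,b,c,l] * AUl[k,c,b,a,j] by hand,
# so every temporary has a known, predictable size.
# Note: this requires size(AUl, 2) == size(AUl, 4), so the a and c labels
# match up between the two factors.
function contract_rank5(AUl::Array{Float64,5})
    ni, na, nb, nc, nl = size(AUl)
    # First factor: free indices (i, l) in front, contracted (a, b, c) behind.
    T1 = reshape(permutedims(AUl, (1, 5, 2, 3, 4)), ni * nl, na * nb * nc)
    # Second factor: contracted (a, b, c) first, free (k, j) last.
    T2 = reshape(permutedims(AUl, (4, 3, 2, 1, 5)), na * nb * nc, ni * nl)
    # A single GEMM performs the contraction; peak extra memory is
    # exactly T1 + T2 + the result matrix.
    M = T1 * T2                                   # rows: (i, l), cols: (k, j)
    # Reorder (i, l, k, j) → (i, j, k, l).
    return permutedims(reshape(M, ni, nl, ni, nl), (1, 4, 3, 2))
end
```

This is essentially what a contraction backend does anyway, but writing it out makes a back-of-the-envelope memory bound straightforward.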

This is why the ability to cap or guide the temporary memory usage is so important for my workflow: it would allow these calculations to run reliably in production HPC environments without creating unexpected memory spikes.

Thanks again for your work — and for pointing me toward the relevant tools. I’m happy to test or provide additional examples if that would be useful.

Yonatan-Nissan avatar Nov 19 '25 21:11 Yonatan-Nissan

Dear @Yonatan-Nissan, I think your request for a memory-bounded contraction is a valid one. However, I currently don't have the bandwidth to implement this, and I also lack some of the expertise needed to determine the optimal strategy when the limit would otherwise be exceeded; I would need to look into the literature a bit more for that.

The page of https://github.com/bsc-quantic/Tenet.jl seems to mention slicing / cutting of tensor networks. I don't know if this is the feature you need, but maybe this can be useful.

Jutho avatar Nov 19 '25 21:11 Jutho

Another notable option might be OMEinsumContractionOrders.jl (this refers to the slicing page of its docs). The documentation is a bit terse, but it can be used to go from an optimal contraction order to a sliced one that satisfies memory restrictions.
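For intuition on what slicing does (this toy sketch is Base Julia, not the OMEinsumContractionOrders API): you fix one contracted index, contract the smaller network for each value of that index, and accumulate the results. The peak intermediate shrinks by roughly the sliced dimension, at the cost of some redundant work:

```julia
# Contract D[i,j] = Σ_{k,l} A[i,k] * B[k,l] * C[l,j] without ever
# materializing the full intermediate (A*B)[i,l]: slice over l.
function sliced_chain(A::Matrix, B::Matrix, C::Matrix)
    D = zeros(size(A, 1), size(C, 2))
    for l in axes(C, 1)                  # slice the shared index l
        ABl = A * B[:, l]                # a length-m vector, not an m×L matrix
        D .+= ABl * transpose(C[l, :])   # rank-1 accumulation into the result
    end
    return D
end
```

The slice loop is trivially parallelizable, which is why slicing is popular for fitting large quantum-circuit and tensor-network contractions into bounded memory.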

As a side note though, one other problem you might be facing, depending on the context in which you are running this, is that the Julia garbage collector is not keeping up, i.e. not aggressively reclaiming memory that is actually no longer being used. This often happens when contractions run in (tight) loops, especially when multithreading is involved. There are various ways to mitigate that, ranging from manual GC.safepoint() insertions to more drastic manual calls to the garbage collector via GC.gc(). Additionally, you can start Julia with a hint of how large the heap is allowed to grow, see e.g. https://julialang.org/blog/2023/04/julia-1.9-highlights/#memory_usage_hint_for_the_gc_with_--heap-size-hint
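A minimal sketch of those mitigations inside a hot loop (the function name and the random-matrix stand-in for a contraction are illustrative; the heap hint is a startup flag, shown as a comment):

```julia
# Start Julia with a heap cap so the GC collects before the cgroup limit:
#   julia --heap-size-hint=40G script.jl
function loop_with_gc_relief(niter::Int)
    acc = 0.0
    for n in 1:niter
        tmp = randn(100, 100) * randn(100, 100)  # stand-in for a contraction
        acc += tmp[1, 1]
        GC.safepoint()               # give other threads a chance to reach a GC safepoint
        n % 50 == 0 && GC.gc(false)  # occasional incremental (non-full) collection
    end
    return acc
end
```

Forcing GC.gc() every iteration would be far too slow; calling it every N iterations, or only the incremental GC.gc(false), is the usual compromise.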

Finally, it can also be useful to simply reduce the usage of temporary memory altogether, for example through the @butensor (Bumper.jl-based) macro, or by using the allocator = ManualAllocator() keyword to relieve the pressure on Julia's GC by manually managing memory. See also https://quantumkithub.github.io/TensorOperations.jl/latest/man/backends/#Allocators for a little more information on that.

Long story short, it would be worth verifying what is actually causing your memory pressure: a back-of-the-envelope calculation tells you how large the biggest objects are and whether they fit in memory. If they do fit, the likely culprit is that Julia is not freeing memory fast enough.
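Such an estimate is just the product of the dimensions times the element size; a tiny helper (hypothetical name) to make the envelope calculation explicit:

```julia
# Rough size of a dense tensor in bytes: prod(dims) * sizeof(eltype).
tensor_bytes(dims; T::Type = Float64) = prod(dims) * sizeof(T)

# e.g. a rank-5 Float64 tensor with bond dimension 64 on every index:
# tensor_bytes(ntuple(_ -> 64, 5)) is 8_589_934_592 bytes, i.e. ≈ 8.6 GB
```

Summing this over the inputs, the output, and the largest intermediate of a contraction gives a quick upper bound to compare against the cgroup limit.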

Hope this helps, feel free to ask for more information if some of these things aren't clear.

lkdvos avatar Nov 30 '25 18:11 lkdvos