Libtask.jl

Improve performance by not keeping unnecessary refs

Open • mhauru opened this issue 4 months ago • 1 comment

Currently Libtask stores every variable in a TapedTask's code as a ref. This is because we must know the exact state execution was in when a produce statement caused us to yield control, so that we can continue from the same state on the next consume call. However, many of these refs are in fact unnecessary: if a variable is only used between two produce statements, we never need its value again after the latter produce. For instance, say you make a TapedTask out of

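using Libtask  # provides produce

function f()
    a = 1        # only read before the first produce
    b = 2*a      # read again after the first produce
    produce(b)
    c = 3*b      # assigned after the first produce, not read after the second
    produce(c)
    return nothing
end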

Currently a, b, and c are all kept as refs. This means that their values stay in memory for as long as the task exists. Maybe more importantly, it also means that every bit of IR code that accesses any of them is bloated into several statements referencing and dereferencing the corresponding refs. For a, however, this is all unnecessary: when we continue execution after the first produce, only the value of b matters for the rest of the function. Likewise for c, which is assigned after the first produce and never read again once the second produce has yielded it.
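
To make the difference concrete, here is a hand-written sketch of what a resumable version of f could look like in the two cases. This is only an illustration and not the code Libtask actually generates; the types AllRefs and OnlyLive and the function resume! are made-up names, and a plain integer position field stands in for whatever mechanism records where to resume.

```julia
# Hypothetical hand-written resumable versions of f (not Libtask's real output).

# Variant 1: every variable lives in a Ref, so each access is a load or a store.
mutable struct AllRefs
    position::Int
    a::Base.RefValue{Int}
    b::Base.RefValue{Int}
    c::Base.RefValue{Int}
end
AllRefs() = AllRefs(0, Ref(0), Ref(0), Ref(0))

function resume!(t::AllRefs)
    if t.position == 0
        t.a[] = 1
        t.b[] = 2 * t.a[]
        t.position = 1
        return t.b[]            # first produce
    elseif t.position == 1
        t.c[] = 3 * t.b[]
        t.position = 2
        return t.c[]            # second produce
    else
        return nothing          # task has finished
    end
end

# Variant 2: only b is read after a produce, so only b is stored in a Ref.
mutable struct OnlyLive
    position::Int
    b::Base.RefValue{Int}
end
OnlyLive() = OnlyLive(0, Ref(0))

function resume!(t::OnlyLive)
    if t.position == 0
        a = 1                   # ordinary local, gone after the yield
        t.b[] = 2 * a
        t.position = 1
        return t.b[]            # first produce
    elseif t.position == 1
        c = 3 * t.b[]           # ordinary local
        t.position = 2
        return c                # second produce
    else
        return nothing          # task has finished
    end
end
```

Both variants yield 2, then 6, then nothing on successive resume! calls, but the second keeps one ref alive instead of three and touches it in fewer places.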

There are many levels of sophistication at which we could try to analyse the IR to figure out which variables need to be turned into refs and which don't, but even quite a rudimentary analysis might yield large simplifications in the IR that Libtask produces, and thus great runtime performance gains.
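
As a rough idea of what the most rudimentary version of such an analysis might look like, here is a sketch of a backward liveness pass over straight-line code. It assumes a toy statement representation (the Stmt type and vars_needing_refs are hypothetical names, not part of Libtask or Julia's compiler), and it ignores branches and loops, which a real pass over Julia IR would have to handle, for example by iterating over basic blocks to a fixed point.

```julia
# A toy straight-line statement: the variable it defines (if any),
# the variables it reads, and whether it is a produce call.
struct Stmt
    def::Union{Symbol,Nothing}
    uses::Vector{Symbol}
    is_produce::Bool
end

# Walk the statements backwards, tracking which variables are still read
# later on. Any variable that is live immediately after a produce must
# survive the yield, so it is the only kind that needs a ref.
function vars_needing_refs(stmts::Vector{Stmt})
    live = Set{Symbol}()        # variables live after the current statement
    needs_ref = Set{Symbol}()
    for s in Iterators.reverse(stmts)
        s.is_produce && union!(needs_ref, live)
        s.def === nothing || delete!(live, s.def)
        union!(live, s.uses)    # now: variables live before the statement
    end
    return needs_ref
end

# The function f from above, written in the toy representation.
f_stmts = [
    Stmt(:a, Symbol[], false),      # a = 1
    Stmt(:b, [:a], false),          # b = 2*a
    Stmt(nothing, [:b], true),      # produce(b)
    Stmt(:c, [:b], false),          # c = 3*b
    Stmt(nothing, [:c], true),      # produce(c)
]

vars_needing_refs(f_stmts)  # Set([:b])
```

For f this reports that only b needs a ref, matching the reasoning above: a is dead before the first produce, and c is never read after the second.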

Tagging @willtebbutt since I mentioned this idea to him, and he thought it wasn't badly misguided.

mhauru • Jul 15 '25 12:07

I fully agree with your thoughts on this. The performance gains could be very substantial.

willtebbutt • Jul 15 '25 13:07