
Recomputations do not use cached values when inputs are the same

Open panglesd opened this issue 3 months ago • 4 comments

I'd hope that recomputations would reuse cached values when the inputs are unchanged! I think that is not the case currently...

Here is an example. Hopefully, it's copy-pastable in utop so you can test.

I have an int var representing time:

let time = Lwd.var 0

And a computation that is "costly" but whose result stops changing once time goes past max_time.

let max_time = 1000
let big_list = List.init max_time Fun.id

let compute time =
  Format.printf "Let's go with a costly computation with time = %d\n" time;
  List.filter ((>=) time) big_list (* keep the elements <= time *)

I thought Lwd would cache the result and avoid the recomputation in the following:

let l =
  let$* very_big =
    let$ time = Lwd.get time in
    time >= max_time
  in
  if very_big then
    Lwd.return @@ compute max_time
  else
    let$ time = Lwd.get time in
    compute time

When time changes, the very_big value may stay the same (from true to true), but compute max_time still runs, as the log shows...

> let root = Lwd.observe l
> let () = Lwd.set time (max_time + 1)
> let sample1 = Lwd.quick_sample root
Let's go with a costly computation with time = 1000
> let () = Lwd.set time (max_time + 2)
> let sample2 = Lwd.quick_sample root
Let's go with a costly computation with time = 1000

I'd really like the second sample not to recompute when very_big is unchanged, and thus not print the second "Let's go with a costly computation with time = 1000"!

I'd be happy to help in fixing this issue, but I would also appreciate some help to get started 🙂

panglesd avatar Sep 24 '25 15:09 panglesd

The solution I would recommend is to wrap expensive computations with this caching function:

val cache_changes : 'a Lwd.t -> equal:('a -> 'a -> bool) -> ('a Lwd.t -> 'b Lwd.t) -> 'b Lwd.t

let cache_changes computation ~equal k =
  let cache = ref None in
  Lwd.bind computation ~f:(fun value ->
      match !cache with
      | None ->
        (* First run: seed a var with the value and build the
           continuation's graph over it, once. *)
        let var = Lwd.var value in
        let result = k (Lwd.get var) in
        cache := Some (var, result);
        result
      | Some (var, result) ->
        (* Later runs: reuse the cached graph; only touch the var
           (invalidating downstream nodes) when the value changed. *)
        let value' = Lwd.peek var in
        if not (equal value value') then
          Lwd.set var value;
        result
    )

We can then rewrite your example:

let without_cache =
  let$* very_big =
    let$ time = Lwd.get time in
    time >= max_time
  in
  if very_big then
    Lwd.return @@ compute max_time
  else
    let$ time = Lwd.get time in
    compute time
let with_cache =
  cache_changes
    (let$ time = Lwd.get time in
     time >= max_time)
    ~equal:Bool.equal
  @@ fun very_big ->
  let$* very_big = very_big in
  if very_big then
    Lwd.return @@ compute max_time
  else
    let$ time = Lwd.get time in
    compute time

On my computer it outputs:

Without caching
Let's go with a costly computation with time = 1000
Let's go with a costly computation with time = 1000
With caching
Let's go with a costly computation with time = 1000

let-def avatar Sep 24 '25 22:09 let-def

The reasons I went with this design:

Why not systematically cache computations?

The notion of “equality” is not universally defined. For many types (functions, floats containing NaN, or custom records and data structures), structural equality may not be meaningful, and it can be expensive. Requiring the user to supply an equal function for every node would make the API noisy, and could be costly if the equality test itself traverses large structures.

Moreover, point‑wise equality is too fine‑grained for complex pipelines. For example, consider a succession of filters over a collection, col |> foo |> bar |> baz: the intermediate result of foo may normalise the data so that the output of bar is unchanged. A naïve equality check at baz would still walk the entire collection each time, incurring an O(n) cost even though the final result is unchanged.
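To make that cost concrete, here is a small standalone sketch (my own, not from the thread's example) showing that structural equality on a large collection traverses it entirely even when nothing has changed, while physical equality is cheap but reports "different":

```ocaml
(* Structural equality walks both structures: O(n) per comparison,
   even when the contents are identical. *)
let col = List.init 1_000_000 Fun.id

(* Same contents, freshly allocated cells: *)
let col' = List.map Fun.id col

let () =
  assert (col = col');        (* true, but costs a full O(n) traversal *)
  assert (not (col == col'))  (* physical equality is O(1), but sees a change *)
```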

To address this, Lwd ships Lwd_seq and Lwd_table, which define identity at the element level. When used appropriately, they push change detection down to the leaves of the graph and skip internal computations without explicit equality tests.

Using continuation‑passing style for cached computations

The primary motivation for the current design is simplicity. A fully‑integrated caching layer would require the runtime to maintain per‑node caches and a robust invalidation strategy, all of which would (significantly!) complicate the scheduler and increase memory overhead.

API-wise, an ideal solution would be a val cutoff : 'a Lwd.t -> equal:('a -> 'a -> bool) -> 'a Lwd.t helper, which lets the user explicitly guard a computation against recomputation when a supplied equality predicate returns true.
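If such a helper existed, the original example could be guarded directly. The following is a hypothetical sketch only: cutoff is not part of Lwd's API, and this code does not run as-is.

```ocaml
(* Hypothetical: [cutoff] does not exist in Lwd. *)
let l =
  let very_big =
    cutoff ~equal:Bool.equal
      (let$ time = Lwd.get time in
       time >= max_time)
  in
  let$* very_big = very_big in
  if very_big then
    Lwd.return @@ compute max_time
  else
    let$ time = Lwd.get time in
    compute time
```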

In early discussions with @art-w we discovered that a naïve cutoff semantics could violate desirable properties (e.g., referential transparency), which is why we refrained from adding it to the core API. Unfortunately I forgot the concrete examples.

The CPS-based solution encourages structuring the computation around a cheap spine that is always recomputed, and expensive leaves that are cached on demand. I was satisfied with this easy work-around, which avoided all the problems we found with a native notion of cutoff, so I stopped investigating.

Let me know if this approach fits your use case; if not, I'd be interested in taking another look at this problem with your help. But if this notion of cutoff is central to your problem, I think that Jane Street's Incremental library might be more appropriate.

let-def avatar Sep 24 '25 22:09 let-def

Thanks a lot for your answer, and for the clever caching workaround! I think it will be enough for all my use cases. So: 🙏.

The primary motivation for the current design is simplicity.

This is a motivation I like :)

The notion of “equality” is not universally defined.

I agree with this, but physical equality is a good candidate: it is the strictest equality we could define, it's cheap, and it works well at least for booleans and integers. And the user is not forced to supply an equality.
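To illustrate the point (my own sketch, independent of Lwd): for immediate values such as ints and booleans, physical equality (==) coincides with structural equality, while for heap-allocated values it only detects identity. The lists are built with List.init so the compiler cannot share them as static constants.

```ocaml
let () =
  (* Immediates: (==) behaves like (=) and is O(1) *)
  assert (1 == 1);
  assert (true == true);
  (* Freshly allocated, structurally equal lists *)
  let a = List.init 2 succ and b = List.init 2 succ in
  assert (a = b);          (* structurally equal *)
  assert (not (a == b));   (* but physically distinct *)
  assert (a == a)          (* same object: physically equal *)
```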

I glanced at Incremental and it seems they use physical equality to avoid recomputations.

(But the added complexity to implement caching is a very good reason not to do it, even if physical equality would work!)

In early discussions with @art-w we discovered that a naïve cutoff semantics could violate desirable properties (e.g., referential transparency), which is why we refrained from adding it to the core API. Unfortunately I forgot the concrete examples.

Maybe @art-w remembers? (github messed up the previous ping)

panglesd avatar Sep 25 '25 13:09 panglesd

I don't recall either, hm. I remember a bug in one reactive application (not based on Lwd) where an array value was used: the physical-equality cutoff would trigger, believing the array to be unchanged... and then the GUI would not display the latest values of the array. So supporting implicit cutoff requires users to be aware of the optimization, either by using only purely functional data structures or by being very careful around mutations (e.g. by wrapping the array in a single-field record to signal a "physical" change to the equality cutoff, and hoping the compiler doesn't optimize it away >_>).
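That failure mode fits in a few lines (my own illustration, not from the original application): after an in-place update, the array is still the same object, so a physical-equality cutoff sees no change:

```ocaml
(* A physical-equality cutoff compares the value it saw last time with
   the current one; in-place mutation leaves them physically identical. *)
let model = [| 1; 2; 3 |]
let remembered = model           (* what the cutoff cached *)
let () = model.(0) <- 42         (* mutate in place *)
let () =
  assert (remembered == model);  (* cutoff: "unchanged", update skipped *)
  assert (remembered.(0) = 42)   (* yet the contents did change *)
```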

The other potential issue for Lwd is the risk of a memory leak: since it recomputes on demand, the cached values could be kept alive for a very long time, and it's not easy to free them without forcing a bunch of leaves. The current implementation frees these values as soon as possible, however (which is why it can't easily check whether the new value equals the previous one).

(I personally like, and am used to, the physical-equality cutoff in reactive libraries, but I agree it's a trade-off... Lwd is one of the few libraries that doesn't do it automatically, so perhaps there are enough existing alternatives to justify doing things differently :) )

art-w avatar Sep 25 '25 15:09 art-w