rascal Enhance @memo with configurability and effects of @benmanes/caffeine

Is your feature request related to a problem? Please describe.

@memo allows caching side-effects to otherwise pure Rascal functions. It helps to keep the code clean whilst also providing a mode of efficiency normally associated with procedural and object-oriented programming.
But to actually get the efficiency benefits there are fine lines to walk, for example trading memory for CPU. Which caching strategy is effective can depend on all kinds of circumstances.
Thanks to the many features of the JVM garbage collector, there are many ways to build caching strategies. benmanes/caffeine offers a lot of this variation off the shelf, with added features such as time-dependent cache clearing.

Describe the solution you'd like

Proposal (via @DavyLandman, a few years ago): extend the @memo tag with a mini abstract configuration DSL that can be used to build caffeine caches with different properties.
The caches would still be for the return value of the function, depending on its arguments.
It would be great to add configurability for which arguments to select and which arguments to ignore for the cache key.
It would be great to build in timestamp functionality for loc parameters, i.e. that cache entries are cleared when the timestamp of the file is newer than the timestamp stored in the cache.
It would be great to factor most of the code needed to implement this out of the interpreter/compiler run-time, such that it can be reused between the two.
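To make the "select or ignore arguments for the cache key" idea concrete, here is a minimal sketch in plain Java. `SelectiveMemo` and its methods are hypothetical names invented for illustration; this is not the existing @memo implementation and it does not use Caffeine. It only assumes that the selected argument values have usable `equals`/`hashCode`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper (not part of Rascal or Caffeine): memoizes a function
// over an argument list, building the cache key from selected positions only.
final class SelectiveMemo<R> {
    private final Map<List<Object>, R> cache = new HashMap<>();
    private final Function<List<Object>, R> function;
    private final int[] keyPositions;

    SelectiveMemo(Function<List<Object>, R> function, int... keyPositions) {
        this.function = function;
        this.keyPositions = keyPositions.clone();
    }

    R call(Object... args) {
        // Only the selected argument positions take part in the key;
        // all other arguments are ignored for lookup purposes.
        List<Object> key = new ArrayList<>(keyPositions.length);
        for (int position : keyPositions) {
            key.add(args[position]);
        }
        if (cache.containsKey(key)) {
            return cache.get(key);
        }
        // Cache miss: the function still receives the full argument list.
        R result = function.apply(Arrays.asList(args));
        cache.put(key, result);
        return result;
    }
}
```

Note the consequence of ignoring an argument: two calls that differ only in an ignored position share one cached result, so this is only sound when the result really does not depend on that argument. The sketch is also not thread-safe and never evicts.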

Describe alternatives you've considered

People write all kinds of Rascal code to cache and optimize their code now, including caching on disk. It's a hairy domain.
Using @memo now often leads to disappointment: we either cache too many values or too few, or what we cache uses too much memory to be effective. We really need different strategies depending on the application.
We do not have alternatives to play around with weak or soft keys or values at all; what @memo currently offers is all there is.

Impact

The compiler itself would benefit from this feature, since it needs to cache the results of modular compilation.

Dec 15 '21 09:12 jurgenvinju

It is also thinkable to compile the configuration DSL directly to Java code and only support this for compiled Rascal code.

Dec 15 '21 09:12 jurgenvinju

I think it's already there: https://github.com/usethesource/rascal/blob/master/src/org/rascalmpl/library/util/Memo.rsc

Some examples: https://github.com/usethesource/rascal/blob/master/src/org/rascalmpl/library/lang/rascal/tests/basic/Memoization.rsc and https://github.com/usethesource/rascal/blob/master/test/org/rascalmpl/test/functionality/MemoizationTests.java

Dec 15 '21 09:12 DavyLandman

Yes, that one! But extended with everything (or almost everything) that caffeine can do, plus the file timestamp things.

{strong,soft,weak,phantom?} references x {strong,soft,weak,phantom?} keys x parameter selections
automatic asynchronous pre-loading
size-based eviction
time-based eviction
automatic asynchronous serialization to a file location, and automatic recovery from it

Plus I'd like to see an interface for debugging/optimizing the choice of these features using some kind of statistical reports.

Dec 15 '21 09:12 jurgenvinju
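As an illustration of two items from the list above, size-based eviction and statistical reports, here is a small stdlib-only Java sketch. `BoundedMemo` and `stats()` are hypothetical names, not existing Rascal API. Caffeine itself covers the same ground with its `maximumSize` and `recordStats` builder options; the sketch only shows the shape of the idea and assumes the memoized function never returns null.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper: a least-recently-used memo table with a fixed entry
// budget and simple hit/miss/eviction counters. Not thread-safe.
final class BoundedMemo<K, V> {
    private final Function<K, V> function;
    private final Map<K, V> cache;
    private long hits;
    private long misses;
    private long evictions;

    BoundedMemo(Function<K, V> function, int maxEntries) {
        this.function = function;
        // accessOrder = true keeps entries ordered from least to most recently used.
        this.cache = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                boolean evict = size() > maxEntries;
                if (evict) {
                    evictions++;
                }
                return evict;
            }
        };
    }

    V get(K key) {
        V value = cache.get(key); // a successful get() also refreshes the LRU order
        if (value != null) {
            hits++;
            return value;
        }
        misses++;
        value = function.apply(key); // assumed to return a non-null result
        cache.put(key, value);
        return value;
    }

    String stats() {
        return "hits=" + hits + " misses=" + misses + " evictions=" + evictions;
    }
}
```

Counters like these are the raw material for the kind of report asked for above: a high eviction count next to a low hit count suggests the entry budget is too small for the workload, or that the function is not worth memoizing at all.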

And agreed, maybe @PaulKlint can already take a lot of benefit from what is in https://github.com/usethesource/rascal/blob/master/src/org/rascalmpl/library/util/Memo.rsc now for the compiler.

Dec 15 '21 09:12 jurgenvinju

Asynchronous pre-loading would be impossible for the interpreter to do.

Dec 15 '21 09:12 jurgenvinju

Okay, that's quite a big feature set, and requires more design of the @memo tag. We now have:

access-based eviction (so time, but only on access, not on store)
entry-based eviction
soft references to avoid OutOfMemoryError.

Some of the features you mentioned are easy to implement. But just to be clear, we are not using Caffeine in this case; the memo tag has some specific features that don't map to Caffeine.

Dec 15 '21 09:12 DavyLandman
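For readers unfamiliar with the distinction, "time, but only on access" can be read as: an entry expires once it has not been looked up for a given period, and each lookup restarts that period. The following Java sketch shows that reading. It is an assumption about the intended semantics, not a description of the actual util::Memo implementation, and `ExpiringMemo` is a hypothetical name. The clock is injected so the behaviour can be checked without sleeping.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical helper: entries expire when they have not been accessed for
// more than maxIdle clock units. Expiry is checked lazily, at lookup time.
final class ExpiringMemo<K, V> {
    private static final class Entry<V> {
        final V value;
        long lastAccess;

        Entry(V value, long lastAccess) {
            this.value = value;
            this.lastAccess = lastAccess;
        }
    }

    private final Map<K, Entry<V>> cache = new HashMap<>();
    private final Function<K, V> function;
    private final long maxIdle;
    private final LongSupplier clock;

    ExpiringMemo(Function<K, V> function, long maxIdle, LongSupplier clock) {
        this.function = function;
        this.maxIdle = maxIdle;
        this.clock = clock;
    }

    V get(K key) {
        long now = clock.getAsLong();
        Entry<V> entry = cache.get(key);
        if (entry != null && now - entry.lastAccess <= maxIdle) {
            entry.lastAccess = now; // every hit restarts the idle period
            return entry.value;
        }
        // Missing or idle for too long: recompute and replace the entry.
        V value = function.apply(key);
        cache.put(key, new Entry<>(value, now));
        return value;
    }
}
```

In production code the clock would typically be `System::nanoTime` or `System::currentTimeMillis`. Because expiry is only checked on lookup, stale entries that are never requested again stay in the map, which is one reason to combine this with a size bound or soft references.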

OK, yes; understood. I'm expecting an "extended subset" of the Caffeine features and I'm not sure where the boundaries are.

In particular the asynchronous backroom serialization to disk; I'm not sure how much of Caffeine we could reuse for that, because of course it has to be integrated with our loc data-type and the URIResolverRegistry.

Asynchronous pre-loading is probably based on a lambda or an interface which we can implement and link to a compiled or interpreted Rascal function. However, for interpretation I foresee lots of issues :-) of course.

Timestamps for loc entries are definitely not in scope of Caffeine, but it seems a natural thing to add to @memo, unless we think a different tag would be better.

Dec 15 '21 10:12 jurgenvinju
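The timestamp idea for loc parameters could look roughly like the Java sketch below. It is written against a generic "last modified" lookup rather than Rascal's loc type, so `TimestampedMemo` and the `lastModified` parameter are hypothetical; in the Rascal run-time that lookup would presumably go through the URIResolverRegistry, which is an assumption and not shown here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.ToLongFunction;

// Hypothetical helper: each cached result remembers the timestamp its key had
// when the result was computed, and is recomputed once the key's current
// timestamp is newer than the remembered one.
final class TimestampedMemo<K, V> {
    private static final class Entry<V> {
        final V value;
        final long stamp;

        Entry(V value, long stamp) {
            this.value = value;
            this.stamp = stamp;
        }
    }

    private final Map<K, Entry<V>> cache = new HashMap<>();
    private final Function<K, V> function;
    private final ToLongFunction<K> lastModified;

    TimestampedMemo(Function<K, V> function, ToLongFunction<K> lastModified) {
        this.function = function;
        this.lastModified = lastModified;
    }

    V get(K key) {
        long current = lastModified.applyAsLong(key);
        Entry<V> entry = cache.get(key);
        if (entry != null && current <= entry.stamp) {
            return entry.value; // the resource has not changed since we cached it
        }
        // No entry yet, or the resource is newer than the cached result.
        V value = function.apply(key);
        cache.put(key, new Entry<>(value, current));
        return value;
    }
}
```

One design point this makes visible: every lookup pays for a timestamp query, which is cheap for local files but may not be for remote or virtual loc schemes, so a real design would probably want this to be opt-in per function or per parameter.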
