Memoize overload with number of items to cache

Open Arithmomaniac opened this issue 7 years ago • 12 comments

It would be nice to have a Memoize operator that only caches a given number of items. This would be handy for inspecting the start of a non-reentrant IEnumerable (e.g. checking if it is empty).

Arithmomaniac avatar Dec 07 '18 14:12 Arithmomaniac

What's a non-reentrant IEnumerable?

atifaziz avatar Dec 18 '18 17:12 atifaziz

An IEnumerable whose underlying data source is a stream that can only be called once.

Arithmomaniac avatar Dec 18 '18 17:12 Arithmomaniac

An IEnumerable whose underlying data source is a stream that can only be called once.

In that case I'd argue that IEnumerable<> isn't the correct representation. It should be an IEnumerator<>. See following comments for a previous discussion about this:

  • https://github.com/morelinq/MoreLINQ/issues/291#issuecomment-300809626
  • https://github.com/morelinq/MoreLINQ/issues/291#issuecomment-300847539

atifaziz avatar Dec 18 '18 17:12 atifaziz

Fair. I was also thinking of a case where enumeration is just expensive (e.g. an IEnumerable built from a paged API.)

Arithmomaniac avatar Dec 18 '18 17:12 Arithmomaniac

I was also thinking of a case where enumeration is just expensive

That's exactly what Memoize is for. An empty check via Any should only cause a single item to be cached at worst.

Fair

Does that mean we close this?

atifaziz avatar Dec 18 '18 17:12 atifaziz

Can't remember why I didn't think Memoize would work... but it clearly can. So sure.

Arithmomaniac avatar Dec 18 '18 17:12 Arithmomaniac

Ah, I remember now. Suppose this IEnumerable is re-entrant, but very large (e.g. 100K results from a paged API). I don't want to have to requery for the first batch, so I use Memoize and then Any. But then I'm stuck keeping all 100K results in memory now, when I really only needed to keep at most one.

Arithmomaniac avatar Jan 06 '19 03:01 Arithmomaniac

Of course, that leaves the question of how to make the trailing IEnumerable re-entrant. It'd have to be some sort of wrapper around originalEnumerable.Skip(memoizationCount), at least for subsequent calls... which could get super messy.

Arithmomaniac avatar Jan 06 '19 16:01 Arithmomaniac

I'm stuck keeping all 100K

Nope. Any consumes at most one element, so Memoize will only cache one element.
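
A minimal sketch for checking this claim, assuming the MoreLinq NuGet package is referenced. ExpensiveSource and the Pulled counter are hypothetical names introduced only for illustration; they count how many elements the source actually produces before and after the call to Any.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using MoreLinq; // assumption: the MoreLinq NuGet package is referenced

static class MemoizeAnyDemo
{
    // Counts the elements the source has actually produced so far.
    public static int Pulled;

    // Hypothetical stand-in for an expensive source such as a paged API.
    public static IEnumerable<int> ExpensiveSource()
    {
        for (var i = 0; i < 100_000; i++)
        {
            Pulled++;
            yield return i;
        }
    }

    public static void Main()
    {
        var items = ExpensiveSource().Memoize();

        // Memoize is lazy, so nothing has been pulled from the source yet.
        Console.WriteLine($"before Any: pulled = {Pulled}");

        // Any stops after the first successful MoveNext.
        var any = items.Any();
        Console.WriteLine($"after Any ({any}): pulled = {Pulled}");
    }
}
```

Note that this only covers the state right after Any. Enumerating the memoized sequence to the end afterwards would still leave every element in the cache, which is the concern raised earlier in the thread.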

Orace avatar Oct 30 '19 11:10 Orace

I recognise this is an old thread, but I think the following code illustrates the scenario:

var largeDatabaseQuery = context.GetItems().Memoize();

if (largeDatabaseQuery.Any()) //Check to see if we need to do any processing
{
   //Do some expensive prep-work here

   foreach (var item in largeDatabaseQuery)
   {
      //Do some processing
   }
}

  • Memoize is used so that we don't hit the database twice.
  • Any is called because there's some prep-work that we don't want to do if there aren't any items to process.
  • Because we've used Memoize, the whole result set will remain in memory until largeDatabaseQuery falls out of scope. Without Memoize, each item could be garbage collected as soon as the foreach loop had finished with it.

What this issue is asking for is a way to limit how many items are cached (e.g. in this instance, 1 would be all we'd need).

MatthewSteeples avatar Apr 29 '24 09:04 MatthewSteeples

@MatthewSteeples Wouldn't it suffice to use an enumerator instead, so the source isn't iterated twice?

var largeDatabaseQuery = context.GetItems();

using var items = largeDatabaseQuery.GetEnumerator();
if (items.MoveNext())
{
    // Do some expensive pre-work here
    do
    {
        // Do some processing
    }
    while (items.MoveNext());
}

atifaziz avatar Apr 29 '24 11:04 atifaziz

It would for that scenario, yes, but that example was simplified just to make the point.

If your "do some processing" involves passing the IEnumerable off to somewhere outside of your control, or you want to make use of LINQ, then you'd need to wrap the IEnumerator in an IEnumerable (like https://stackoverflow.com/questions/1029612/is-there-a-built-in-way-to-convert-ienumerator-to-ienumerable). That's how we currently deal with this. It's fraught with all forms of runtime risk (the same risks this suggestion carries) if you try to evaluate it multiple times.
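
One way such a wrapper might look is sketched below. This is an illustration only, not the code from the linked answer and not part of MoreLINQ; SingleUseEnumerable is a hypothetical name. It takes the first element that was already read plus the partially consumed enumerator, and it throws on a second enumeration rather than silently yielding an empty or partial sequence.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not part of MoreLINQ): exposes an already-started
// enumerator as an IEnumerable<T> that may be enumerated only once.
sealed class SingleUseEnumerable<T> : IEnumerable<T>
{
    private readonly T _first;             // element already read by the caller
    private readonly IEnumerator<T> _rest; // enumerator positioned on _first
    private bool _consumed;

    public SingleUseEnumerable(T first, IEnumerator<T> rest)
    {
        _first = first;
        _rest = rest ?? throw new ArgumentNullException(nameof(rest));
    }

    public IEnumerator<T> GetEnumerator()
    {
        // Fail loudly on re-enumeration. Not thread-safe.
        if (_consumed)
            throw new InvalidOperationException("Sequence can only be enumerated once.");
        _consumed = true;
        return Iterate();
    }

    private IEnumerator<T> Iterate()
    {
        using (_rest) // dispose the source enumerator when iteration ends
        {
            yield return _first;
            while (_rest.MoveNext())
                yield return _rest.Current;
        }
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

static class Demo
{
    public static void Main()
    {
        var enumerator = Enumerable.Range(1, 3).GetEnumerator();
        if (enumerator.MoveNext())
        {
            // Expensive prep-work would go here, then hand off to LINQ.
            var items = new SingleUseEnumerable<int>(enumerator.Current, enumerator);
            Console.WriteLine(items.Sum()); // prints 6
        }
        else
        {
            enumerator.Dispose();
        }
    }
}
```

A caveat of this sketch: if the wrapper is never enumerated, the underlying enumerator is never disposed, so the caller remains responsible for cleanup in that case.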

MatthewSteeples avatar Apr 29 '24 16:04 MatthewSteeples