Memoize overload with number of items to cache
It would be nice to have a Memoize operator that only caches a given number of items. This would be handy for inspecting the start of a non-reentrant IEnumerable (e.g. checking if it is empty).
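For illustration, usage of the proposed overload might look something like the sketch below (the overload and its maxItemCount parameter are hypothetical and don't exist in MoreLINQ today; GetResponsesFromStream stands in for any non-reentrant source):

// Hypothetical overload: cache at most one item while checking for emptiness.
var items = GetResponsesFromStream().Memoize(maxItemCount: 1);

if (items.Any()) // pulls at most one element from the stream; only that element would be cached
{
    // Safe to start the expensive work: the stream isn't empty.
}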
What's a non-reentrant IEnumerable?
An IEnumerable whose underlying data source is a stream that can only be called once.
An IEnumerable whose underlying data source is a stream that can only be called once.
In that case I'd argue that IEnumerable<> isn't the correct representation. It should be an IEnumerator<>. See the following comments for a previous discussion about this:
- https://github.com/morelinq/MoreLINQ/issues/291#issuecomment-300809626
- https://github.com/morelinq/MoreLINQ/issues/291#issuecomment-300847539
Fair. I was also thinking of a case where enumeration is just expensive (e.g. an IEnumerable built from a paged API).
I was also thinking of a case where enumeration is just expensive
That's exactly what Memoize is for. An empty check via Any should only cause a single item to be cached at worst.
Fair
Does that mean we close this?
Can't remember why I didn't think Memoize would work... but it clearly can. So sure.
Ah, I remember now. Suppose this IEnumerable is re-entrant, but very large (e.g. 100K results from a paged API). I don't want to have to requery for the first batch, so I use Memoize and then Any. But then I'm stuck keeping all 100K results in memory now, when I really only needed to keep at most one.
Of course, that leaves the question of how to make the trailing IEnumerable re-entrant. It'd have to be some sort of wrapper around originalEnumerable.Skip(memoizationCount), at least for subsequent calls... which could get super messy.
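As a rough sketch of that idea (not part of MoreLINQ; MemoizeFirst and maxCount are made-up names, and it deliberately ignores thread safety and concurrent enumeration, which is where most of the messiness would live):

using System.Collections.Generic;
using System.Linq;

static class BoundedMemoizeExtensions
{
    // Cache at most maxCount items from the first enumeration; subsequent
    // enumerations replay the cached prefix and re-query the source for the
    // remainder via Skip, so the source must be re-entrant (e.g. a paged API).
    public static IEnumerable<T> MemoizeFirst<T>(this IEnumerable<T> source, int maxCount)
    {
        var cache = new List<T>();
        var enumeratedOnce = false;

        IEnumerable<T> Iterator()
        {
            if (!enumeratedOnce)
            {
                enumeratedOnce = true;
                foreach (var item in source)
                {
                    if (cache.Count < maxCount)
                        cache.Add(item);
                    yield return item;
                }
            }
            else
            {
                foreach (var item in cache) // replay the cached prefix
                    yield return item;
                foreach (var item in source.Skip(cache.Count)) // re-query the rest
                    yield return item;
            }
        }

        return Iterator();
    }
}

With something along these lines, Memoize-then-Any would keep at most one item in memory, while a later full enumeration would still go back to the paged API for everything beyond the cached prefix.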
I'm stuck keeping all 100K
Nope. Any consumes one element max, so Memoize will only cache one element.
I recognise this is an old thread, but I think the following code illustrates the example:
var largeDatabaseQuery = context.GetItems().Memoize();

if (largeDatabaseQuery.Any()) // Check to see if we need to do any processing
{
    // Do some expensive prep-work here
    foreach (var item in largeDatabaseQuery)
    {
        // Do some processing
    }
}
- Memoize is used so that we don't hit the database twice
- Any is called because there's some prep-work that we don't want to do if there weren't any items to process
- Because we've used Memoize, the whole result set will remain in memory until largeDatabaseQuery falls out of scope. Without Memoize, each item could be garbage collected as soon as it was finished in the foreach loop
What this issue is asking for is a way to limit how many items are cached (e.g. in this instance, 1 would be all we'd need).
@MatthewSteeples And using an enumerator instead so the source isn't iterated twice wouldn't suffice?
var largeDatabaseQuery = context.GetItems();
using var items = largeDatabaseQuery.GetEnumerator();

if (items.MoveNext())
{
    // Do some expensive pre-work here
    do
    {
        // Do some processing
    }
    while (items.MoveNext());
}
It would for that scenario, yes, but that was just to simplify the point.
If your "do some processing" involves passing the IEnumerable off to somewhere outside of your control, or you want to make use of linq then you'd need to wrap the IEnumerator in an IEnumerable (like https://stackoverflow.com/questions/1029612/is-there-a-built-in-way-to-convert-ienumerator-to-ienumerable). That's how we currently deal with this. It's fraught with all forms of runtime risk (which are the same as this suggestion) if you try and evaluate it multiple times.