Confusing behavior of caching mechanism in 4.1.2 for parametrized selectors
The call signature of equalityCheck is hard to use and there are excessive calls to equalityCheck.
In the following example, think of arg1 as state and arg2 as some parameter, I've just kept it general for brevity.
import { createSelectorCreator, defaultMemoize } from 'reselect'
const createFooSelector = createSelectorCreator(defaultMemoize, {
// How do we know if `a` and `b` represent arg1 or arg2? I think we should get
// all args at once in justs one call, so that both `a` and `b` are on the form [arg1, arg2].
equalityCheck: (a, b) => {
const equal = a === b
console.log('equalityCheck', { a, b, equal })
return equal
},
maxSize: 1000
})
const selectFoo = createFooSelector(
[
(arg1, arg2) => arg1,
(arg1, arg2) => arg2,
],
(arg1, arg2) => {
console.log('selectFoo', { arg1, arg2 })
return 0
}
)
selectFoo('arg1', 'arg2:1')
selectFoo('arg1', 'arg2:2')
selectFoo('arg1', 'arg2:3')
console.log('\nThis will not trigger recompute (but excessive equalityChecks):')
selectFoo('arg1', 'arg2:1')
Outputs
selectFoo { arg1: 'arg1', arg2: 'arg2:1' }
equalityCheck { a: 'arg1', b: 'arg1', equal: true }
equalityCheck { a: 'arg2:2', b: 'arg2:1', equal: false }
selectFoo { arg1: 'arg1', arg2: 'arg2:2' }
equalityCheck { a: 'arg1', b: 'arg1', equal: true }
equalityCheck { a: 'arg2:2', b: 'arg2:1', equal: false }
equalityCheck { a: 'arg1', b: 'arg1', equal: true }
equalityCheck { a: 'arg2:3', b: 'arg2:2', equal: false }
equalityCheck { a: 'arg1', b: 'arg1', equal: true }
equalityCheck { a: 'arg2:3', b: 'arg2:1', equal: false }
selectFoo { arg1: 'arg1', arg2: 'arg2:3' }
equalityCheck { a: 'arg1', b: 'arg1', equal: true }
equalityCheck { a: 'arg2:3', b: 'arg2:2', equal: false }
equalityCheck { a: 'arg1', b: 'arg1', equal: true }
equalityCheck { a: 'arg2:3', b: 'arg2:1', equal: false }
This will not trigger recompute (but excessive equalityChecks):
equalityCheck { a: 'arg1', b: 'arg1', equal: true }
equalityCheck { a: 'arg2:1', b: 'arg2:3', equal: false }
equalityCheck { a: 'arg1', b: 'arg1', equal: true }
equalityCheck { a: 'arg2:1', b: 'arg2:2', equal: false }
equalityCheck { a: 'arg1', b: 'arg1', equal: true }
equalityCheck { a: 'arg2:1', b: 'arg2:1', equal: true }
It looks like the caching mechanism is doing a search. It doesn't feel like a great caching mechanism - we must be missing something? We will revert back to re-reselect for parameterized selectors case for now. Are we using reselect incorrectly or is this a bug?
Can you clarify what you mean by "call signature is hard to use"? Equality functions are meant to be trivial: "are these two things considered the same, yes or no". It's irrelevant which argument is first and which is second - they should be considered equal or not regardless.
Internally, the equalityCheck function will get called multiple times:
- Outer level: once for each input selector that was passed in
- Inside of
defaultMemoizewith a cache size > 1: once for each cache entry that is stored (a standard.findIndex()loop)
You can see the LRU cache implementation here:
https://github.com/reduxjs/reselect/blob/v4.1.2/src/defaultMemoize.ts#L21-L25
If you've got suggestions for better options we're certainly interested, but this straight out of the implementation of https://github.com/erikras/lru-memoize, and that library consistently ranked reasonably well in benchmarks of caching libraries when I was researching how to add a larger cache to Reselect.
So, no, I don't think this is a bug as far as I can tell. Are you seeing some kind of actual performance problems, or are you just wondering how it behaves in general? At first glance this feels like worrying over perf when there isn't actually a problem.
Thanks for clarifying. The use case I’m considering is sharing memoization of parametrized selectors across components. re-reselect solves this with cache keys which gives constant lookups.
How would one achieve the same behavior as re-reselect using reselect 4.1? This is what I mean with the call signature; just like with re-reselect I would like to cherry pick out the cache key in reselect and have the cache check equality on that (and of courser always check reference equlity on the first argument that is the state).
I understand the cache has to compare across all cache entries since the user defines a comparator and not a cache key, resulting in O(N) instead of O(1). Perhaps not a big deal. But I encountered this when I was experimenting with deep equal equality checks like in some of the examples in the readme. And in this case the additional equality checks become a significant performance impact. One has to use it very sparingly.
Perhaps the taleaway from this should be that I don’t understand how to achieve the same functionality as re-reselect. Perhaps this is different by design and I’m missing something. But the answer here is now ”Yes” but I don’t see how so perhaps I’m missing something.
I don't think you're "missing" anything. Setting maxSize > 1 is the rough equivalent in behavior. But, as you've noted, there are differences in how the two libraries implement that behavior.
Can you give an example of why you need to do deep comparisons? What's the use case? While that's always been possible given that you can pass your own equality function, Reselect was specifically built around the intent that comparisons are done either by reference or shallow checks, since those are fast and cheap.
We actually don't need to do deep comparison. That was just me experimenting. The need we have is precisely that which re-reselect caters too. I.e. sharing cache based on cache key for parametrized selectors across multiple components.
Now that I have a better understanding of the difference between re-reselect and reselect 4.1, and that reselect 4.1 doesn't cover the exact same functionality, I feel my confusion has been resolved. The root cause was an (incorrect) belief that reselect 4.1 would replace re-reselect.
I'd suggest to clarify this in the documentation.
Also, before finally dropping the questions around excessive calls to checkEquality, should the equality directly after the line selectFoo { arg1: 'arg1', arg2: 'arg2:3' } really be there? What is their purpose? It seems the same checks happen both before and after the selector was called. That is, these four lines:
equalityCheck { a: 'arg1', b: 'arg1', equal: true }
equalityCheck { a: 'arg2:3', b: 'arg2:2', equal: false }
equalityCheck { a: 'arg1', b: 'arg1', equal: true }
equalityCheck { a: 'arg2:3', b: 'arg2:1', equal: false }
I'm actually a bit confused myself. I haven't fully gone through all of re-reselect's docs, but the intent here is that the maxSize option can replace it for the common use case of "I want to share one selector instance between many components that will pass in different parameters", because now that one selector instance will keep cached values for multiple different input sets. Yes, the implementation is different in that re-reselect uses a "cache key extraction" approach, but the use case should be basically the same here.
Are there other aspects of re-reselect that the maxSize behavior does not cover, or that the O(n) cache lookup implementation is unsuitable for?
The comparisons that you're seeing are because Reselect's original logic does a double-memoization. It memoizes both an outer function that accepts the overall arguments, and an inner function that accepts the results of the input selectors:
https://github.com/reduxjs/reselect/blob/v4.1.2/src/index.ts#L105-L128
Both functions are memoized using the provided memoizer, although only one gets the additional customization arguments atm (see other open issues on that topic). Tbh I'm not entirely sure why this is necessary - that code has been there for a long time, well before I started actively working on this codebase.
So, I believe two of those comparisons that you're seeing are from the memoize() call on line 115, as we check the actual arguments being passed in. The other two comparisons are from the memoize() call on line 105, where we have confirmed that "yes, the extracted inputs really changed", and now it's time to calculate a new result.