CacheManager icon indicating copy to clipboard operation
CacheManager copied to clipboard

Feature request: ability to get the list of keys, or a subset of them.

Open mjharwood opened this issue 7 years ago • 7 comments

I needed the ability to remove all keys matching particular patterns, where one key might match more than one pattern, so regions would not suffice.

The work around is to have OnAdd/OnRemove hooks that keep track of all the keys, but this is inefficient as many of the backends have clear ways to get the keys (in memory and redis are the ones I'm using for the project).

mjharwood avatar Jun 09 '17 17:06 mjharwood

I totally get that this is kinda a great feature. The problem is though, distributed cache systems are not meant to be queried. Most of them don't support it at all or have n+1 issues like Redis' Keys command.

What I would suggest is, how about building that feature for in-memory cache only? For in-memory, this is actually relatively easy to control and do and fast.

So, how about make that feature available client side only and in case you have a distributed layer, you'll have to either not use the feature or build your own solution.

Maybe there will be solution for Redis some day with all the new features coming in 4.0, but right now, I really don't want to have any method calling keys in this library. Also, you call keys only for one server. This will not work in a cluster for example.

What do you think?

MichaCo avatar Jun 09 '17 18:06 MichaCo

The library for redis (StackExchange.Redis) does use SCAN with a pattern match rather than KEYS if the redis server is new enough to support it (>= 2.8.0), so it's somewhat less intense.

We have two use cases:

  1. In memory + redis + redis backplane: as long as we use the same timeout for both layers, then in-memory should be a near-instantaneous copy of redis (at least for that region). So that would work except for the corner case where it hasn't propagated yet, but you have that same problem with SCAN/KEYS.
  2. Pure redis: choices here are either SCAN or hashes of keys (either one global, or one per pattern).

Your suggestion would work for case 1, and maybe that's more useful to others and thus worth implementing at the library level.

The "hashes of keys" solution is very similar to the way regions support is implemented (at least for the memory and redis cases), except on multiple dimensions rather than just the one.

Example: Key1:Foo=a:Bar=x Key2:Foo=a:Bar=y Key2:Foo=b:Bar=y

and we want to be able to either remove all where "Foo=a" or, at a different time, "Bar=y" (basically deleting on named property)

Maybe there's a better way to solve this particular case.

mjharwood avatar Jun 09 '17 19:06 mjharwood

I think the Redis use-case would always be very dependent on how you setup Redis and what you actually need. You could still use CacheManager for layered cache and backplane, pass in a ConnectionMultiplexer to the configuration and use that one directly for searching for keys. I think that would be a good compromise.

Storing another lookup in Redis for all keys is just not something I'd like to have. That's not performing very well as you would have to keep it somewhat in sync. There could be millions of keys eventually...

Yes this would be similar to the regions implementation with Redis. But I know that the list of keys, i keep for a region, might not be 100% accurate and it doesn't really matter in that case. Again, if you would store really tons of keys in one region, a clear region would also be pretty slow. But it is a trade off for that feature which I think is fine.

In addition to all that, only Redis has good support for searching at all. Memcached and other distributed caches don't support that at all. That's why I never added that feature in the first place ;)

But again, I think it would be fine to have that implemented for all the in-memory caches.

MichaCo avatar Jun 09 '17 20:06 MichaCo

From quiet observation, I think there's two features here:

  1. Ability to search or match on keys in the cache in the hopes of doing...
  2. Retrieving multiple cache items from a list of given keys (also requested in #186)

While i understand 1. being quite expensive in some distributed caches such as Redis, and non existent in others, it is largely seperate from 2.

Currently, there might be situations where i need to get ~100 items from the cache, and if these are in Redis, this could mean 100 requests. If there was a IEnumerable<T> GetMany<T>(IEnumerable<string> keys) on the cache, then atleast we could direct Redis to do a single MGET originating from a single request instead of 100 GET originating from 100 requests (please correct me if I'm wrong).

Worst case scenario, the distributed cache doesn't have a facility for getting multiple keys in one command, and cache manager has to do what the user is otherwise forced to do - iterate and get each key individually.

twgraham avatar Feb 01 '18 23:02 twgraham

@twgraham You are absolutely right that those are two different operations. That being said, there is not really any point for a GetMany implementation right now because a) I'm using hashes in Redis and there is no operation in Redis where I can get multiple fields from multiple hashes at the same time. All ops on hashes are for one key. b) With CacheManager you can use multiple layers, e.g. in-memory + Redis. Maybe some keys are already cached in-memory, doing a get would eventually return it from in-memory (much faster) or fall back to redis. This might be actually faster than any multi get against Redis (just depends on how the Cache is populated...) c) The other distributed caches don't even have the ability to retrieve n keys, would do a manual iterator anyways. And for everything else, other in-memory caches, doing a get in a loop is really fast and usually the only way to retrieve n keys anyways.

That's why this feature doesn't exist, I'd implement it as an iterator on the top level anyways. Then, a MGet might have draw backs, you have to handle what should happen if one or more given keys do not exist. I could return either nulls to keep positions or only return found key/values. Also, I'd have to return both, key and value. There might be more logic involved, so, I think it makes more sense for anyone who needs it, to implement it with a simple loop+ custom logic...

Regarding the "search" like feature, I'll look more into it again when I find some time. Still interested in getting that feature done. I'm just not very happy with the current approach in the related PR and might have to change it a lot ...

MichaCo avatar Feb 02 '18 00:02 MichaCo

Thanks for the explanation.

Regarding a) do you have any further reference material (code aside) about how cache manager uses Redis to store and retrieve values?

In regards to b) my concern is for cold starts, and expiration (although I suppose that's up to me to tune).

I suppose we can agree on c).

twgraham avatar Feb 02 '18 00:02 twgraham

Hi,

I think that storing cache keys would be very useful for tracing purposes. I have also read the related PR request.

It is obvious that due to memory limitations you cannot store O(n) keys in a hashtable/hashset in the BaseCacheManager. I made some calculations, complications would be catastrophic.

But I have suggestion about this: What about storing all keys in the BaseCacheManager with a tree like structure TRIE. It might have several advantages:

  • It can store all keys with the cost of log(n) of the key storage.
  • Keys space can be explorable in log(n) time from top to bottom like traversals.
  • Hierarchical key names can be supported. Instead of one level "region.key" structure. It could support "region1.region2.region3.key" structure where every dot represents children of nodes. That would lead very advanced key querying possibilities. Like .region2. etc.
  • Even very long key names could be stored efficiently.

The main problem is the key insertion time (log(n) I think on average) of the TRIE of course. Which can be done in a seperate thread right after sending the key to the handlers.

What do you think?

Regards,

gokhanercan avatar Jul 23 '19 08:07 gokhanercan