valkey Make KEYS to be an exact match if there is no pattern

Although KEYS is a dangerous command and we recommend people to avoid using it, some people who are not familiar with it still using it, and even use KEYS with no pattern at all.

Once KEYS is using with no pattern, we can convert it to an exact match to avoid iterating over all data.

Jul 16 '24 06:07 enjoy-binbin

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 70.28%. Comparing base (1a8bd04) to head (330fe53). Report is 89 commits behind head on unstable.

Additional details and impacted files

@@             Coverage Diff              @@
##           unstable     #792      +/-   ##
============================================
+ Coverage     70.25%   70.28%   +0.03%     
============================================
  Files           112      112              
  Lines         60590    60607      +17     
============================================
+ Hits          42567    42600      +33     
+ Misses        18023    18007      -16

Files	Coverage Δ
src/db.c	`89.06% <100.00%> (+0.67%)`	:arrow_up:

... and 13 files with indirect coverage changes

Jul 16 '24 06:07 codecov[bot]

Maybe we can extend this more common, for something like KEYS h[ae]llo , we can translate it to exactly match as hallo and hello. https://github.com/redis/redis/pull/13079

Jul 16 '24 07:07 zhulipeng

Maybe we can extend this more common, for something like KEYS h[ae]llo , we can translate it to exactly match as hallo and hello

it seems like a good idea in some engine, but i think it adds too much code, and it doesn't seem worth doing so much pre-optimization in here.

Jul 16 '24 07:07 enjoy-binbin

We discussed this issue when implementing the one-dict-per-slot feature, and we considered mapping the pattern for commands like SCAN to slots using hashtags. However, when it comes to determining the key for the pattern, I used the behavior of the EXISTS command as an example of what to avoid https://github.com/redis/redis/pull/12536#issuecomment-1788692863.

We implemented this feature on our platform before (as early as version 2.8), and it resulted in very undesirable outcomes. Specifically, the time complexity became ambiguous, fluctuating between O(1) and O(n), which is a significant difference.

For instance, if a user initially used a non-pattern-based pattern like KEYS abc, they would quickly get results. However, when they used a pattern like xyz*, the overall result would be very poor, causing instances to get stuck due to slow queries and even leading to system crashes in the production environment. This was a real-life scenario.

I believe the right thing to do is to persuade users to use EXISTS instead of KEYS, to do the right thing the right way, rather than accommodating this incorrect usage. We should revert this "optimization" @valkey-io/core-team

Aug 28 '24 12:08 soloestoy

ok, i did not read https://github.com/redis/redis/pull/12536 at that time so i missed this context, sorry about that, and sorry i am not realizing it

We implemented this feature on our platform before (as early as version 2.8), and it resulted in very undesirable outcomes. Specifically, the time complexity became ambiguous, fluctuating between O(1) and O(n), which is a significant difference.

so you guys revert it at the end? i can see it will cause trouble, a guy execute very fast in here, and then get blocked when executing somewhere else. But i think in this case, it is ok if we document it (or we can even not to document it), O(N) degenerates to O(1) which is OK in my opinion, the worst case is O(N) and we did not deny it, and if he get blocked at the end, it is also normal since it is a wrong case.

to do the right thing the right way

i agree with this actually, i hope we can do it. But there are sill a lot of user doing this in the wrong way, they did not bother to do the right thing and everytime they are blaming the "service", all we can do i think is at least try our best to migrate the impact with a small cost. But i do agree with this, i am ok with both (revert it or document it or just leave it alone), lets seeks a vote and then i can process it.

And speaking of this, i would also like ask everyone to discuss this one: https://github.com/valkey-io/valkey/pull/866#issuecomment-2290542519. What can or what should we do to migrate this. If we agree with something, i think we can fit it into 8.0

Aug 28 '24 13:08 enjoy-binbin

When I develop a new feature or optimization, I often remind myself that Redis and Valkey are products designed for online services.

As an engineer at a cloud provider, with my experience facing various different scenarios, I have found that online services and offline services are quite different. The characteristic of online services is that they require a high level of determinism, both in terms of execution results and performance, because online services can change at any time due to user actions at the front end, while the deployed code for online services remains fixed.

Therefore, uncertainty can potentially bring disaster to the business. Take the KEYS command as an example - in my view, if a general optimization can reduce the time complexity from O(n) to O(log n), it would be very meaningful. However, if the time complexity fluctuates between O(n) and O(1), this kind of uncertainty is very dangerous.

Aug 28 '24 15:08 soloestoy