feat(server): implement heavykeeper data structure. #257
Signed-off-by: Super-long [email protected]
This commit implements a hotspot identification data structure, HeavyKeeper. Below is a description of this implementation.
heavy_keeper.h:
- The basic min-heap version from the original paper is the most memory-efficient solution, but its processing speed is limited: updating and searching a key in the min-heap cost O(log(k)) and O(k) respectively, which is time-consuming. HeavyKeeper-DF implements Optimization III from the original paper, which replaces the min-heap with a doubly-linked list plus a hash table. This not only makes both insertion and update O(1), but also evicts older hotkeys, so the latest hotkeys are always in HeavyKeeper.
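The doubly-linked-list + hash-table replacement for the min-heap can be sketched roughly as below. This is an illustrative sketch only, not the PR's actual code; class and member names are made up:

```cpp
#include <cassert>
#include <cstdint>
#include <list>
#include <string>
#include <unordered_map>

// Illustrative sketch: an LRU-style index giving O(1) insert/update,
// where the least-recently-updated hotkey is evicted once capacity is hit.
class LruTopK {
 public:
  explicit LruTopK(size_t capacity) : capacity_(capacity) {}

  // Insert or refresh a key; evicts the stalest entry when full.
  void Touch(const std::string& key, uint32_t count) {
    auto it = index_.find(key);
    if (it != index_.end()) {
      it->second->count = count;
      order_.splice(order_.begin(), order_, it->second);  // move to front, O(1)
      return;
    }
    if (order_.size() == capacity_) {
      index_.erase(order_.back().key);  // drop the oldest hotkey
      order_.pop_back();
    }
    order_.push_front(Node{key, count});
    index_[key] = order_.begin();
  }

  bool Find(const std::string& key, uint32_t* count) const {
    auto it = index_.find(key);
    if (it == index_.end()) return false;
    *count = it->second->count;
    return true;
  }

  size_t Size() const { return order_.size(); }

 private:
  struct Node {
    std::string key;
    uint32_t count;
  };
  size_t capacity_;
  std::list<Node> order_;  // front = most recently updated
  std::unordered_map<std::string, std::list<Node>::iterator> index_;
};
```

Both `Touch` and `Find` are O(1), and eviction from the list tail is what pushes out the oldest hotkeys.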
- Users can configure hk_threshold; only keys with HK[i][j].count greater than hk_threshold are considered hotkeys. This is an empirical value, so it is settable.
- hk_length is set to 3079 for two reasons: (1) hk_length should be a prime number to minimize hash collisions; see https://planetmath.org/goodhashtableprimes. (2) The original paper estimates that at 80 KB-100 KB HeavyKeeper is very close to the theoretical probability capacity bound, and with hk_length set to 3079 it occupies about 80 KB. The calculation is: 12 * hk_wide * hk_length + LRUCache(topK).
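To make the ~80 KB figure concrete, the sizing formula can be checked with a few lines. Note the assumptions: 12 bytes per bucket (e.g. a fingerprint plus a counter) and hk_wide = 2 are guesses for this sketch; the excerpt only fixes hk_length = 3079:

```cpp
#include <cassert>
#include <cstddef>

// Illustrative check of the sizing formula from the description:
//   bytes ~= 12 * hk_wide * hk_length + sizeof(LRUCache(topK))
// kBucketBytes = 12 and hk_wide = 2 are assumptions for this sketch.
constexpr size_t kBucketBytes = 12;
constexpr size_t kHkLength = 3079;

constexpr size_t ArrayBytes(size_t hk_wide) {
  return kBucketBytes * hk_wide * kHkLength;
}
```

With hk_wide = 2 the bucket array alone is 73,896 bytes (~72 KiB), leaving the remainder of the ~80 KB budget for the LRUCache(topK) part.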
- HeavyKeeper-DF implements Optimizations I/II from the original paper; the code comments reference both.
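For reference, the core bucket-update rule from the paper (exponential-weakening decay) looks roughly like this. This is a sketch reconstructed from the paper's description, not the PR's code, and the names are illustrative:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <random>

// Sketch of one HeavyKeeper bucket update (exponential-weakening decay).
struct Bucket {
  uint32_t fingerprint = 0;
  uint32_t count = 0;
};

// hk_b is the decay base from the paper (e.g. 1.08).
inline void UpdateBucket(Bucket& b, uint32_t fp, double hk_b, std::mt19937& rng) {
  if (b.count == 0) {  // empty bucket: claim it
    b.fingerprint = fp;
    b.count = 1;
  } else if (b.fingerprint == fp) {  // same flow: plain increment
    ++b.count;
  } else {
    // Collision: decay the resident counter with probability hk_b^-count.
    double p = std::pow(hk_b, -static_cast<double>(b.count));
    std::uniform_real_distribution<double> dist(0.0, 1.0);
    if (dist(rng) < p && --b.count == 0) {
      b.fingerprint = fp;  // resident flow fully decayed: take over
      b.count = 1;
    }
  }
}
```

The decay probability shrinks exponentially with the resident count, which is what protects established hotkeys from being displaced by noise.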
heavy_keeper_test.cc:
- The basic functionality and performance of HeavyKeeper-DF are tested in the Basic test. Note that HeavyKeeper is a probabilistic data structure, so for unit-test stability the check is set to EXPECT_GE(hk_sum, hotkey_lens-1).
- HotkeyEliminationTest is a simple test of HeavyKeeper-DF's hotspot elimination capability.
Others:
- The current insertion rate of HeavyKeeper-DF is about 180 nanoseconds per request, which seems to satisfy "fast insertion". These numbers are easily reproduced in the Basic test. gperftools is also wired into the unit test; executing "pprof ./heavy_keeper_test test.prof --pdf > test.pdf" shows the performance bottleneck of HeavyKeeper-DF. My CPU model is Intel(R) Xeon(R) Gold 6133 CPU @ 2.50GHz. Here are the results of the unit test on my machine:
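The per-insert latency can be reproduced with a plain std::chrono loop of this shape. Here std::unordered_map stands in for HeavyKeeper so the sketch compiles on its own; substitute the real Insert call when measuring the PR:

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <string>
#include <unordered_map>

// Rough timing harness of the kind behind the "~180 ns/insert" number.
// The map is only a stand-in for the sketch's hk.Insert(key) call.
inline double NanosPerInsert(size_t n) {
  std::unordered_map<std::string, uint64_t> sink;
  auto start = std::chrono::steady_clock::now();
  for (size_t i = 0; i < n; ++i) {
    sink[std::to_string(i % 1000)]++;  // stand-in for hk.Insert(key)
  }
  auto end = std::chrono::steady_clock::now();
  return std::chrono::duration<double, std::nano>(end - start).count() / n;
}
```

The absolute numbers depend heavily on CPU and allocator, which is why the PR records the exact CPU model alongside them.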
- The destructor of LRUCache is currently not implemented, and Node allocation in the LRU uses new/delete (not a bottleneck); perhaps a better approach is an array-backed doubly-linked list.
- This PR is mainly to verify that such an implementation is feasible. To simplify the implementation, std::string is used for the parameters of HeavyKeeper::Insert; once feasibility is confirmed, I will continue to adapt it to dragonfly's key type.
My implementation partially references https://github.com/papergitkeeper/heavy-keeper-project.
Thanks to these clever people for creating this interesting data structure!
One interesting thing: the perf results show that HeavyKeeper-DF's performance bottleneck is concentrated in the find operation. Because ankerl::unordered_dense::map is efficient at find(string), I replaced std::unordered_map in LRUCache with ankerl::unordered_dense::map and got about a 45% performance improvement. The following is the unit test result:
Although ankerl::unordered_dense::map is an excellent open source project, we must consider stability; this is a tradeoff between performance and stability, so I think whether to use ankerl::unordered_dense::map still needs to be discussed by the community members.
Thanks for writing this summary. I will provide general comments about the code style here:
- Avoid using flags (`ABSL_FLAG`) for low-level data-structures. You can introduce an `Options` struct or pass parameters directly in a c'tor instead.
- Use a cc file for the implementation and for internal data-structures. Header files should stay as clean as possible.
- Please comment the algo and data structures.
- Unit tests are for correctness testing. For performance benchmarks use `BENCHMARK` fixtures. See https://github.com/dragonflydb/dragonfly/blob/082ac36ac195b5fd503422db7152c2d6efbd857a/src/core/dash_test.cc#L1081 for an example. You also do not have to manually control the profiler; you can run the test with `CPUPROFILE=/tmp/foo.prof ./heavy_keeper_test` and it should work.
- I usually use `absl::flat_hash_map` as my default hashmap type. It's pretty efficient.
I do not remember the paper now (need to reread it), but what if we won't have `lruCache`/`topK` at all? Will this affect the correctness of the `Insert` method?
Okay, I will fix the code style issues next week.
lruCache: As far as I understand, the original paper says very little about the HeavyKeeper implementation without a min-heap (Optimization III: Speed Acceleration), so there isn't a good example to guide how to do it.
The original paper mentions using arrays instead of a min-heap to save time, but this has two problems:
- To implement Optimizations I/II we need to quickly know whether a key exists in the array.
- In the original version, the min-heap tracks the top-k counts among all keys and always evicts the element with the smallest count. In the array version, only keys with a count greater than a threshold are added to the array, and it lacks a good eviction strategy.
These two points make me think that an LRU with O(1) Insert/Find is a good choice.
topK: topK is necessary, because HeavyKeeper is defined as "An Accurate Algorithm for Finding Top-k Elephant Flows".
@Super-long I totally get why `topK` is necessary when implementing the original article. I am saying that from a product perspective this feature is going to be used for debugging, imho, so we have the prerogative to do a brute-force sort+topK over the whole dataset. As long as the dataset is not huge (i.e., less than millions), the cost of sorting at query time is not very high. I am not saying we should choose this option, but I want to understand whether it is feasible, i.e., whether `topK` is required only for querying and not needed for updates.
I actually think it's a great idea, but these are really two different directions.
- Dynamically maintaining LRU/topK lets us know the top-k information in real time, so we can do automated hotspot identification at small cost and take follow-up actions to optimize hotspots, or monitor real-time hotspot information at second-level granularity.
- Not dynamically maintaining LRU/topK is perfectly feasible, because when a key is a hotkey it always resides in HeavyKeeper, and with HeavyKeeper's length currently set to 3079 a full sort is not a big overhead. So if we only want to call the topk interface a few times a day, we obviously don't need to maintain LRU/topK.
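The query-time-only alternative discussed above amounts to a full scan plus sort over the bucket array. A sketch, with an illustrative (key, estimated count) bucket layout:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Sketch of brute-force top-k at query time: scan every bucket, sort by
// count descending, keep the k largest. With hk_length = 3079 buckets this
// is cheap enough to run on demand instead of maintaining LRU/topK online.
using Entry = std::pair<std::string, uint32_t>;  // (key, estimated count)

inline std::vector<Entry> BruteForceTopK(std::vector<Entry> buckets, size_t k) {
  size_t kept = std::min(k, buckets.size());
  std::partial_sort(
      buckets.begin(), buckets.begin() + kept, buckets.end(),
      [](const Entry& a, const Entry& b) { return a.second > b.second; });
  buckets.resize(kept);
  return buckets;
}
```

std::partial_sort costs O(n log k), which for n = 3079 is negligible at the query rates being discussed (a few calls per day).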
So I think we should think about what kind of hotkey capability we need and what kind of interface we want hotkey to have.
I think I can do some research in the next few weeks to see what the current interfaces are for hotspots from the major public cloud vendors.
I suspect we are pushing boundaries here but go ahead - would love to hear what you learn!
Code-style issues were fixed in the latest commit. The next major task is to think about what kind of hotspot interface dragonfly needs, which determines whether we need topk or not.
The current implementation of the LRU is relatively simple and even uses raw pointers for speed, but it is free of memory leaks.
Executing valgrind --tool=memcheck --leak-check=full --show-leak-kinds=all ./heavy_keeper_test reproduces the following result:
I'll be submitting a PR soon to fix the code changes mentioned above.
I've been thinking about how to use heavykeeper in dragonfly for a while now, and here are three issues I'd like to address.
- some drawbacks of heavykeeper as a data structure.
- the way other public cloud vendors handle hotspots.
- how we should use heavykeeper results.
- some drawbacks of heavykeeper as a data structure.
a. The value of hk_b in exponential-weakening decay is 1.08, following the example in the paper, but it is not necessarily suitable for dragonfly: the decay probability is already very low by the time the count reaches about 50, which means that on a collision, often only a key an order of magnitude larger than the previous hotspot can displace it. So hk_b needs to be tuned based on the qps of different machines.
b. heavykeeper identifies topk information over a period of time, but that period is not quantifiable. For example, if we want to perceive hotspot information within 5 seconds, the original heavykeeper structure cannot provide it.
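The point above about hk_b = 1.08 can be checked numerically: the decay probability b^-count is already around 2% at count = 50, so a colliding key needs on the order of dozens of attempts per decrement to wear down an established counter.

```cpp
#include <cassert>
#include <cmath>

// Decay probability of a resident bucket under exponential-weakening decay:
// a colliding key decrements the resident counter with probability hk_b^-count.
inline double DecayProb(double hk_b, int count) {
  return std::pow(hk_b, -static_cast<double>(count));
}
```

For hk_b = 1.08, DecayProb is 1.0 at count 0 and roughly 0.021 at count 50, which is the sense in which tuning hk_b to the machine's qps matters.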
- the way other public cloud vendors handle hotspots.
Take the hotspot-handling capabilities of public cloud vendors that support the redis protocol as an example: full-metrics detection is supported in AWS ElastiCache, Azure Cache for Redis, Google MemoryDB, and Redis Labs, where you can see real-time hotspot information. Alibaba Cloud supports real-time hotkey statistics, offline full-volume key analysis, and hotspot optimization through a Proxy Query Cache; Tencent Cloud supports 5/10/15-second hotspot analysis and historical hotkey analysis.
In larger-capacity NoSQL/relational databases, hotspot information is usually used for split-point selection.
- how we should use heavykeeper results.
So I think what we need is real-time hotspot information display plus some access optimization based on hotspot information. The former can support hotspot statistics at the 5s level and print the output to a separate log file; for the latter, we can think about what optimizations are possible once we have the hotspot information, such as data rearrangement in the dash table to reduce the number of searches, or some kind of cache, which requires a deeper understanding of the code.
To summarize, a complete hotspot identification module may require at least three PRs.
- support the basic heavykeeper structure.
- support for hotspot identification, i.e., the logs show real-time hotspot information within 5s (configurable), integrated with ELK or Prometheus.
- heavykeeper can sense real-time hotkey and how to optimize user access based on this information.
@romange One question: do we need heavykeeper to support hotspot identification within 5s? The basic process is as follows.
- heavykeeper is used for hotspot identification within 5s
- print topk to log
- clear heavykeeper
- .....
There is an edge case here: for example, with identification periods [0s,5s][5s,10s], a hotspot may last over [4s,7s]. But since hotspot access volume is generally much larger than that of other keys, such a key can generally still be identified as a hotspot in both [0s,5s] and [5s,10s].
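The 5-second-window process above (count, report, clear, repeat) can be sketched as a thin wrapper that resets at each period boundary. A plain map stands in for HeavyKeeper so the example is self-contained, and the timestamp is injected so the rollover is testable without a real clock:

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

// Sketch of the windowed scheme: count within a fixed window, then reset.
// The real inner structure would be HeavyKeeper; a map stands in here.
class WindowedCounter {
 public:
  explicit WindowedCounter(int64_t window_ms) : window_ms_(window_ms) {}

  void Insert(const std::string& key, int64_t now_ms) {
    if (window_start_ < 0) window_start_ = now_ms;
    if (now_ms - window_start_ >= window_ms_) {
      counts_.clear();  // "clear heavykeeper" step at the window boundary
      window_start_ = now_ms;
    }
    ++counts_[key];  // hotspot identification within the current window
  }

  uint64_t Count(const std::string& key) const {
    auto it = counts_.find(key);
    return it == counts_.end() ? 0 : it->second;
  }

 private:
  int64_t window_ms_;
  int64_t window_start_ = -1;
  std::unordered_map<std::string, uint64_t> counts_;
};
```

A key whose burst straddles a boundary (the [4s,7s] case) simply accumulates in both windows, which is why it is still detected as long as its access rate dominates.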
@Super-long Do you mind if I take over this PR? I will keep the high-level design as is, but will adapt the code to our style guide and make other modifications.