dragonfly
implement cache testing tool
The tool should be able to read traces from https://github.com/twitter/cache-trace and send them to a Redis endpoint.
The code should preferably be structured in such a way that we can easily add another trace format in the future.
The tool can probably be implemented in Python, since I guess we must send requests sequentially from a single connection anyway. Actually, I am not sure: the traces contain namespaces, and if there are many of them, we could parallelize the flows, in which case golang would be a better choice — some preliminary investigation is needed. These traces are pretty large, so I would appreciate it if we reduce the test run time.
The tool should provide hit/miss statistics by periodically checking the INFO response, and should provide a final report at the end.
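A minimal sketch of the INFO-based check, assuming the server exposes Redis-style `keyspace_hits`/`keyspace_misses` fields and the client can return the raw INFO text (the function names here are illustrative):

```python
def parse_keyspace_stats(info_text: str) -> dict:
    """Extract keyspace hit/miss counters from a raw INFO response."""
    stats = {}
    for line in info_text.splitlines():
        name, sep, value = line.partition(":")
        if sep and name in ("keyspace_hits", "keyspace_misses"):
            stats[name] = int(value)
    return stats


def hit_ratio(stats: dict) -> float:
    """Hit ratio from the parsed counters; 0.0 if no lookups were recorded."""
    hits = stats.get("keyspace_hits", 0)
    total = hits + stats.get("keyspace_misses", 0)
    return hits / total if total else 0.0
```

Polling this periodically gives the intermediate stats, and the final report is just the last snapshot.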
- if we end up implementing the tool using golang, we should learn where to place it and where other multi-language projects put their golang code.
Another thing: it would be nice if the tool could also send synthetic traffic, without any files, probably using the `incrby` command, which allows sending write-only traffic while still measuring the hit rate.
Let's start with the following tasks:
- Implement a tool in Python that sends traffic distributed using a Zipfian distribution. I am not an expert in statistics, but I know many papers use Zipf for skewed traffic when testing caches, with `alpha < 1`. For some reason, the default Python libraries do not seem to provide a Zipf generator that fits these requirements. See https://stackoverflow.com/questions/1366984/generate-random-numbers-distributed-by-zipf/8788662#8788662 and https://stackoverflow.com/questions/31027739/python-custom-zipf-number-generator-performing-poorly for how to work around this.
- The tool should accept `alpha` and `N` and send N `incrby` requests to a redis-like memory store. (If the response is `1`, you know it's a new key, i.e. a miss; otherwise it's a hit.)
- The tool should provide a hit/miss summary after the run is completed. Bonus points for providing intermediate hit-ratio stats during the run using terminal control sequences 💯
- Once we know the tool works, we can implement hits/misses tracking in Dragonfly. Check out the `keyspace_hits` and `keyspace_misses` metrics in server_family.cc (similarly to Redis). As you can see, these are not implemented yet. I would guess that the right place to insert this tracking is inside the `DbSlice::FindExt` function, which is called by all the other find functions. Obviously, the hits/misses metrics should be equal to those that the tool counts.
- Once we have hits/misses tracking working, we can add support to the tool for the Twitter cache traces mentioned above. (Those do not necessarily use `incrby`, which is why we must have server-side stats.)
Eventually, we will be able to run zipf/real-world traces against DF and Redis and compare their caching performance for the same memory usage.
@romange i can take a stab at this!
Thanks, we welcome contributions to the project! 🙏 Please implement items 1-3. We are interested in sending a Zipfian distribution of keys [key:0 - key:N], as I mentioned in the issue. Here is a Java reference: https://github.com/apavlo/h-store/blob/e49885293bf32dad701cb08a3394719d4f844a64/src/benchmarks/edu/brown/benchmark/ycsb/distributions/ZipfianGenerator.java#L41 but I am sure it's possible to find/copy Python-based implementations as well. And please ignore that cache-trace task.
@romange I looked through some papers using Zipf for cache-related work — did you mean to say `alpha < 1`, not `alpha < 0`?
Yes, alpha less than 1
Hi @romange I have created a PR (#640), don't know why I can't seem to link it to this issue, perhaps because I'm not an assignee. Feel free to take a look whenever you get the chance!