pingora
pingora copied to clipboard
tinyufo doesn't handle hash collisions
Describe the bug
TinyUFO may return a value when queried for a key that was never inserted when hash collides.
~~Pingora info~~ TinyUFO info
Please include the following information about your environment:
TinyUFO version: 0.3.0 Rust version: cargo 1.80.1 (376290515 2024-07-16) Operating system version: Archlinux
Steps to reproduce
run
use std::thread;
use tinyufo::TinyUfo;
fn main() {
// increase this to increase the chance of quick error, at the cost of memory. These parameters use roughly 17GB
let item_count: u32 = 1<<28;
let thread = 16;
let end = item_count / thread;
let cache = TinyUfo::new_compact(2*item_count as usize, 1024);
let cache_ref = &cache;
thread::scope(|s| {
for t in 0..thread {
s.spawn(move || {
for i in 0..end {
let elem = i * thread + t;
assert!(cache_ref.get(&elem).is_none());
cache_ref.force_put(elem, elem, 1);
}
});
}
});
std::mem::forget(cache);
println!("success");
}
after a few hundred exections, i got this:
thread '<unnamed>' panicked at src/main.rs:18:21:
assertion failed: cache_ref.get(&elem).is_none()
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread 'main' panicked at src/main.rs:13:5:
a scoped thread panicked
Expected results
Each inserted item is unique, and we check if it exists just before we insert it. It shouldn't ever happen that the item is present
Observed results
with a low probability (that of finding a 64b hash collision), the program crashes.
Additional context
I understand there may be reasons to do things that way, and it could be considered a reasonable work around to store (Key, Value)
when one needs to make sure the value that got returned is for the key that was asked for. If that's what is expected of the api consumer, then it should be clearly stated.
I went looking if this could have an impact on pingora. As far as I can tell, it doesn't. That said, while looking, i found that MemoryCache
create a TintyUfo<u64, _>
and hash the key itself, which means keys get hashed, and that hash get hashed a 2nd times when using that abstraction. MemoryCache
is itself used inside RTCache
, which doesn't appear to be used.