pingora icon indicating copy to clipboard operation
pingora copied to clipboard

tinyufo doesn't handle hash collisions

Open trinity-1686a opened this issue 4 months ago • 0 comments

Describe the bug

TinyUFO may return a value when queried for a key that was never inserted when hash collides.

~~Pingora info~~ TinyUFO info

Please include the following information about your environment:

TinyUFO version: 0.3.0 Rust version: cargo 1.80.1 (376290515 2024-07-16) Operating system version: Archlinux

Steps to reproduce

run

use std::thread;

use tinyufo::TinyUfo;

fn main() {
    // increase this to increase the chance of quick error, at the cost of memory. These parameters use roughly 17GB
    let item_count: u32 = 1<<28;
    let thread = 16;
    let end = item_count / thread;

    let cache = TinyUfo::new_compact(2*item_count as usize, 1024);
    let cache_ref = &cache;

    thread::scope(|s| {
        for t in 0..thread {
            s.spawn(move || {
                for i in 0..end {
                    let elem = i * thread + t;
                    assert!(cache_ref.get(&elem).is_none());
                    cache_ref.force_put(elem, elem, 1);
                }
            });
        }
    });
    std::mem::forget(cache);
    println!("success");
}

after a few hundred exections, i got this:

thread '<unnamed>' panicked at src/main.rs:18:21:
assertion failed: cache_ref.get(&elem).is_none()
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread 'main' panicked at src/main.rs:13:5:
a scoped thread panicked

Expected results

Each inserted item is unique, and we check if it exists just before we insert it. It shouldn't ever happen that the item is present

Observed results

with a low probability (that of finding a 64b hash collision), the program crashes.

Additional context

I understand there may be reasons to do things that way, and it could be considered a reasonable work around to store (Key, Value) when one needs to make sure the value that got returned is for the key that was asked for. If that's what is expected of the api consumer, then it should be clearly stated.

I went looking if this could have an impact on pingora. As far as I can tell, it doesn't. That said, while looking, i found that MemoryCache create a TintyUfo<u64, _> and hash the key itself, which means keys get hashed, and that hash get hashed a 2nd times when using that abstraction. MemoryCache is itself used inside RTCache, which doesn't appear to be used.

trinity-1686a avatar Oct 01 '24 10:10 trinity-1686a