hashbrown feat: Add filter to HashSet/Table/Map

With the removal of the raw table, it is hard to implement an efficient loop to conditionally remove/keep certain fields up to a limit. i.e. a loop that can be aborted and does not require rehash the key for removal of the entry.

Nov 14 '24 04:11 tugtugtug

not sure if the name makes sense... maybe prune is better if I reverse the removal logic?

Nov 14 '24 15:11 tugtugtug

You can already do this with extract_if, because dropping the iterator will keep the remainder.

    let mut removed = 0;
    for _ in map.extract_if(|&k, _| k % 2 != 0) {
        removed += 1;
        if removed == 3 {
            break;
        }
    }

Nov 14 '24 17:11 cuviper

extract_if

Thanks @cuviper , does this still loop over all hash entries tho? If the intention is to loop over all entries, then the exisiting retain works as well, as one's predicate can just stop returning false, once it returned enough trues.

Nov 14 '24 17:11 tugtugtug

extract_if
Thanks @cuviper , does this still loop over all hash entries tho? If the intention is to loop over all entries, then the exisiting retain works as well, as one's predicate can just stop returning false, once it returned enough trues.

Nvm, I see how it works now, it certainly did not catch my attention how it works :), thought it works as retain, maybe I'll add an example to the documentaion of extract_if to capture this use case.

Nov 14 '24 18:11 tugtugtug

I suppose a limitation compared to your proposal is that you can only stop extract_if on the boundary of an extracted item, after each Iterator::next(). So extract_if can't shortcut something like "on this condition, keep this item and everything after."

Nov 14 '24 18:11 cuviper

I suppose a limitation compared to your proposal is that you can only stop extract_if on the boundary of an extracted item, after each Iterator::next(). So extract_if can't shortcut something like "on this condition, keep this item and everything after."

Yeah, I was trying to write a test with my use case, and found it awkward if I would like to break out early before the next matched entry.

Nov 14 '24 18:11 tugtugtug

@cuviper do you think then the addition of the API still serves a purpose, the main use case for the new api is to limit the # of iterations per access to the hashmap, and extract_if certainly does not serve that purpose well as discussed.

Nov 14 '24 18:11 tugtugtug

As a workaround, you could extract that final item to terminate the loop, and then re-insert it. Even with rehashing, that may be faster than continuing to iterate the rest of the map.

Otherwise, yes I do think this API can serve a unique purpose, but Option<bool> is kind of awkward to use, and I'm not sure the use case (in that gap of extract_if) is common enough to warrant the addition.

If we do add something -- it would be slightly more powerful returning ControlFlow<bool, bool>, so you have explicit Continue and Break signals, and the ability to keep/erase in either case.

Nov 14 '24 18:11 cuviper

As a workaround, you could extract that final item to terminate the loop, and then re-insert it. Even with rehashing, that may be faster than continuing to iterate the rest of the map.

Yes, that's what I'm trying to write a test for, and I really find it awkward, it ends up like this.

	let done = AtomicBool::new(false);
	let mut count = 0;
	for item in map
		.extract_if(|k, v| -> bool {
			count += 1;
			if count == limit {
				done.store(true, std::sync::atomic::Ordering::Relaxed);
				return true;
			}
			predicate(k, v)
		})
	{
		if done.load(std::sync::atomic::Ordering::Relaxed) {
			map.insert(item.0, item.1);
			break;
		}
	}

Otherwise, yes I do think this API can serve a unique purpose, but Option<bool> is kind of awkward to use, and I'm not sure the use case (in that gap of extract_if) is common enough to warrant the addition.

If we do add something -- it would be slightly more powerful returning ControlFlow<bool, bool>, so you have explicit Continue and Break signals, and the ability to keep/erase in either case.

I did debate whether Option is the right approach, just wanted something lite weight to pass the stop signal. But yes, I can see it can be more powerful with an explicit flow control. I can work towards this if that's something you guys would more likely to accept.

Nov 14 '24 18:11 tugtugtug

Now renamed the function to filter (by lack of a better name) with ControlFlow<bool, bool> as the return value, also beefed up documentation to explain the behaviors of the return combos.

Nov 14 '24 21:11 tugtugtug

@cuviper any feedback with the current PR? As mentioned, I'll work towards any suggestion that would improve this use case.

Nov 19 '24 17:11 tugtugtug

I don't love the name filter, but I don't have a good suggestion right now. Otherwise, the shape of this seems pretty good. Ultimately @Amanieu should decide on new hashbrown APIs though.

Nov 19 '24 17:11 cuviper

I don't love the name filter, but I don't have a good suggestion right now. Otherwise, the shape of this seems pretty good. Ultimately @Amanieu should decide on new hashbrown APIs though.

Yeah, same, I'd be open to any suggestions with the naming. I couldn't come up with anything better, my candidates are, prune, filter, retain_with_control.

Nov 19 '24 17:11 tugtugtug

Can't you implement the same functionality by just using a skip_rest flag in your closure? Just set that flag to true when you would return Break and then cause the closure to return true for all remaining elements? The compiler should optimize that into a break (do check this though).

Nov 22 '24 10:11 Amanieu

Thank you @Amanieu! I just tried it again, ~~and yes, if I just have a simple variable that controls the flow, then it does indeed get optimized by the compiler~~. However, if I have any sort of indirection, say if the condition is managed by another lambda or another function, then the compiler would not be able to optimize it out. Imagine the maintainance of the hashmap is based on the stress level of the resources. So the loop is not really based on a constant but rather some calculation. I've made a simple example showing that the compiler optimization is not reliable. The following example can be compiled with "inline-more" and release mode (with $ rustc --version rustc 1.79.0 (129f3b996 2024-06-10), but would still print "3" multiple times.

use hashbrown::HashMap;
fn main() {
    let mut map: HashMap<i32, i32> = (0..8).map(|x|(x, x*10)).collect();
    let mut counter = 0;
    let stop = |counter| {
        if counter == 3 {
            println!("{}", counter);
            return true;
        }
        return false;
    };
    map.retain(|&k, _| {
        if stop(counter) {
            return true;
        }
        if k % 2 == 0 {
            counter += 1;
            return true;
        }
        return false;
    });
    assert!(map.len() >= 3);
}

Nov 22 '24 16:11 tugtugtug

That print is an observable side effect -- the compiler cannot skip that by optimization.

Nov 22 '24 16:11 cuviper

That print is an observable side effect -- the compiler cannot skip that by optimization.

@cuviper After rerun I found even with the simplest test, it is still failing (so my previous comment wasn't correct):

use hashbrown::HashMap;
fn main() {
    let mut map: HashMap<i32, i32> = (0..8).map(|x|(x, x*10)).collect();
    let mut counter = 0;
    let mut counter_true = 0;
    map.retain(|&k, _| {
        if counter == 3 {
            counter_true += 1;
            return true;
        }
        if k % 2 == 0 {
            counter += 1;
            return true;
        }
        return false;
    });
    assert_eq!(counter_true, 1);
}

Nov 22 '24 16:11 tugtugtug

That print is an observable side effect -- the compiler cannot skip that by optimization.

@cuviper After rerun I found even with the simplest test, it is still failing (so my previous comment wasn't correct):
use hashbrown::HashMap;
fn main() {
    let mut map: HashMap<i32, i32> = (0..8).map(|x|(x, x*10)).collect();
    let mut counter = 0;
    let mut counter_true = 0;
    map.retain(|&k, _| {
        if counter == 3 {
            counter_true += 1;
            return true;
        }
        if k % 2 == 0 {
            counter += 1;
            return true;
        }
        return false;
    });
    assert_eq!(counter_true, 1);
}

I guess my point is, if one has to craft their code so carefully to make sure the loop ends when it should, it signals the need for this more explicit API, no? this also fails, now the exit condition has nothing in it:

use hashbrown::HashMap;
fn main() {
    let mut map: HashMap<i32, i32> = (0..8).map(|x|(x, x*10)).collect();
    let mut counter = 0;
    map.retain(|&k, _| {
        counter += 1;
        if counter >= 3 {
            return true;
        }
        if k % 2 == 0 {
            return true;
        }
        return false;
    });
    assert_eq!(counter, 3);
}

Nov 22 '24 16:11 tugtugtug

The values of counter and counter_true can't be changed by optimization either. The way to think about this is to see what the behavior would be without any optimization, and then it should behave the exact same way when optimized -- only faster. So if you're looking at those values after the loop ends, the compiler has to give you the same answer either way.

(That doesn't include explicit debug/release differences like debug-assertions and overflow-checks, which change the code that is produced even before optimization starts.)

If you want the optimizer to see that it can skip the rest of the loop, you should "break" in a way that doesn't modify anything that would be visible outside that loop.

I guess my point is, if one has to craft their code so carefully to make sure the loop ends when it should, it signals the need for this more explicit API, no?

Maybe, if that use case is deemed significant enough to warrant a larger API. I'm sure there are a lot of things we can imagine that would be easier to accomplish with some specialized API, but it doesn't mean we should add them all.

Nov 22 '24 16:11 cuviper

@cuviper thank you for your thoughtful comments. I understand your perspective, but I respectfully disagree that users of this library need to have in-depth knowledge of compiler internals to optimize their code. In my experience as shown above, achieving optimal performance can be quite challenging.

here are a lot of things we can imagine that would be easier to accomplish with some specialized API, but it doesn't mean we should add them all.

I feel this was misunderstood. My intention was not to propose an API that's completely new or unreasonable, but rather to share a specific use case that might not have been considered seriously previously, and it was achievable through the raw table apis that are now taken away. If removing the raw table API is in favor of the use of the hashbrown APIs (which is a good direction IMO), then I felt there is a gap that should be filled by this extra API.

I think I tried to provide realistic and relevant examples, and I'm open to reasonable workarounds that address the concerns.

Nov 22 '24 17:11 tugtugtug

I'm also not trying to dismiss your case, just offering a reason why we might not address it. Sometimes the answer can be "no" for other reasons too, like maintenance burden. Either way, this PR is not decided yet!

Nov 22 '24 17:11 cuviper

Just to finish the topic about the compiler optimization: This is the latest I tried, compiled with release mode:

use hashbrown::HashMap;
fn main() {
    let mut map: HashMap<i32, i32> = (0..8).map(|x|(x, x*10)).collect();
    let mut counter = 0;
    map.retain(|&k, _| {
        counter += 1;
        if counter >= 3 {
            return true;
        }
        if k % 2 == 0 {
            return true;
        }
        return false;
    });
}

This is the assembly from the counter +=1;:

inc    %edx                                                               
cmp    $0x2,%edx                                                          
jg     0x55555555cd68 <_ZN14test_hashbrown4main17h595336c6b17be939E+584>  
tzcnt  %r9d,%r10d                                                         
shl    $0x3,%r10d                                                         
mov    %r8,%r11                                                           
sub    %r10,%r11                                                          
mov    -0x8(%r11),%r10d                                                   
and    $0x1,%r10d                                                         
je     0x55555555cd68 <_ZN14test_hashbrown4main17h595336c6b17be939E+584>  
mov    %rdi,%r10                                                          
sub    %r11,%r10                                                          
sar    $0x3,%r10                                                          
lea    -0x10(%r10),%r11                                                   
and    %rsi,%r11                                                          
movdqu (%rdi,%r11,1),%xmm1                                                
pcmpeqb %xmm0,%xmm1                                                       
psllw  $0x7,%xmm1                                                         
pmovmskb %xmm1,%ebx                                                       
test   %ebx,%ebx                                                          
jne    0x55555555cd30 <_ZN14test_hashbrown4main17h595336c6b17be939E+528>

I'm not great at assembly code, but afaik, this does not early exit the loop afaik. Use GDB to log the count of the return true;, I also saw it hits more than 3 times.

@cuviper do you have more things to try? I could give it a couple of more iterations just to close off this topic.

Nov 22 '24 17:11 tugtugtug

I'm also not trying to dismiss your case, just offering a reason why we might not address it. Sometimes the answer can be "no" for other reasons too, like maintenance burden. Either way, this PR is not decided yet!

I did feel a bit discouraged as you could see. But again, I would be convinced if any reasonable workarounds are available to normal users like myself, or any suggestions to make this API more acceptable to be included. Having lost the access to the raw table already incurred extra maintenance to the users of the library afaik, but I support that as it would reduce the maintainence cost to this great library. And I understand the maintainers of the library has the final saying for whatever reasons. I'll accept either way, as I have tried my best.

Nov 22 '24 18:11 tugtugtug

There's also the fun fact that counter += 1 can in theory wrap around, so in the optimizer's eyes that may not always remain >=3. I tried counter = counter.saturating_add(1) though, and that didn't help.

Nov 22 '24 18:11 cuviper

There's also the fun fact that counter += 1 can in theory wrap around, so in the optimizer's eyes that may not always remain >=3. I tried counter = counter.saturating_add(1) though, and that didn't help.

True, I mean, we could keep trying, but I feel like this is probably beyond any reasonable effort from a normal user :)

Nov 22 '24 18:11 tugtugtug

Overall, I think this ties into the discussion we had a few days ago in the libs-api meeting about extract_if, specifically around desires for a more general version of this operation as shown in https://github.com/rust-lang/rust/issues/43244#issuecomment-2495135299. I think the best course of action here would be to design some form of cursor API that would work for both Vec and HashMap.

Nov 23 '24 00:11 Amanieu

Overall, I think this ties into the discussion we had a few days ago in the libs-api meeting about extract_if, specifically around desires for a more general version of this operation as shown in rust-lang/rust#43244 (comment). I think the best course of action here would be to design some form of cursor API that would work for both Vec and HashMap.

The cursor API looks interesting and promising for the general use cases, however implementation seems rather complex for what this api tries to fix, that is a simple flow control when looping over the elements. I'd say that's overkill, and performance wise I'm not sure that's going to be great comparing to a simple loop with a break.

Nov 23 '24 02:11 tugtugtug

looks interesting and promisin

@Amanieu if you decide against this api after all, could you please give some directions to ppl like me who is basically stuck without the raw table access? I'm against forking unless absolutely necessary, yet I don't see a way forward as the use case is simply removed from the current offerings without much lead time. Even with the cursor API (which could be a long way to go), I'm not sure we will get the same performance. IMO, simplicity is sometimes better than a more general API that seemingly does everything.

Nov 25 '24 15:11 tugtugtug

In the absence of a complete cursor API, I think your best bet would be to use extract_if to achieve what you want.

Nov 25 '24 18:11 Amanieu

In the absence of a complete cursor API, I think your best bet would be to use extract_if to achieve what you want.

but that requires creating at least an atomic variable, and have to insert the last element back, which seems quite hacky... do you see a way to avoid that?

As a workaround, you could extract that final item to terminate the loop, and then re-insert it. Even with rehashing, that may be faster than continuing to iterate the rest of the map.

Yes, that's what I'm trying to write a test for, and I really find it awkward, it ends up like this.
	let done = AtomicBool::new(false);
	let mut count = 0;
	for item in map
		.extract_if(|k, v| -> bool {
			count += 1;
			if count == limit {
				done.store(true, std::sync::atomic::Ordering::Relaxed);
				return true;
			}
			predicate(k, v)
		})
	{
		if done.load(std::sync::atomic::Ordering::Relaxed) {
			map.insert(item.0, item.1);
			break;
		}
	}
Otherwise, yes I do think this API can serve a unique purpose, but Option<bool> is kind of awkward to use, and I'm not sure the use case (in that gap of extract_if) is common enough to warrant the addition. If we do add something -- it would be slightly more powerful returning ControlFlow<bool, bool>, so you have explicit Continue and Break signals, and the ability to keep/erase in either case.

I did debate whether Option is the right approach, just wanted something lite weight to pass the stop signal. But yes, I can see it can be more powerful with an explicit flow control. I can work towards this if that's something you guys would more likely to accept.

Nov 25 '24 18:11 tugtugtug