feat: Add filter to HashSet/Table/Map
With the removal of the raw table, it is hard to implement an efficient loop to conditionally remove/keep certain fields up to a limit. i.e. a loop that can be aborted and does not require rehash the key for removal of the entry.
not sure if the name makes sense... maybe prune is better if I reverse the removal logic?
You can already do this with extract_if, because dropping the iterator will keep the remainder.
let mut removed = 0;
for _ in map.extract_if(|&k, _| k % 2 != 0) {
removed += 1;
if removed == 3 {
break;
}
}
extract_if
Thanks @cuviper , does this still loop over all hash entries tho? If the intention is to loop over all entries, then the exisiting retain works as well, as one's predicate can just stop returning false, once it returned enough trues.
extract_ifThanks @cuviper , does this still loop over all hash entries tho? If the intention is to loop over all entries, then the exisiting retain works as well, as one's predicate can just stop returning false, once it returned enough trues.
Nvm, I see how it works now, it certainly did not catch my attention how it works :), thought it works as retain, maybe I'll add an example to the documentaion of extract_if to capture this use case.
I suppose a limitation compared to your proposal is that you can only stop extract_if on the boundary of an extracted item, after each Iterator::next(). So extract_if can't shortcut something like "on this condition, keep this item and everything after."
I suppose a limitation compared to your proposal is that you can only stop
extract_ifon the boundary of an extracted item, after eachIterator::next(). Soextract_ifcan't shortcut something like "on this condition, keep this item and everything after."
Yeah, I was trying to write a test with my use case, and found it awkward if I would like to break out early before the next matched entry.
@cuviper do you think then the addition of the API still serves a purpose, the main use case for the new api is to limit the # of iterations per access to the hashmap, and extract_if certainly does not serve that purpose well as discussed.
As a workaround, you could extract that final item to terminate the loop, and then re-insert it. Even with rehashing, that may be faster than continuing to iterate the rest of the map.
Otherwise, yes I do think this API can serve a unique purpose, but Option<bool> is kind of awkward to use, and I'm not sure the use case (in that gap of extract_if) is common enough to warrant the addition.
If we do add something -- it would be slightly more powerful returning ControlFlow<bool, bool>, so you have explicit Continue and Break signals, and the ability to keep/erase in either case.
As a workaround, you could extract that final item to terminate the loop, and then re-insert it. Even with rehashing, that may be faster than continuing to iterate the rest of the map.
Yes, that's what I'm trying to write a test for, and I really find it awkward, it ends up like this.
let done = AtomicBool::new(false);
let mut count = 0;
for item in map
.extract_if(|k, v| -> bool {
count += 1;
if count == limit {
done.store(true, std::sync::atomic::Ordering::Relaxed);
return true;
}
predicate(k, v)
})
{
if done.load(std::sync::atomic::Ordering::Relaxed) {
map.insert(item.0, item.1);
break;
}
}
Otherwise, yes I do think this API can serve a unique purpose, but
Option<bool>is kind of awkward to use, and I'm not sure the use case (in that gap ofextract_if) is common enough to warrant the addition.If we do add something -- it would be slightly more powerful returning
ControlFlow<bool, bool>, so you have explicitContinueandBreaksignals, and the ability to keep/erase in either case.
I did debate whether Option is the right approach, just wanted something lite weight to pass the stop signal. But yes, I can see it can be more powerful with an explicit flow control. I can work towards this if that's something you guys would more likely to accept.
Now renamed the function to filter (by lack of a better name) with ControlFlow<bool, bool> as the return value, also beefed up documentation to explain the behaviors of the return combos.
@cuviper any feedback with the current PR? As mentioned, I'll work towards any suggestion that would improve this use case.
I don't love the name filter, but I don't have a good suggestion right now. Otherwise, the shape of this seems pretty good. Ultimately @Amanieu should decide on new hashbrown APIs though.
I don't love the name
filter, but I don't have a good suggestion right now. Otherwise, the shape of this seems pretty good. Ultimately @Amanieu should decide on newhashbrownAPIs though.
Yeah, same, I'd be open to any suggestions with the naming. I couldn't come up with anything better, my candidates are, prune, filter, retain_with_control.
Can't you implement the same functionality by just using a skip_rest flag in your closure? Just set that flag to true when you would return Break and then cause the closure to return true for all remaining elements? The compiler should optimize that into a break (do check this though).
Thank you @Amanieu! I just tried it again, ~~and yes, if I just have a simple variable that controls the flow, then it does indeed get optimized by the compiler~~. However, if I have any sort of indirection, say if the condition is managed by another lambda or another function, then the compiler would not be able to optimize it out. Imagine the maintainance of the hashmap is based on the stress level of the resources. So the loop is not really based on a constant but rather some calculation. I've made a simple example showing that the compiler optimization is not reliable. The following example can be compiled with "inline-more" and release mode (with $ rustc --version rustc 1.79.0 (129f3b996 2024-06-10), but would still print "3" multiple times.
use hashbrown::HashMap;
fn main() {
let mut map: HashMap<i32, i32> = (0..8).map(|x|(x, x*10)).collect();
let mut counter = 0;
let stop = |counter| {
if counter == 3 {
println!("{}", counter);
return true;
}
return false;
};
map.retain(|&k, _| {
if stop(counter) {
return true;
}
if k % 2 == 0 {
counter += 1;
return true;
}
return false;
});
assert!(map.len() >= 3);
}
That print is an observable side effect -- the compiler cannot skip that by optimization.
That print is an observable side effect -- the compiler cannot skip that by optimization.
@cuviper After rerun I found even with the simplest test, it is still failing (so my previous comment wasn't correct):
use hashbrown::HashMap;
fn main() {
let mut map: HashMap<i32, i32> = (0..8).map(|x|(x, x*10)).collect();
let mut counter = 0;
let mut counter_true = 0;
map.retain(|&k, _| {
if counter == 3 {
counter_true += 1;
return true;
}
if k % 2 == 0 {
counter += 1;
return true;
}
return false;
});
assert_eq!(counter_true, 1);
}
That print is an observable side effect -- the compiler cannot skip that by optimization.
@cuviper After rerun I found even with the simplest test, it is still failing (so my previous comment wasn't correct):
use hashbrown::HashMap; fn main() { let mut map: HashMap<i32, i32> = (0..8).map(|x|(x, x*10)).collect(); let mut counter = 0; let mut counter_true = 0; map.retain(|&k, _| { if counter == 3 { counter_true += 1; return true; } if k % 2 == 0 { counter += 1; return true; } return false; }); assert_eq!(counter_true, 1); }
I guess my point is, if one has to craft their code so carefully to make sure the loop ends when it should, it signals the need for this more explicit API, no? this also fails, now the exit condition has nothing in it:
use hashbrown::HashMap;
fn main() {
let mut map: HashMap<i32, i32> = (0..8).map(|x|(x, x*10)).collect();
let mut counter = 0;
map.retain(|&k, _| {
counter += 1;
if counter >= 3 {
return true;
}
if k % 2 == 0 {
return true;
}
return false;
});
assert_eq!(counter, 3);
}
The values of counter and counter_true can't be changed by optimization either. The way to think about this is to see what the behavior would be without any optimization, and then it should behave the exact same way when optimized -- only faster. So if you're looking at those values after the loop ends, the compiler has to give you the same answer either way.
(That doesn't include explicit debug/release differences like debug-assertions and overflow-checks, which change the code that is produced even before optimization starts.)
If you want the optimizer to see that it can skip the rest of the loop, you should "break" in a way that doesn't modify anything that would be visible outside that loop.
I guess my point is, if one has to craft their code so carefully to make sure the loop ends when it should, it signals the need for this more explicit API, no?
Maybe, if that use case is deemed significant enough to warrant a larger API. I'm sure there are a lot of things we can imagine that would be easier to accomplish with some specialized API, but it doesn't mean we should add them all.
@cuviper thank you for your thoughtful comments. I understand your perspective, but I respectfully disagree that users of this library need to have in-depth knowledge of compiler internals to optimize their code. In my experience as shown above, achieving optimal performance can be quite challenging.
here are a lot of things we can imagine that would be easier to accomplish with some specialized API, but it doesn't mean we should add them all.
I feel this was misunderstood. My intention was not to propose an API that's completely new or unreasonable, but rather to share a specific use case that might not have been considered seriously previously, and it was achievable through the raw table apis that are now taken away. If removing the raw table API is in favor of the use of the hashbrown APIs (which is a good direction IMO), then I felt there is a gap that should be filled by this extra API.
I think I tried to provide realistic and relevant examples, and I'm open to reasonable workarounds that address the concerns.
I'm also not trying to dismiss your case, just offering a reason why we might not address it. Sometimes the answer can be "no" for other reasons too, like maintenance burden. Either way, this PR is not decided yet!
Just to finish the topic about the compiler optimization: This is the latest I tried, compiled with release mode:
use hashbrown::HashMap;
fn main() {
let mut map: HashMap<i32, i32> = (0..8).map(|x|(x, x*10)).collect();
let mut counter = 0;
map.retain(|&k, _| {
counter += 1;
if counter >= 3 {
return true;
}
if k % 2 == 0 {
return true;
}
return false;
});
}
This is the assembly from the counter +=1;:
inc %edx
cmp $0x2,%edx
jg 0x55555555cd68 <_ZN14test_hashbrown4main17h595336c6b17be939E+584>
tzcnt %r9d,%r10d
shl $0x3,%r10d
mov %r8,%r11
sub %r10,%r11
mov -0x8(%r11),%r10d
and $0x1,%r10d
je 0x55555555cd68 <_ZN14test_hashbrown4main17h595336c6b17be939E+584>
mov %rdi,%r10
sub %r11,%r10
sar $0x3,%r10
lea -0x10(%r10),%r11
and %rsi,%r11
movdqu (%rdi,%r11,1),%xmm1
pcmpeqb %xmm0,%xmm1
psllw $0x7,%xmm1
pmovmskb %xmm1,%ebx
test %ebx,%ebx
jne 0x55555555cd30 <_ZN14test_hashbrown4main17h595336c6b17be939E+528>
I'm not great at assembly code, but afaik, this does not early exit the loop afaik. Use GDB to log the count of the return true;, I also saw it hits more than 3 times.
@cuviper do you have more things to try? I could give it a couple of more iterations just to close off this topic.
I'm also not trying to dismiss your case, just offering a reason why we might not address it. Sometimes the answer can be "no" for other reasons too, like maintenance burden. Either way, this PR is not decided yet!
I did feel a bit discouraged as you could see. But again, I would be convinced if any reasonable workarounds are available to normal users like myself, or any suggestions to make this API more acceptable to be included. Having lost the access to the raw table already incurred extra maintenance to the users of the library afaik, but I support that as it would reduce the maintainence cost to this great library. And I understand the maintainers of the library has the final saying for whatever reasons. I'll accept either way, as I have tried my best.
There's also the fun fact that counter += 1 can in theory wrap around, so in the optimizer's eyes that may not always remain >=3. I tried counter = counter.saturating_add(1) though, and that didn't help.
There's also the fun fact that
counter += 1can in theory wrap around, so in the optimizer's eyes that may not always remain>=3. I triedcounter = counter.saturating_add(1)though, and that didn't help.
True, I mean, we could keep trying, but I feel like this is probably beyond any reasonable effort from a normal user :)
Overall, I think this ties into the discussion we had a few days ago in the libs-api meeting about extract_if, specifically around desires for a more general version of this operation as shown in https://github.com/rust-lang/rust/issues/43244#issuecomment-2495135299. I think the best course of action here would be to design some form of cursor API that would work for both Vec and HashMap.
Overall, I think this ties into the discussion we had a few days ago in the libs-api meeting about
extract_if, specifically around desires for a more general version of this operation as shown in rust-lang/rust#43244 (comment). I think the best course of action here would be to design some form of cursor API that would work for bothVecandHashMap.
The cursor API looks interesting and promising for the general use cases, however implementation seems rather complex for what this api tries to fix, that is a simple flow control when looping over the elements. I'd say that's overkill, and performance wise I'm not sure that's going to be great comparing to a simple loop with a break.
looks interesting and promisin
@Amanieu if you decide against this api after all, could you please give some directions to ppl like me who is basically stuck without the raw table access? I'm against forking unless absolutely necessary, yet I don't see a way forward as the use case is simply removed from the current offerings without much lead time. Even with the cursor API (which could be a long way to go), I'm not sure we will get the same performance. IMO, simplicity is sometimes better than a more general API that seemingly does everything.
In the absence of a complete cursor API, I think your best bet would be to use extract_if to achieve what you want.
In the absence of a complete cursor API, I think your best bet would be to use
extract_ifto achieve what you want.
but that requires creating at least an atomic variable, and have to insert the last element back, which seems quite hacky... do you see a way to avoid that?
As a workaround, you could extract that final item to terminate the loop, and then re-insert it. Even with rehashing, that may be faster than continuing to iterate the rest of the map.
Yes, that's what I'm trying to write a test for, and I really find it awkward, it ends up like this.
let done = AtomicBool::new(false); let mut count = 0; for item in map .extract_if(|k, v| -> bool { count += 1; if count == limit { done.store(true, std::sync::atomic::Ordering::Relaxed); return true; } predicate(k, v) }) { if done.load(std::sync::atomic::Ordering::Relaxed) { map.insert(item.0, item.1); break; } }Otherwise, yes I do think this API can serve a unique purpose, but
Option<bool>is kind of awkward to use, and I'm not sure the use case (in that gap ofextract_if) is common enough to warrant the addition. If we do add something -- it would be slightly more powerful returningControlFlow<bool, bool>, so you have explicitContinueandBreaksignals, and the ability to keep/erase in either case.I did debate whether Option is the right approach, just wanted something lite weight to pass the stop signal. But yes, I can see it can be more powerful with an explicit flow control. I can work towards this if that's something you guys would more likely to accept.