rust-libp2p
fix(swarm): rewrite `NetworkBehaviour` macro for more optimal code gen
Description
I have rewritten the `NetworkBehaviour` derive macro to generate more optimal and faster-to-compile code. When using more behaviours (5, 10, 20), I noticed performance degrades even though I benchmarked the same load. This is related to https://github.com/libp2p/rust-libp2p/pull/5026.
The new macro implementation generates enums and structs for each type implementing the traits instead of type-level linked lists. In many cases, this makes the resulting types more compact (we store just one enum tag, whereas composed `Either`s each need to store tags to make values addressable) and makes the enum dispatch constant-time. This also opened the opportunity to optimize `UpgradeInfoIterator` and `ConnectionHandler` into state machines (they now remember where they stopped polling/iterating and skip exhausted sub-handlers/iterators). We could optimize the `NetworkBehaviour` itself too, but it would require users to put extra fields into the struct (this could be made optional for backwards compatibility).
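To illustrate the layout difference, here is a minimal, hypothetical sketch (not the actual generated code; the event types and the inline `Either` stand-in are made up) contrasting type-level composition via nested `Either`s with a single flat enum per user struct:

```rust
// Stand-in for the `Either` used by type-level composition; the real code
// composes handlers with library-provided wrapper types.
enum Either<L, R> {
    Left(L),
    Right(R),
}

struct PingEvent;
struct IdentifyEvent;
struct KadEvent;

// Nested composition: every level stores its own tag, and reading a value
// means matching through each layer of `Left`/`Right`.
type NestedEvent = Either<Either<PingEvent, IdentifyEvent>, KadEvent>;

// Flat enum generated per user struct: one tag, one `match`, constant dispatch.
enum FlatEvent {
    Ping(PingEvent),
    Identify(IdentifyEvent),
    Kad(KadEvent),
}

fn dispatch(event: FlatEvent) -> &'static str {
    match event {
        FlatEvent::Ping(_) => "ping",
        FlatEvent::Identify(_) => "identify",
        FlatEvent::Kad(_) => "kad",
    }
}

fn main() {
    let _nested: NestedEvent = Either::Left(Either::Right(IdentifyEvent));
    assert_eq!(dispatch(FlatEvent::Kad(KadEvent)), "kad");
}
```

With the nested representation, reading a value means walking through every layer of `Left`/`Right` tags, while the flat enum needs a single tag and a single `match` that the compiler can lower to a jump table.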
Change checklist
- [x] I have performed a self-review of my code
- [x] I have made corresponding changes to the documentation (none)
- [x] I have added tests that prove my fix is effective or that my feature works (no regressions)
- [ ] A changelog entry has been made in the appropriate crates
Numbers for new macro
behaviours | iters | protocols per behaviour | time per iter (low) | time per iter (median) | time per iter (high)
---|---|---|---|---|---
1 | 10000 | 10 | 5.0441 ns | 5.0892 ns | 5.1472 ns |
1 | 10000 | 100 | 7.8285 ns | 8.0366 ns | 8.3113 ns |
1 | 10000 | 1000 | 19.439 ms | 19.568 ms | 19.731 ms |
5 | 10000 | 2 | 6.1172 ns | 6.2539 ns | 6.4356 ns |
5 | 10000 | 20 | 14.192 ns | 14.835 ns | 15.676 ns |
5 | 10000 | 200 | 270.12 ms | 272.66 ms | 275.51 ms |
10 | 10000 | 1 | 8.0800 ns | 8.2733 ns | 8.4751 ns |
10 | 10000 | 10 | 18.930 ns | 19.989 ns | 21.327 ns |
10 | 10000 | 100 | 301.00 ms | 307.12 ms | 313.86 ms |
20 | 5000 | 1 | 12.019 ns | 12.222 ns | 12.488 ns |
20 | 5000 | 10 | 40.238 ns | 41.747 ns | 43.697 ns |
20 | 5000 | 100 | 731.50 ms | 748.87 ms | 767.03 ms |
Compared to #5026.
behaviours | iters | protocols per behaviour | time per iter (low) | time per iter (median) | time per iter (high) | change (low) | change (median) | change (high) | verdict
---|---|---|---|---|---|---|---|---|---
1 | 10000 | 10 | 6.9099 ns | 7.0400 ns | 7.1924 ns | +19.863% | +51.795% | +95.414% | Performance has regressed. |
1 | 10000 | 100 | 11.301 ns | 11.701 ns | 12.220 ns | +3.6304% | +61.992% | +156.01% | No change in performance detected. |
1 | 10000 | 1000 | 45.102 ms | 45.699 ms | 46.335 ms | +129.98% | +133.54% | +137.49% | Performance has regressed. |
5 | 10000 | 2 | 6.7147 ns | 6.8633 ns | 7.0595 ns | -13.115% | +21.191% | +65.712% | No change in performance detected. |
5 | 10000 | 20 | 33.995 ns | 36.069 ns | 38.805 ns | +56.477% | +186.56% | +410.85% | Performance has regressed. |
5 | 10000 | 200 | 844.90 ms | 863.90 ms | 883.72 ms | +208.68% | +216.83% | +224.95% | Performance has regressed. |
10 | 10000 | 1 | 9.6857 ns | 9.8934 ns | 10.168 ns | -5.8785% | +33.452% | +94.927% | No change in performance detected. |
10 | 10000 | 10 | 353.64 ns | 380.66 ns | 415.67 ns | +1326.5% | +2344.9% | +4154.7% | Performance has regressed. |
10 | 10000 | 100 | 1.2271 s | 1.2532 s | 1.2804 s | +296.43% | +308.03% | +320.38% | Performance has regressed. |
20 | 5000 | 1 | 15.601 ns | 16.091 ns | 16.726 ns | +11.958% | +65.527% | +145.85% | Performance has regressed. |
20 | 5000 | 10 | 128.33 µs | 138.88 µs | 152.28 µs | +237805% | +421868% | +752554% | Performance has regressed. |
20 | 5000 | 100 | 1.9057 s | 1.9361 s | 1.9668 s | +151.25% | +158.54% | +166.02% | Performance has regressed. |
A feature flag sounds very reasonable (since I want to use this in my project). In that case, I can introduce a required field in the behaviour that stores the state needed to also improve the main poll implementation. The main idea behind the custom poll is simple:
```rust
// previous implementation
#(
    if let std::task::Poll::Ready(event) = self.#fields.poll(cx) {
        return std::task::Poll::Ready(event
            .map_custom(#to_beh::#var_names)
            .map_outbound_open_info(#ooi::#var_names)
            .map_protocol(#ou::#var_names));
    }
)*

// proposed implementation
let mut fuel = #beh_count;
while fuel > 0 {
    // save the poll position to avoid re-polling exhausted handlers
    match self.field_index {
        #(#indices => match self.#fields.poll(cx) {
            std::task::Poll::Ready(event) =>
                return std::task::Poll::Ready(event
                    .map_custom(#to_beh::#var_names)
                    .map_outbound_open_info(#ooi::#var_names)
                    .map_protocol(#ou::#var_names)),
            std::task::Poll::Pending => {}
        },)*
        _ => {
            self.field_index = 0;
            continue;
        }
    }
    self.field_index += 1;
    fuel -= 1;
}
```
In each poll:
- we return `Pending` only if all handlers return `Pending`
- if a behaviour returns `Ready`, the next poll will poll it first
- if we start polling from a later behaviour, we wrap around if needed

This polling pattern ensures we don't re-poll other behaviours while exhausting events from one point of the hierarchy. Hopefully, the compiler is smart enough to use a branch table for the match.
For this to work, I need to maintain an integer between poll calls, hence the extra field.
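For illustration, here is a hand-written sketch of what the expanded handler could look like for a struct with two fields. The names (`CombinedHandler`, `SubHandler`, `CombinedEvent`, `field_index`) are illustrative rather than the macro's actual output, and the toy sub-handler stands in for real `ConnectionHandler` polling:

```rust
use std::task::Poll;

// A toy sub-handler that yields a fixed number of events before going pending.
struct SubHandler {
    pending_events: u32,
}

impl SubHandler {
    fn poll(&mut self) -> Poll<u32> {
        if self.pending_events > 0 {
            self.pending_events -= 1;
            Poll::Ready(self.pending_events)
        } else {
            Poll::Pending
        }
    }
}

enum CombinedEvent {
    First(u32),
    Second(u32),
}

struct CombinedHandler {
    first: SubHandler,
    second: SubHandler,
    // The extra state kept between `poll` calls: which field to poll next.
    field_index: usize,
}

impl CombinedHandler {
    fn poll(&mut self) -> Poll<CombinedEvent> {
        let mut fuel = 2; // number of sub-handlers
        while fuel > 0 {
            match self.field_index {
                0 => {
                    if let Poll::Ready(ev) = self.first.poll() {
                        // field_index stays put, so the next call polls this field first.
                        return Poll::Ready(CombinedEvent::First(ev));
                    }
                }
                1 => {
                    if let Poll::Ready(ev) = self.second.poll() {
                        return Poll::Ready(CombinedEvent::Second(ev));
                    }
                }
                _ => {
                    // Wrap around to the first field without consuming fuel.
                    self.field_index = 0;
                    continue;
                }
            }
            self.field_index += 1;
            fuel -= 1;
        }
        Poll::Pending
    }
}

fn main() {
    let mut handler = CombinedHandler {
        first: SubHandler { pending_events: 1 },
        second: SubHandler { pending_events: 2 },
        field_index: 0,
    };
    // Drains all three events, then stops once every sub-handler is pending.
    while let Poll::Ready(_event) = handler.poll() {}
}
```

Because the wrap-around arm does not consume fuel, each sub-handler is polled at most once per call, and `Pending` is returned only after all of them have reported `Pending`.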
This pull request has merge conflicts. Could you please resolve them @jakubDoka? 🙏
I find myself doing this too, lol. My implementation mimics the existing `*Select` structs (public or hidden). Somehow I managed to make it work, which is funny. Past discussion: #3902. TL;DR: type-based composition is preferred.
@jakubDoka we have a rust-libp2p maintainers meeting where we all discuss technical changes/PRs; I think you should attend. The next one is starting right now: https://lu.ma/2024-07-02-rust-libp2p
But the one after that is in two weeks: https://lu.ma/2024-07-16-rust-libp2p
@dhuseby sorry, I missed the last one, I'll hopefully show up on the 16th
Well that was an accident