stacks-core
Chore: Replace Contains Key with Entry
Benchmark Results
hyperfine -w 3 -r 10 "./stacks-inspect-383d586a7-4-mar replay-block ~/chainstate range 99990 100000" "./stacks-inspect-with-entry-4-mar replay-block ~/chainstate range 99990 100000" "./stacks-inspect-immutable-entries replay-block ~/chainstate range 99990 100000"
Benchmark 1: ./stacks-inspect-383d586a7-4-mar replay-block ~/chainstate range 99990 100000
Time (mean ± σ): 7.977 s ± 0.060 s [User: 7.584 s, System: 0.365 s]
Range (min … max): 7.934 s … 8.105 s 10 runs
Benchmark 2: ./stacks-inspect-with-entry-4-mar replay-block ~/chainstate range 99990 100000
Time (mean ± σ): 7.918 s ± 0.024 s [User: 7.524 s, System: 0.364 s]
Range (min … max): 7.891 s … 7.965 s 10 runs
Benchmark 3: ./stacks-inspect-immutable-entries replay-block ~/chainstate range 99990 100000
Time (mean ± σ): 7.946 s ± 0.012 s [User: 7.554 s, System: 0.362 s]
Range (min … max): 7.925 s … 7.966 s 10 runs
Summary
./stacks-inspect-with-entry-4-mar replay-block ~/chainstate range 99990 100000 ran
1.00 ± 0.00 times faster than ./stacks-inspect-immutable-entries replay-block ~/chainstate range 99990 100000
1.01 ± 0.01 times faster than ./stacks-inspect-383d586a7-4-mar replay-block ~/chainstate range 99990 100000
The benchmarks above show the performance of the following cases:
- default run
- only update mutable occurrences
- update mutable and immutable as well
The best result is the 2nd case, with mutable occurrences updated and immutable ones left unchanged.
Applicable issues
- https://github.com/stacks-network/stacks-core/issues/4316
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 83.41%. Comparing base (25311e4) to head (283f46b). Report is 14 commits behind head on next.
Additional details and impacted files
@@ Coverage Diff @@
## next #4483 +/- ##
==========================================
- Coverage 85.25% 83.41% -1.85%
==========================================
Files 471 471
Lines 337762 337762
Branches 317 317
==========================================
- Hits 287955 281728 -6227
- Misses 49799 56026 +6227
Partials 8 8
Files | Coverage Δ | |
---|---|---|
clarity/src/vm/ast/traits_resolver/mod.rs | 98.48% <100.00%> (ø) | |
clarity/src/vm/costs/mod.rs | 81.78% <100.00%> (+0.01%) | :arrow_up: |
clarity/src/vm/types/mod.rs | 91.70% <100.00%> (-0.01%) | :arrow_down: |

... and 65 files with indirect coverage changes
Continue to review full report in Codecov by Sentry.
Legend
Δ = absolute <relative> (impact)
ø = not affected
? = missing data
Powered by Codecov. Last update 25311e4...283f46b. Read the comment docs.
I've run the benchmark again after the latest commit (a8597f1), compared to the previous one (885b6e5). The results show a decrease of 2% in block processing time, up from the initial 1%:
Benchmark 1: ./stacks-inspect-entry-no-inserts replay-block ~/chainstate range 99990 100000
Time (mean ± σ): 6.694 s ± 0.011 s [User: 6.310 s, System: 0.350 s]
Range (min … max): 6.680 s … 6.711 s 10 runs
Benchmark 2: ./stacks-inspect-entry-inserts replay-block ~/chainstate range 99990 100000
Time (mean ± σ): 6.587 s ± 0.009 s [User: 6.227 s, System: 0.325 s]
Range (min … max): 6.573 s … 6.606 s 10 runs
Summary
./stacks-inspect-entry-inserts replay-block ~/chainstate range 99990 100000 ran
1.02 ± 0.00 times faster than ./stacks-inspect-entry-no-inserts replay-block ~/chainstate range 99990 100000
In this case, Benchmark 1 is `885b6e5`, and Benchmark 2 is `a8597f1`.
It takes several (sometimes 10x) lines of code to express what can be done in a single line with `contains_key()`
This PR is currently at +275, −247. So there is an increase in line count, but it's pretty small
Moreover, to use this API, you needed to manually transform lots of code on consensus-critical code paths, which makes merging this PR super risky
It's a simple and straightforward transformation, similar to transforming `if` statements into `match` statements. It also lets us remove some ugly `.expect()` calls like this:
let resp = self
.requests
.remove(&req)
.expect("BUG: had key but then didn't")
.expect("BUG: had response but then didn't");
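For illustration, a rewrite of that pattern with the Entry API might look like the sketch below (the map and types here are hypothetical stand-ins, not the actual `requests` code from the PR): the first `.expect("BUG: had key but then didn't")` disappears, because holding an occupied entry proves the key is present.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

// Hypothetical stand-in: request id -> optional response.
fn take_response(requests: &mut HashMap<u64, Option<String>>, req: u64) -> Option<String> {
    match requests.entry(req) {
        // The occupied entry owns the slot, so remove() cannot fail and
        // no "had key but then didn't" expect is needed.
        Entry::Occupied(e) => Some(e.remove().expect("BUG: had response but then didn't")),
        Entry::Vacant(_) => None,
    }
}
```

The second `.expect()` (on the response itself) is still a judgment call; it could likewise be flattened away by propagating `Option` further up.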
IMO this makes the entry API safer to use than indexing a `HashMap` multiple times with other methods
The unit and integration tests are currently passing, but we can also do some extra testing here
Finally, the speed improvements are basically nonexistent (a questionable 1% speedup?), which means the wins from the PR are basically nonexistent
The numbers in the description are incorrect, it's about a 2% speedup in block processing. See the second comment on this issue
> It's a simple and straightforward transformation, similar to transforming if statements into match statements. It also lets us remove some ugly .expect() calls like this:
I'm really not seeing how this:
match foo.entry(bar) {
Entry::Occupied(value) => {
/* do something */
}
Entry::Vacant(e) => {
/* do something else */
}
}
is a legibility improvement over this:
if foo.contains_key(&bar) {
/* do something */
} else {
/* do something else */
}
The former has two levels of indentation, whereas the latter has one. Also, the former has 6 lines of boilerplate, whereas the latter has 3. We've been trying to reduce gratuitous indentations for the past few years because it aids in comprehensibility, but this PR works against it.
Do we know for sure that using the `Entries` API is required to gain that 2% speed improvement? Because, if the transformation is really that straightforward, then the compiler really should be doing that internally (or have an option to do so).
> I'm really not seeing how this... is a legibility improvement over this
I don't know about that. Sure, the second example is less text, but the first is more explicit because both cases are clearly labeled, rather than using an `else` for one case. This example also misses that when using `Entry`, the code inside these blocks can be much more readable because there's no need to access the `HashMap` again. Consider this example from the PR:
if !self.prune_inbound_counts.contains_key(prune) {
self.prune_inbound_counts.insert(prune.clone(), 1);
} else {
let c = self.prune_inbound_counts.get(prune).unwrap().to_owned();
self.prune_inbound_counts.insert(prune.clone(), c + 1);
}
vs.
match self.prune_inbound_counts.entry(prune.to_owned()) {
Entry::Occupied(mut e) => {
*e.get_mut() += 1;
}
Entry::Vacant(e) => {
e.insert(1);
}
}
Line count went up, but the second is fewer characters and much more readable. In the first, you have to read it and think about what it's doing, whereas it's obvious as soon as you look at the second that you're incrementing a counter. Also, there's no `unwrap()` needed when using the `Entry` API.
> Do we know for sure that using the Entries API is required to gain that 2% speed improvement?
The hyperfine `stacks-inspect replay-block` benchmark we use is very consistent between runs, so I'm confident this represents a performance gain. This is not surprising, since it halves the number of hashes needed in some places.
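To make the "halving the number of hashes" point concrete, here is a minimal sketch (a hypothetical counter map, not code from the PR): the `contains_key`-then-`insert` form hashes the key on every map call, while the `entry` form hashes it once and reuses the slot.

```rust
use std::collections::HashMap;

// Hashes `key` up to three times: contains_key, get, and insert.
fn bump_multi_lookup(counts: &mut HashMap<String, u64>, key: &str) {
    if counts.contains_key(key) {
        let c = *counts.get(key).unwrap();
        counts.insert(key.to_string(), c + 1);
    } else {
        counts.insert(key.to_string(), 1);
    }
}

// Hashes `key` once: entry() finds the slot, or_insert fills it if vacant.
fn bump_single_lookup(counts: &mut HashMap<String, u64>, key: &str) {
    *counts.entry(key.to_string()).or_insert(0) += 1;
}
```

Both produce identical maps; only the number of lookups differs.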
The perf gains are probably entirely from the few changes in clarity. We should at least merge those. I'm not sure the other changes in stackslib are even used in the `replay-block` benchmark
> Because, if the transformation is really that straightforward, then the compiler really should be doing that internally (or have an option to do so)
You'd think, but during my testing I've been surprised by what the compiler doesn't optimize away at `-O3`
> The perf gains are probably entirely from the few changes in clarity. We should at least merge those. I'm not sure the other changes in stackslib are even used in the replay-block benchmark
Yup, 100% onboard with this. Now that you mention it, it was surprising that these changes impacted code outside of `stacks_common` and `clarity`.
> Line count went up, but the second is fewer characters and much more readable. In the first, you have to read it and think about what it's doing, whereas it's obvious as soon as you look at the second that you're incrementing a counter. Also, there's no unwrap() needed when using the Entry
The example you're citing here could also be refactored as follows, with less boilerplate and the same number of clones:
if let Some(c) = self.prune_inbound_counts.get_mut(prune) {
*c += 1;
} else {
self.prune_inbound_counts.insert(prune.clone(), 1);
}
Granted, there's a lot of really old code in the stackslib/src/net
directory that was written before I had a good handle on Rust, and it could really use a spit-shine.
Also, I think using `if let` with `.entry()` is the cleanest way to insert a key iff it doesn't exist already. Another example from the PR:
if !lowest_reward_cycle_with_missing_block.contains_key(nk) {
lowest_reward_cycle_with_missing_block.insert(nk.clone(), reward_cycle);
}
vs.
if let Entry::Vacant(e) = lowest_reward_cycle_with_missing_block.entry(nk.clone()) {
e.insert(reward_cycle);
}
Aside from the Clarity perf gains, there are some clear readability improvements and `.unwrap()` or `.expect()` eliminations in this PR that I think are worth keeping as well
I disagree -- I think it requires more cognitive overhead. Instead of reading "okay, insert this thing into the hash table if it doesn't have it", I'm reading "okay, get this opaque pointer-that-is-not-a-pointer thing ("entry") via a let-binding, and then call this deref-and-store-that-is-not-a-deref-and-store function on it ("insert") to put this thing into the hash table". This is why I avoid the `Entry` API in my code -- I think it creates a needless layer of indirection that can be avoided with the containers' APIs.
That all said, I'm not really keen to pursue the bike-shedding further since I don't think we're going to ever come to agreement. Also, it's off-topic.
The purpose of the workstream to which this PR belongs is to improve the speed of the Clarity VM. A few comments ago, you mentioned that the speedup you see does not appear to come from the introduction of the `Entries` API (or if it does, there's no need to be touching files outside of `clarity` and perhaps `stacks_common`). When viewed only through this lens, without regard to the `Entries` API, this PR is not achieving this -- it's making changes that do not improve the Clarity VM, and it's making changes in things outside of the Clarity VM. As such, I'm leaving my review as-is until these two problems are addressed.
@jcnelson I've removed all the modifications outside of the `clarity` package. Can you please review it?
cc: @jbencin
@jcnelson @jbencin I've removed the `.collect()` as it was failing the build; now it should work. Can you please re-approve this?
I assume this was failing to compile because of a merge conflict?
There were no previous merge conflicts. What I saw was that the `.collect()` was implemented at the time I created this PR, but on the latest `next` it was removed -- that's why the compile was failing. I manually resolved it today before updating the branch, so I didn't see any conflicts when re-syncing.
It looks like it's because I changed `probe_for_generics()` to accept `impl Iterator` in #4500
> I've requested (via Slack) that we try to retain the readability of `contains_key` via a new trait or method
Do you have a suggestion for how to do that?
> I've requested (via Slack) that we try to retain the readability of `contains_key` via a new trait or method
> Do you have a suggestion for how to do that?
Yeah I gave a few to @ASuciuX on Slack -- but if it proves difficult I'm happy to take it.
If there is a more appropriate place for the traits and implements please let me know and I will move them.
> If there is a more appropriate place for the traits and implements please let me know and I will move them.
If @cylewitruk can merge his `StacksHashMap` changes, that would be the best place to put it. Or if not, this should go in its own file, maybe in stacks-common/src/util or stacks-common/src/types
> If @cylewitruk can merge his `StacksHashMap` changes, that would be the best place to put it. Or if not, this should go in its own file, maybe in stacks-common/src/util or stacks-common/src/types
I might actually suggest starting a new dir stacks_common/src/ext where all "extension" traits can be placed. This pattern can be applied to a lot of repetitive and/or verbose code.
If `StacksHashMap` can merge, then the logic can be placed directly in an `impl`; this is more of a way to extend the API for non-local types.
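As a sketch of the extension-trait pattern being discussed here (all names are hypothetical, not from the PR or from `StacksHashMap`), a trait living somewhere like stacks_common/src/ext could keep a `contains_key`-style call site while using `entry()` underneath:

```rust
use std::collections::HashMap;
use std::hash::Hash;

// Hypothetical extension trait: one-lookup increment without exposing
// the Entry machinery at the call site.
trait CounterMapExt<K> {
    fn increment(&mut self, key: K);
}

impl<K: Eq + Hash> CounterMapExt<K> for HashMap<K, u64> {
    fn increment(&mut self, key: K) {
        *self.entry(key).or_insert(0) += 1;
    }
}
```

With the trait in scope, the earlier counter example would collapse to a single call such as `self.prune_inbound_counts.increment(prune.to_owned())`.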