Runtime hits unknown performance cliff in the estimator
See
https://github.com/matklad/nearcore/commit/80b44a877a1d2de76b4adcf63567984d615e7c36
In that commit, I add a benchmark which sends 150 blocks, where each block contains 100 deploy-code transactions.
EDIT: https://github.com/near/nearcore/issues/4771#issuecomment-933347496 is a modernized version.
The benchmark is implemented in two ways:
- on top of the parameter estimator testbed
- on top of the runtime tester
In the first case, I observe a very strange behavior, where at some point block processing time severely degrades.
To reproduce, check out this commit and run:
$ cargo test --release --package runtime-params-estimator --lib --features required -- v2::fast --exact --nocapture
$ cargo test --release --package runtime-params-estimator --lib --features required -- v2::slow --exact --nocapture
Look for `Runtime::apply` lines in the output; they tell you how long it takes to process one apply.
In the `fast` case, every apply takes about 16ms for me. In the `slow` case, applies start at about 30ms, but then abruptly go to 600ms.
Would be great to understand why the time suddenly balloons!
Quick look indicates that the slowdown comes from the storage. I’ve added some more tracing and here’s output with a fast `process_transaction` invocation followed by a slow one (they are within the same `apply` call):
147.38µs Runtime::process_transaction
145.70µs verify_and_charge_transaction
48.12µs validate_transaction
46.70µs get_account
46.15µs TrieUpdate::get
49.21µs get_access_key
48.65µs TrieUpdate::get
…
2.64ms Runtime::process_transaction
2.64ms verify_and_charge_transaction
49.03µs validate_transaction
1.30ms get_account
1.30ms TrieUpdate::get
1.29ms get_access_key
1.29ms TrieUpdate::get
`TrieUpdate::get` is suddenly 25 times slower, and it’s called quite a bit, so it adds up.
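For context, these nested timings come from the ad-hoc tracing added on the branch. A minimal self-contained helper that produces output in this spirit (purely illustrative, not nearcore's actual tracing code):

```rust
use std::time::Instant;

// Illustrative timing wrapper: prints a duration followed by a label,
// similar in shape to the trace lines above.
fn timed<T>(label: &str, f: impl FnOnce() -> T) -> T {
    let start = Instant::now();
    let result = f();
    eprintln!("{:>10.2?} {}", start.elapsed(), label);
    result
}

fn main() {
    // Stand-in workload for a real trie lookup.
    let sum = timed("TrieUpdate::get", || (0..1_000_000u64).sum::<u64>());
    println!("{sum}");
}
```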
@mina86 is it because the value is not cached and you have to do a trie traversal?
It doesn’t look like cache issues. I haven’t measured the cache hit ratio or counted how often we hit the storage, but just by looking at the logs the numbers seem similar for the fast and slow cases. On the other hand, `Store::get` becomes ten times slower all of a sudden:
74.73µs Trie::Update::get
68.40µs Trie::lookup
7.45µs Store::get
3.50µs Store::get
3.16µs Store::get
2.10µs Store::get
2.82µs Store::get
2.79µs Store::get
3.38µs Store::get
4.09µs Store::get
3.77µs Store::get
3.07µs Store::get
2.45µs Store::get
3.23µs Store::get
2.91µs Store::get
2.31µs Store::get
2.42µs Store::get
2.63µs Store::get
2.59µs Store::get
4.83µs Store::get
vs.
1.31ms Trie::Update::get
1.24ms Trie::lookup
86.48µs Store::get
72.27µs Store::get
70.31µs Store::get
69.17µs Store::get
68.85µs Store::get
68.40µs Store::get
68.75µs Store::get
69.13µs Store::get
69.77µs Store::get
68.91µs Store::get
68.82µs Store::get
69.21µs Store::get
68.84µs Store::get
78.35µs Store::get
75.67µs Store::get
76.41µs Store::get
75.47µs Store::get
69.45µs Store::get
So I’m now looking at that.
The difference is caused by how the signer of the transaction is chosen. In the fast test, signers are chosen in sequence from all the available accounts. In the slow tests, signers are chosen at random from unused accounts. Making the latter use sequential accounts fixes the issue:
--- a/runtime/runtime-params-estimator/src/v2/support.rs
+++ b/runtime/runtime-params-estimator/src/v2/support.rs
@@ -202,12 +202,12 @@ impl<'a, 'c> TransactionBuilder<'a, 'c> {
self.testbed.accounts[account_index].clone()
}
pub(crate) fn random_unused_account(&mut self) -> AccountId {
- loop {
- let account = self.random_account();
+ for account in &self.testbed.accounts {
if self.testbed.used_accounts.insert(account.clone()) {
- return account;
+ return account.clone();
}
}
+ panic!("Not enough accounts");
}
pub(crate) fn random_account_pair(&mut self) -> (AccountId, AccountId) {
let first = self.random_account();
I didn’t look into how exactly this affects the database to cause the slowdown.
Perhaps I’ve spoken too soon. The above is correct, and changing the `random_unused_account` implementation does speed up the slow test, but the opposite does not slow down the fast test:
--- a/runtime/runtime-params-estimator/src/v2.rs
+++ b/runtime/runtime-params-estimator/src/v2.rs
@@ -449,10 +449,20 @@ fn fast() {
use_in_memory_store: false,
};
- let mut i = 0;
+ let mut used = std::collections::HashSet::new();
+ let mut rng = rand::thread_rng();
+
for h in 0..n_blocks {
let mut block = BlockConfig::at_height((h + 1) as u64);
for _ in 0..block_size {
+
+ let i = loop {
+ let i = rng.gen_range(0, num_accounts as u32);
+ if used.insert(i) {
+ break i as usize;
+ }
+ };
+
let signer_id = accounts[i].clone();
let receiver_id = signer_id.clone();
let signer =
@@ -464,7 +474,6 @@ fn fast() {
signer,
actions: vec![Action::DeployContract(DeployContractAction { code: code.clone() })],
});
- i += 1;
}
s.blocks.push(block)
}
It looks like it’s something else calling `random_unused_account` that is the culprit.
Confirmed that the order of accounts matters here. Iterating accounts by one, up or down, finishes fast. Iterating accounts by eleven shows the same slowdown:
diff --git a/runtime/runtime-params-estimator/src/v2.rs b/runtime/runtime-params-estimator/src/v2.rs
index 6a844fcd9..597da1c01 100644
--- a/runtime/runtime-params-estimator/src/v2.rs
+++ b/runtime/runtime-params-estimator/src/v2.rs
@@ -27,7 +27,7 @@ static ALL_COSTS: &[(Cost, fn(&mut Ctx) -> GasCost)] = &[
(Cost::ActionReceiptCreation, action_receipt_creation),
(Cost::ActionSirReceiptCreation, action_sir_receipt_creation),
(Cost::ActionTransfer, action_transfer),
- (Cost::ActionCreateAccount, action_create_account),
+ // (Cost::ActionCreateAccount, action_create_account),
(Cost::ActionDeleteAccount, action_delete_account),
(Cost::ActionAddFullAccessKey, action_add_full_access_key),
(Cost::ActionAddFunctionAccessKeyBase, action_add_function_access_key_base),
@@ -118,28 +118,28 @@ fn action_transfer(ctx: &mut Ctx) -> GasCost {
total_cost - base_cost
}
-fn action_create_account(ctx: &mut Ctx) -> GasCost {
- let total_cost = {
- let mut testbed = ctx.test_bed();
+// fn action_create_account(ctx: &mut Ctx) -> GasCost {
+// let total_cost = {
+// let mut testbed = ctx.test_bed();
- let mut make_transaction = |mut tb: TransactionBuilder<'_, '_>| -> SignedTransaction {
- let sender = tb.random_account();
- let new_account =
- AccountId::try_from(format!("{}_{}", sender, tb.rng().gen::<u64>())).unwrap();
-
- let actions = vec![
- Action::CreateAccount(CreateAccountAction {}),
- Action::Transfer(TransferAction { deposit: 10u128.pow(26) }),
- ];
- tb.transaction_from_actions(sender, new_account, actions)
- };
- testbed.average_transaction_cost(&mut make_transaction)
- };
+// let mut make_transaction = |mut tb: TransactionBuilder<'_, '_>| -> SignedTransaction {
+// let sender = tb.random_account();
+// let new_account =
+// AccountId::try_from(format!("{}_{}", sender, tb.rng().gen::<u64>())).unwrap();
- let base_cost = action_receipt_creation(ctx);
+// let actions = vec![
+// Action::CreateAccount(CreateAccountAction {}),
+// Action::Transfer(TransferAction { deposit: 10u128.pow(26) }),
+// ];
+// tb.transaction_from_actions(sender, new_account, actions)
+// };
+// testbed.average_transaction_cost(&mut make_transaction)
+// };
- total_cost - base_cost
-}
+// let base_cost = action_receipt_creation(ctx);
+
+// total_cost - base_cost
+// }
fn action_delete_account(ctx: &mut Ctx) -> GasCost {
let total_cost = {
diff --git a/runtime/runtime-params-estimator/src/v2/support.rs b/runtime/runtime-params-estimator/src/v2/support.rs
index 6a635eb34..9b62e6fb2 100644
--- a/runtime/runtime-params-estimator/src/v2/support.rs
+++ b/runtime/runtime-params-estimator/src/v2/support.rs
@@ -7,7 +7,7 @@ use near_crypto::{InMemorySigner, KeyType};
use near_primitives::hash::CryptoHash;
use near_primitives::transaction::{Action, DeployContractAction, SignedTransaction};
use near_primitives::types::{AccountId, Gas};
-use rand::prelude::ThreadRng;
+use rand::prelude::{SliceRandom, ThreadRng};
use rand::Rng;
use crate::cases::ratio_to_gas;
@@ -50,6 +50,7 @@ impl<'c> Ctx<'c> {
accounts: (0..self.config.active_accounts).map(get_account_id).collect(),
nonces: HashMap::new(),
used_accounts: HashSet::new(),
+ idx: 0,
}
}
@@ -80,6 +81,7 @@ impl<'c> Ctx<'c> {
accounts: proto.accounts.clone(),
nonces: proto.nonces.clone(),
used_accounts: HashSet::new(),
+ idx: 0,
}
}
@@ -117,6 +119,7 @@ pub(crate) struct TestBed<'c> {
accounts: Vec<AccountId>,
nonces: HashMap<AccountId, u64>,
used_accounts: HashSet<AccountId>,
+ idx: usize,
}
impl<'c> TestBed<'c> {
@@ -190,24 +193,26 @@ impl<'a, 'c> TransactionBuilder<'a, 'c> {
)
}
- pub(crate) fn rng(&mut self) -> ThreadRng {
- rand::thread_rng()
- }
+ // pub(crate) fn rng(&mut self) -> ThreadRng {
+ // rand::thread_rng()
+ // }
pub(crate) fn account(&mut self, account_index: usize) -> AccountId {
get_account_id(account_index)
}
pub(crate) fn random_account(&mut self) -> AccountId {
- let account_index = self.rng().gen_range(0, self.testbed.accounts.len());
+ self.testbed.idx = (self.testbed.idx + 11) % self.testbed.accounts.len();
+ let account_index = self.testbed.idx;
self.testbed.accounts[account_index].clone()
}
pub(crate) fn random_unused_account(&mut self) -> AccountId {
- loop {
+ for i in 0..1000 {
let account = self.random_account();
if self.testbed.used_accounts.insert(account.clone()) {
return account;
}
}
+ panic!("no unused accounts")
}
pub(crate) fn random_account_pair(&mut self) -> (AccountId, AccountId) {
let first = self.random_account();
Ok, it took me just about a week to realize I can throw flamegraphs at it :sweat_smile:
stepping by 11 accounts: (flamegraph screenshot)
stepping by 1: (flamegraph screenshot)
Profiles were recorded with `perf record -F 9999 --call-graph dwarf target/release/deps/runtime_params_estimator-0328c9c022713977 v2::slow --exact --nocapture` and rendered with https://github.com/KDAB/hotspot
The `rocksdb::VerifyBlockChecksum` looks very suspicious!
I'll stop at this for the time being.
(meta -- does anyone know of a nice way to share perf profiling results? The above is me screenshotting the hotspot UI, which doesn't seem ideal)
It’s still not clear to me why changing the fast code to use random accounts doesn’t slow it down though. There are some differences in how the accounts are set up in the two tests, so that could be the explanation. In the fast test, `test###` accounts are added to genesis and transactions are executed using those. In the slow test, `near_###_###` accounts are added by the `GenesisBuilder`. Though it’s not clear to me why it would make a difference.
> In the fast test signers are chosen in sequence from all the available accounts. In the slow tests, signers are chosen at random from unused accounts
@mina86 what are the available accounts? I wonder whether this affects the trie traversal in the sense that if we go over the accounts sequentially, some trie node will be cached between the reads (if not by nearcore then on the OS level the page may be cached) while accessing them randomly makes it harder to cache anything.
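To make that intuition concrete, here is a tiny self-contained sketch (illustrative only; real trie keys embed the account id but are not the raw strings, and ids that share a long prefix also share the upper trie nodes on the lookup path). It compares how many leading bytes consecutive lookups share when `test{}`-style accounts are visited sequentially versus in a shuffled order:

```rust
// Count the shared leading bytes of two account ids.
fn common_prefix(a: &str, b: &str) -> usize {
    a.bytes().zip(b.bytes()).take_while(|(x, y)| x == y).count()
}

// Average shared prefix between consecutive lookups in the given order.
fn avg_shared(ids: &[String]) -> f64 {
    let total: usize = ids.windows(2).map(|w| common_prefix(&w[0], &w[1])).sum();
    total as f64 / (ids.len() - 1) as f64
}

fn main() {
    let sequential: Vec<String> = (0..200_000).map(|i| format!("test{}", i)).collect();
    let mut random = sequential.clone();
    // Poor man's Fisher-Yates shuffle (LCG-based) to keep this dependency-free.
    let mut state = 0x9e3779b97f4a7c15u64;
    for i in (1..random.len()).rev() {
        state = state.wrapping_mul(6364136223846793005).wrapping_add(1);
        random.swap(i, (state % (i as u64 + 1)) as usize);
    }
    // Sequential neighbours typically differ only in the last digit, while a
    // shuffled order shares much less beyond the common "test" prefix.
    println!("sequential: {:.2} shared bytes on average", avg_shared(&sequential));
    println!("random:     {:.2} shared bytes on average", avg_shared(&random));
}
```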
The tests create 200k accounts and all transactions are signed by one of them. The test uses each account only once. Sequential reads indeed seem like a plausible explanation though at this moment I think this is affecting caching in RocksDB or on OS level rather than in our code. (I still need to just add counters or profile each of the Trie::* and Store::* methods to be sure).
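Before instrumenting the real code, the shape of such counters could be as simple as this (a sketch of the approach, not actual nearcore code):

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Sketch: global counters one could bump inside a method such as
// TrieCachingStorage::retrieve_raw_bytes to measure the cache hit ratio.
static CACHE_HITS: AtomicU64 = AtomicU64::new(0);
static CACHE_MISSES: AtomicU64 = AtomicU64::new(0);

fn record(hit: bool) {
    let counter = if hit { &CACHE_HITS } else { &CACHE_MISSES };
    counter.fetch_add(1, Ordering::Relaxed);
}

fn report() {
    let hits = CACHE_HITS.load(Ordering::Relaxed);
    let misses = CACHE_MISSES.load(Ordering::Relaxed);
    eprintln!(
        "cache hits: {hits}, misses: {misses}, hit ratio: {:.2}%",
        100.0 * hits as f64 / (hits + misses).max(1) as f64
    );
}

fn main() {
    record(true);
    record(false);
    report();
}
```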
Turns out
diff --git a/genesis-tools/genesis-populate/src/lib.rs b/genesis-tools/genesis-populate/src/lib.rs
index d3a9793d..3eceddfa 100644
--- a/genesis-tools/genesis-populate/src/lib.rs
+++ b/genesis-tools/genesis-populate/src/lib.rs
@@ -29,7 +29,7 @@ use near_store::{
use nearcore::{get_store_path, NightshadeRuntime};
fn get_account_id(account_index: u64) -> AccountId {
- AccountId::try_from(format!("near_{}_{}", account_index, account_index)).unwrap()
+ AccountId::try_from(format!("test{}", account_index)).unwrap()
}
pub type Result<T> = std::result::Result<T, Box<dyn std::error::Error>>;
diff --git a/runtime/runtime-params-estimator/src/testbed_runners.rs b/runtime/runtime-params-estimator/src/testbed_runners.rs
index 2ede73d9..ab316ae6 100644
--- a/runtime/runtime-params-estimator/src/testbed_runners.rs
+++ b/runtime/runtime-params-estimator/src/testbed_runners.rs
@@ -18,7 +18,7 @@ use std::time::Instant;
/// Get account id from its index.
pub fn get_account_id(account_index: usize) -> AccountId {
- AccountId::try_from(format!("near_{}_{}", account_index, account_index)).unwrap()
+ AccountId::try_from(format!("test{}", account_index)).unwrap()
}
/// Total number of transactions that we need to prepare.
makes the test go fast. The inverse (changing all instances of `test{}` to `near_{}_{}` in the code base) plus adding random order of accounts makes the fast test slow.
Excellent find! I confirm that the `fast` test starts going slowly with `near_{}_{}` accounts, so this removes any param-estimator related weirdness from the equation. I also had to replace `test{}` with `near_{}_{}` across the codebase, and not just in a test, which felt odd. Filed https://github.com/near/nearcore/issues/4835
It feels that this is indeed related to the shape of the trie. @mina86 is it true that changing the names from `test{}` to `near_{}_{}` actually changes the number of database reads we do?
There’s only a small difference:

|                             | `test{}` | `near_{}_{}` |
|-----------------------------|---------:|-------------:|
| Trie::retrieve_raw_bytes    | 1832596  | 1944114      |
| TrieCachingStorage          | 2118923  | 2232747      |
| TrieCachingStorage (cached) | 2        | 0            |
| TrieCachingStorage (missed) | 2118921  | 2232747      |
| RocksDB::get                | 2322764  | 2436590      |
That is weird. Do we see any significant difference in the amount of time spent on `RocksDB::get`?
Modernized and simplified the repro, based on mina86's refactors:
https://github.com/matklad/nearcore/commit/558b8e21142d14bcf9fd6fca16c03cabed6234b6
To run:
cargo test --release --package runtime-tester --lib \
--features nearcore/no_cache \
-- perf_cliff --exact --nocapture
I've also found a better workflow for perf profiles
1. Build the test binary
cargo test --no-run --release --package runtime-tester --features nearcore/no_cache --lib -- perf_cliff --exact --nocapture
2. Run the binary directly with perf (you can see the name of the file by omitting --no-run). -F is the sampling frequency: a higher number means a bigger resulting profile file
perf record -F 999 --call-graph dwarf,65528 -- target/release/deps/runtime_tester-160c2544f9ae8411 perf_cliff --exact --nocapture
3. The output of the previous command is a perf.data file. You need to process it with perf script (I have no idea what this actually does)
perf script -F +pid > slow.perf
4. Go to https://profiler.firefox.com and drag and drop the slow.perf file
5. Click "Upload local Profile" in the top-right corner
6. Share the URL with colleagues
Here's what I get for the slow (`near_{}_{}`) and fast (`test_{}`) versions:
- slow https://profiler.firefox.com/public/azmbf8q65nbrhdqj98752adjnkt18kj9c9pjpvr/flame-graph/?globalTrackOrder=0&localTrackOrderByPid=212964-0wxl&thread=1&v=6
- fast https://profiler.firefox.com/public/vv11t42jzk8nw9fagjd8s62j6g0q9h5s4d7jk6g/flame-graph/?globalTrackOrder=0&localTrackOrderByPid=213518-0wxo&thread=1&v=6
Note that the profile is per thread, and we have a lot of threads. Select the `perf_cliff` thread at the top and the `Flame Graph` view at the bottom to get the most amusing result.
@matklad so it appears to be something inside RocksDB? cc @mina86
It appears so, yeah. What I don't understand is why the RocksDB stacks are hanging in the air. It looks as if perf just can't piece the shared Rust/C++ stack together.
Note how `Scenario::run` in both cases takes approximately the same 5k samples, but, in the first case, there's just this Rocks stuff out of nowhere.
So I guess the next step here is to get an actually useful profile -- we need to understand who calls rocks.
Aha, the stack issue is just because by default perf has a limit on stack depth. The following argument fixes that: --call-graph dwarf,65528
Here are much better profiles:
- slow (`near_{}_{}`): https://share.firefox.dev/2WJxwls
- fast (`test_{}`): https://share.firefox.dev/3mswaVh
Yeah, looking at those profiles gives so much more to work with. The difference seems to be that in `rocksdb::Version::Get` the slow version does everything that the fast version does, plus `rocksdb::TableCache::Get`. I'll stop my investigation here and go back to wasm stuff.
@mina86 please take a look at the new profiling results :)
> The difference seems to be that in the `rocksdb::Version::Get` the slow version does everything that the fast version does, plus `rocksdb::TableCache::Get`.
Yep, this is exactly true. RocksDB has this piece of code in `Version::Get`:
FilePicker fp(
storage_info_.files_, user_key, ikey, &storage_info_.level_files_brief_,
storage_info_.num_non_empty_levels_, &storage_info_.file_indexer_,
user_comparator(), internal_comparator());
FdWithKeyRange* f = fp.GetNextFile();
while (f != nullptr) {
// ...
}
and in the fast path `f` is always null. What `FilePicker` is I’m not sure at this point, but it’s something more than listing files in the storage directory.
I’ve tried looking at the `rocksdb.block-cache-usage` and `rocksdb.block-cache-pinned-usage` counters, but they remain constant throughout the test and stay way below `rocksdb.block-cache-capacity`.
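For reference, these counters can be polled from Rust through the `rocksdb` crate's property API. A minimal sketch, assuming a plain `DB` handle and a scratch path rather than nearcore's store wrapper:

```rust
use rocksdb::DB;

fn main() -> Result<(), rocksdb::Error> {
    let db = DB::open_default("/tmp/prop-demo")?;
    for prop in [
        "rocksdb.block-cache-usage",
        "rocksdb.block-cache-pinned-usage",
        "rocksdb.block-cache-capacity",
    ] {
        // property_int_value returns Ok(None) if the property is unknown.
        println!("{prop} = {:?}", db.property_int_value(prop)?);
    }
    Ok(())
}
```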
Sadly, the summary is that I’m not much closer to figuring out where the performance drop comes from.
I think this now blocks progress on https://github.com/near/nearcore/pull/4733. I am investigating many-orders-of-magnitude differences between the old estimator and the new one, and it seems that I hit the "DB is mysteriously slow" case.
The slow case is what is hit by the old estimator, and it looks like this:
1.78ms Runtime::process_receipt
1.78ms Runtime::apply_action_receipt
1.76ms Runtime::apply_action
1.76ms Runtime::action_function_call
1.76ms Runtime::execute_function_call
1.62ms Runtime::get_code
110.87µs retrieve_raw_bytes
104.19µs retrieve_raw_bytes
105.77µs retrieve_raw_bytes
105.73µs retrieve_raw_bytes
106.31µs retrieve_raw_bytes
104.89µs retrieve_raw_bytes
104.48µs retrieve_raw_bytes
104.49µs retrieve_raw_bytes
105.22µs retrieve_raw_bytes
114.13µs retrieve_raw_bytes
104.68µs retrieve_raw_bytes
110.49µs retrieve_raw_bytes
110.93µs retrieve_raw_bytes
110.50µs retrieve_raw_bytes
105.30µs retrieve_raw_bytes
138.43µs Runtime::execute_function_call/1
137.73µs Runtime::execute_function_call/2
137.15µs Runtime::execute_function_call/3
136.50µs run_vm
It takes 1.5ms to fetch contract code from the database, which doesn't make any sense. The fast case looks like this:
210.09µs Runtime::process_receipt
209.44µs Runtime::apply_action_receipt
195.90µs Runtime::apply_action
195.07µs Runtime::action_function_call
192.66µs Runtime::execute_function_call
48.14µs Runtime::get_code
4.04µs retrieve_raw_bytes
1.48µs retrieve_raw_bytes
1.89µs retrieve_raw_bytes
1.87µs retrieve_raw_bytes
2.56µs retrieve_raw_bytes
1.83µs retrieve_raw_bytes
2.08µs retrieve_raw_bytes
2.90µs retrieve_raw_bytes
4.85µs retrieve_raw_bytes
3.55µs retrieve_raw_bytes
2.91µs retrieve_raw_bytes
2.01µs retrieve_raw_bytes
2.28µs retrieve_raw_bytes
2.93µs retrieve_raw_bytes
3.64µs retrieve_raw_bytes
143.66µs Runtime::execute_function_call/1
142.99µs Runtime::execute_function_call/2
142.30µs Runtime::execute_function_call/3
141.54µs run_vm
What if the RocksDB issues list is helpful here? This one is quite curious: https://github.com/facebook/rocksdb/issues/3961
I use the command `cargo test --release --package runtime-tester --lib --features nearcore/no_cache -- perf_cliff --exact --nocapture`.
When I disable `set_cache_index_and_filter_blocks` (https://github.com/near/nearcore/compare/558b8e211..bceefb56), performance becomes much more stable:
- disabled: 40-80ms to process a single block in most cases, with one spike to 120ms
- enabled: 40-60ms at first, then 1s for each block

@ailisp @matklad do you know why we enable it?
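For context, `set_cache_index_and_filter_blocks` is a block-based-table option. A minimal sketch of how it is set with the `rocksdb` crate (not nearcore's actual store setup):

```rust
use rocksdb::{BlockBasedOptions, Options, DB};

fn main() -> Result<(), rocksdb::Error> {
    let mut block_opts = BlockBasedOptions::default();
    // The option under discussion: when true, index and filter blocks are
    // kept in the block cache alongside data blocks instead of pinned heap
    // memory, so they compete for cache space and can be evicted.
    block_opts.set_cache_index_and_filter_blocks(false);

    let mut opts = Options::default();
    opts.create_if_missing(true);
    opts.set_block_based_table_factory(&block_opts);

    let _db = DB::open(&opts, "/tmp/cache-opt-demo")?;
    Ok(())
}
```

With the option enabled, an evicted index or filter block has to be re-read from disk on the next lookup, which would be consistent with the time spent in `rocksdb::BlockFetcher::ReadBlockContents` in the slow profile.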
@Longarithm We can try to see if it makes a difference to adjust cache sizes per column, like in https://github.com/near/nearcore/pull/5212
> have a lot of threads. Select `perf_cliff` thread at the top and `Flame Graph` view at the bottom to get the most amusing result.
It appears that in the slow path we spend a lot of time in `rocksdb::BlockFetcher::ReadBlockContents`. So RocksDB is spending a lot of time reading blocks in the slow case.
> `retrieve_raw_bytes`
I see that we have a cache in front of the database. This explains the spikes in performance. If you want to debug the issue, please try commenting this code out:
impl TrieStorage for TrieCachingStorage {
    fn retrieve_raw_bytes(&self, hash: &CryptoHash) -> Result<Vec<u8>, StorageError> {
        let mut guard = self.cache.0.lock().expect(POISONED_LOCK_ERR);
        /* if let Some(val) = guard.cache_get(hash) {
            Ok(val.clone())
        } else { */
        let key = Self::get_key_from_shard_uid_and_hash(self.shard_uid, hash);
        let val = self
            .store
            .get(ColState, key.as_ref())
            .map_err(|_| StorageError::StorageInternalError)?;
        if let Some(val) = val {
            if val.len() < TRIE_LIMIT_CACHED_VALUE_SIZE {
                guard.cache_set(*hash, val.clone());
            }
            Ok(val)
        } else {
            // not StorageError::TrieNodeMissing because it's only for TrieMemoryPartialStorage
            Err(StorageError::StorageInconsistentState("Trie node missing".to_string()))
        }
        // }
    }

    fn as_caching_storage(&self) -> Option<&TrieCachingStorage> {
        Some(self)
    }
}
This makes reads slower, but this shows you the real time it takes to access the database.
462.38ms Process block
461.43ms process_state_update
461.42ms Runtime::apply
66.14ms 100 Runtime::process_receipt
100.10µs 100 get_key
206.69ms 100 Runtime::process_transaction
458.03ms Process block
457.15ms process_state_update
457.14ms Runtime::apply
67.32ms 100 Runtime::process_receipt
94.68µs 100 get_key
204.97ms 100 Runtime::process_transaction
508.51ms Process block
507.50ms process_state_update
507.49ms Runtime::apply
68.10ms 100 Runtime::process_receipt
97.73µs 100 get_key
204.74ms 100 Runtime::process_transaction
469.07ms Process block
468.15ms process_state_update
468.14ms Runtime::apply
68.92ms 100 Runtime::process_receipt
100.43µs 100 get_key
211.39ms 100 Runtime::process_transaction
459.90ms Process block
459.00ms process_state_update
458.99ms Runtime::apply
67.22ms 100 Runtime::process_receipt
95.58µs 100 get_key
205.08ms 100 Runtime::process_transaction
464.87ms Process block
463.96ms process_state_update
463.96ms Runtime::apply
68.66ms 100 Runtime::process_receipt
97.19µs 100 get_key
207.97ms 100 Runtime::process_transaction
For the purpose of debugging the database performance issue, I recommend disabling this cache and debugging only the database access times. It's much easier to analyze performance issues by analyzing one thing at a time.
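Rather than commenting the lookup out by hand, the bypass can also sit behind a feature flag; the commands earlier in the thread already pass `--features nearcore/no_cache`. This self-contained sketch only illustrates the cfg-gating pattern (the real feature's wiring may differ):

```rust
use std::collections::HashMap;

// Toy two-level store illustrating a cfg-gated cache bypass.
struct Store {
    cache: HashMap<u64, Vec<u8>>,
    backend: HashMap<u64, Vec<u8>>,
}

impl Store {
    fn get(&mut self, key: u64) -> Option<Vec<u8>> {
        // With the feature enabled, the cache lookup compiles away entirely,
        // so every read hits the backend and reflects its real latency.
        #[cfg(not(feature = "no_cache"))]
        {
            if let Some(v) = self.cache.get(&key) {
                return Some(v.clone());
            }
        }
        let v = self.backend.get(&key).cloned();
        #[cfg(not(feature = "no_cache"))]
        {
            if let Some(v) = &v {
                self.cache.insert(key, v.clone());
            }
        }
        v
    }
}

fn main() {
    let mut s = Store { cache: HashMap::new(), backend: HashMap::new() };
    s.backend.insert(1, vec![42]);
    assert_eq!(s.get(1), Some(vec![42]));
}
```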
I created a benchmark: https://github.com/near/nearcore/pull/5312. It may be useful for trying different database configurations.
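The gist of the benchmark, as a condensed standalone sketch using the `rocksdb` crate (the PR's actual code differs; the path and sizes here are arbitrary):

```rust
use rocksdb::DB;
use std::time::Instant;

fn main() -> Result<(), rocksdb::Error> {
    let db = DB::open_default("/tmp/bench-db")?;
    let value = vec![0u8; 100];
    let mut total_keys = 0u64;

    // Grow the database in rounds and time point lookups after each round.
    for round in 0..20 {
        for i in 0..100_000u64 {
            db.put((total_keys + i).to_be_bytes(), &value)?;
        }
        total_keys += 100_000;

        let samples = 10_000u64;
        let start = Instant::now();
        for i in 0..samples {
            // Read keys spread across the whole key space.
            let key = (i * total_keys / samples).to_be_bytes();
            let _ = db.get(key)?;
        }
        println!(
            "round {round}: {total_keys} keys, avg get {:?}",
            start.elapsed() / samples as u32
        );
    }
    Ok(())
}
```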
The database is becoming slower and slower as the number of keys grows:
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured
Running unittests (/home/pmnoxx/repos/nearcore.master/target/release/deps/ibf-b2f9e4ff4ef48925)
running 1 test
test benchmark_db ... took on avg 7.957µs
test benchmark_db ... took on avg 882ns
took on avg 1.546µs
took on avg 1.647µs
took on avg 2.184µs
took on avg 2.716µs
took on avg 2.26µs
took on avg 3.401µs
took on avg 3.845µs
took on avg 4.428µs
took on avg 4.885µs
took on avg 5.508µs
took on avg 6.004µs
took on avg 6.687µs
took on avg 7.088µs
took on avg 7.866µs
took on avg 8.381µs
took on avg 8.539µs
took on avg 9.438µs
took on avg 9.874µs
took on avg 10.114µs
took on avg 11.359µs
took on avg 10.506µs
took on avg 12.499µs
took on avg 13.506µs
took on avg 12.489µs
took on avg 13.801µs
took on avg 14.562µs
took on avg 16.435µs
took on avg 16.655µs
took on avg 16.774µs