nearcore icon indicating copy to clipboard operation
nearcore copied to clipboard

Runtime hits unknown perfomance cliff in the estimator

Open matklad opened this issue 3 years ago • 41 comments

See

https://github.com/matklad/nearcore/commit/80b44a877a1d2de76b4adcf63567984d615e7c36

In that commit, I add a benchmark which sends 150 blocks, where each block contains 100 deploy code transaction.

EDIT: https://github.com/near/nearcore/issues/4771#issuecomment-933347496 is modernized version.

The benchmark is implemented in two ways:

  • on top of parameter estimator testbed
  • on top of runtime tester

In the first case, I observe a very stage behavior, where at some point block processsing times severely degrades.

To reproduce, checkout this commit and run

 $ cargo test --release --package runtime-params-estimator --lib --features required – v2::fast --exact --nocapture
 $ cargo test --release --package runtime-params-estimator --lib --features required – v2::slow --exact --nocapture

Look for Runtime::apply lines in the output, they they tell you how long it takes to process one apply.

In the fast case, every apply takes about 16ms for me. In the slow case, apply start at about 30ms, but then abruptly go to 600ms.

Would be great to understand why the time suddenly balloons!

matklad avatar Sep 01 '21 14:09 matklad

Quick look indicates that the slowdown comes from the storage. I’ve added some more tracing and here’s output with a fast process_transaction invocation followed by a slow one (they are within the same apply call):

147.38µs         Runtime::process_transaction
  145.70µs         verify_and_charge_transaction
    48.12µs          validate_transaction
    46.70µs          get_account
      46.15µs          TrieUpdate::get
    49.21µs          get_access_key
      48.65µs          TrieUpdate::get
…
2.64ms           Runtime::process_transaction
  2.64ms           verify_and_charge_transaction
    49.03µs          validate_transaction
    1.30ms           get_account
      1.30ms           TrieUpdate::get
    1.29ms           get_access_key
      1.29ms           TrieUpdate::get

TrieUpdate::get is suddenly 25 times slower and it’s called quite a bit so it adds up.

mina86 avatar Sep 06 '21 15:09 mina86

@mina86 is it because the value is not cached and you have to do a trie traversal?

bowenwang1996 avatar Sep 06 '21 17:09 bowenwang1996

It doesn’t look like cache issues. I haven’t measured cache hit ratio or counted how often we hit the storage, but just by looking at logs the numbers seem similar for fast and slow cases. On the other hand, Store::get becomes ten times slower all of a sudden:

      74.73µs          Trie::Update::get
        68.40µs          Trie::lookup
          7.45µs           Store::get
          3.50µs           Store::get
          3.16µs           Store::get
          2.10µs           Store::get
          2.82µs           Store::get
          2.79µs           Store::get
          3.38µs           Store::get
          4.09µs           Store::get
          3.77µs           Store::get
          3.07µs           Store::get
          2.45µs           Store::get
          3.23µs           Store::get
          2.91µs           Store::get
          2.31µs           Store::get
          2.42µs           Store::get
          2.63µs           Store::get
          2.59µs           Store::get
        4.83µs           Store::get

vs.

      1.31ms           Trie::Update::get
        1.24ms           Trie::lookup
          86.48µs          Store::get
          72.27µs          Store::get
          70.31µs          Store::get
          69.17µs          Store::get
          68.85µs          Store::get
          68.40µs          Store::get
          68.75µs          Store::get
          69.13µs          Store::get
          69.77µs          Store::get
          68.91µs          Store::get
          68.82µs          Store::get
          69.21µs          Store::get
          68.84µs          Store::get
          78.35µs          Store::get
          75.67µs          Store::get
          76.41µs          Store::get
          75.47µs          Store::get
        69.45µs          Store::get

So I’m now looking at that.

mina86 avatar Sep 06 '21 18:09 mina86

The difference is caused by how signer of the transaction is chosen. In the fast test signers are chosen in sequence from all the available accounts. In the slow tests, signers are chosen at random from unused accounts. Making the latter use sequential accounts fixes the issue:

--- a/runtime/runtime-params-estimator/src/v2/support.rs
+++ b/runtime/runtime-params-estimator/src/v2/support.rs
@@ -202,12 +202,12 @@ impl<'a, 'c> TransactionBuilder<'a, 'c> {
         self.testbed.accounts[account_index].clone()
     }
     pub(crate) fn random_unused_account(&mut self) -> AccountId {
-        loop {
-            let account = self.random_account();
+        for account in &self.testbed.accounts {
             if self.testbed.used_accounts.insert(account.clone()) {
-                return account;
+                return account.clone();
             }
         }
+        panic!("Not enough accounts");
     }
     pub(crate) fn random_account_pair(&mut self) -> (AccountId, AccountId) {
         let first = self.random_account();

I didn’t look into how exactly this affects the database to cause the slowdown.

mina86 avatar Sep 06 '21 22:09 mina86

Perhaps I’ve spoken too soon. The above is correct and changing random_unused_account implementation does speed up the slow test but the opposite does not slow down the fast test:

--- a/runtime/runtime-params-estimator/src/v2.rs
+++ b/runtime/runtime-params-estimator/src/v2.rs
@@ -449,10 +449,20 @@ fn fast() {
         use_in_memory_store: false,
     };

-    let mut i = 0;
+    let mut used = std::collections::HashSet::new();
+    let mut rng = rand::thread_rng();
+
     for h in 0..n_blocks {
         let mut block = BlockConfig::at_height((h + 1) as u64);
         for _ in 0..block_size {
+
+            let i = loop {
+                let i = rng.gen_range(0, num_accounts as u32);
+                if used.insert(i) {
+                    break i as usize;
+                }
+            };
+
             let signer_id = accounts[i].clone();
             let receiver_id = signer_id.clone();
             let signer =
@@ -464,7 +474,6 @@ fn fast() {
                 signer,
                 actions: vec![Action::DeployContract(DeployContractAction { code: code.clone() })],
             });
-            i += 1;
         }
         s.blocks.push(block)
     }

It looks like it’s something else calling random_unused_account that is the culprit.

mina86 avatar Sep 06 '21 22:09 mina86

Confirmed that the order of accounts matters here. Iterating accounts by one up or down finished fast. Iterting accounts by eleven shows the same slowdown

diff
diff --git a/runtime/runtime-params-estimator/src/v2.rs b/runtime/runtime-params-estimator/src/v2.rs
index 6a844fcd9..597da1c01 100644
--- a/runtime/runtime-params-estimator/src/v2.rs
+++ b/runtime/runtime-params-estimator/src/v2.rs
@@ -27,7 +27,7 @@ static ALL_COSTS: &[(Cost, fn(&mut Ctx) -> GasCost)] = &[
     (Cost::ActionReceiptCreation, action_receipt_creation),
     (Cost::ActionSirReceiptCreation, action_sir_receipt_creation),
     (Cost::ActionTransfer, action_transfer),
-    (Cost::ActionCreateAccount, action_create_account),
+    // (Cost::ActionCreateAccount, action_create_account),
     (Cost::ActionDeleteAccount, action_delete_account),
     (Cost::ActionAddFullAccessKey, action_add_full_access_key),
     (Cost::ActionAddFunctionAccessKeyBase, action_add_function_access_key_base),
@@ -118,28 +118,28 @@ fn action_transfer(ctx: &mut Ctx) -> GasCost {
     total_cost - base_cost
 }
 
-fn action_create_account(ctx: &mut Ctx) -> GasCost {
-    let total_cost = {
-        let mut testbed = ctx.test_bed();
+// fn action_create_account(ctx: &mut Ctx) -> GasCost {
+//     let total_cost = {
+//         let mut testbed = ctx.test_bed();
 
-        let mut make_transaction = |mut tb: TransactionBuilder<'_, '_>| -> SignedTransaction {
-            let sender = tb.random_account();
-            let new_account =
-                AccountId::try_from(format!("{}_{}", sender, tb.rng().gen::<u64>())).unwrap();
-
-            let actions = vec![
-                Action::CreateAccount(CreateAccountAction {}),
-                Action::Transfer(TransferAction { deposit: 10u128.pow(26) }),
-            ];
-            tb.transaction_from_actions(sender, new_account, actions)
-        };
-        testbed.average_transaction_cost(&mut make_transaction)
-    };
+//         let mut make_transaction = |mut tb: TransactionBuilder<'_, '_>| -> SignedTransaction {
+//             let sender = tb.random_account();
+//             let new_account =
+//                 AccountId::try_from(format!("{}_{}", sender, tb.rng().gen::<u64>())).unwrap();
 
-    let base_cost = action_receipt_creation(ctx);
+//             let actions = vec![
+//                 Action::CreateAccount(CreateAccountAction {}),
+//                 Action::Transfer(TransferAction { deposit: 10u128.pow(26) }),
+//             ];
+//             tb.transaction_from_actions(sender, new_account, actions)
+//         };
+//         testbed.average_transaction_cost(&mut make_transaction)
+//     };
 
-    total_cost - base_cost
-}
+//     let base_cost = action_receipt_creation(ctx);
+
+//     total_cost - base_cost
+// }
 
 fn action_delete_account(ctx: &mut Ctx) -> GasCost {
     let total_cost = {
diff --git a/runtime/runtime-params-estimator/src/v2/support.rs b/runtime/runtime-params-estimator/src/v2/support.rs
index 6a635eb34..9b62e6fb2 100644
--- a/runtime/runtime-params-estimator/src/v2/support.rs
+++ b/runtime/runtime-params-estimator/src/v2/support.rs
@@ -7,7 +7,7 @@ use near_crypto::{InMemorySigner, KeyType};
 use near_primitives::hash::CryptoHash;
 use near_primitives::transaction::{Action, DeployContractAction, SignedTransaction};
 use near_primitives::types::{AccountId, Gas};
-use rand::prelude::ThreadRng;
+use rand::prelude::{SliceRandom, ThreadRng};
 use rand::Rng;
 
 use crate::cases::ratio_to_gas;
@@ -50,6 +50,7 @@ impl<'c> Ctx<'c> {
             accounts: (0..self.config.active_accounts).map(get_account_id).collect(),
             nonces: HashMap::new(),
             used_accounts: HashSet::new(),
+            idx: 0,
         }
     }
 
@@ -80,6 +81,7 @@ impl<'c> Ctx<'c> {
             accounts: proto.accounts.clone(),
             nonces: proto.nonces.clone(),
             used_accounts: HashSet::new(),
+            idx: 0,
         }
     }
 
@@ -117,6 +119,7 @@ pub(crate) struct TestBed<'c> {
     accounts: Vec<AccountId>,
     nonces: HashMap<AccountId, u64>,
     used_accounts: HashSet<AccountId>,
+    idx: usize,
 }
 
 impl<'c> TestBed<'c> {
@@ -190,24 +193,26 @@ impl<'a, 'c> TransactionBuilder<'a, 'c> {
         )
     }
 
-    pub(crate) fn rng(&mut self) -> ThreadRng {
-        rand::thread_rng()
-    }
+    // pub(crate) fn rng(&mut self) -> ThreadRng {
+    //     rand::thread_rng()
+    // }
 
     pub(crate) fn account(&mut self, account_index: usize) -> AccountId {
         get_account_id(account_index)
     }
     pub(crate) fn random_account(&mut self) -> AccountId {
-        let account_index = self.rng().gen_range(0, self.testbed.accounts.len());
+        self.testbed.idx = (self.testbed.idx + 11) % self.testbed.accounts.len();
+        let account_index = self.testbed.idx;
         self.testbed.accounts[account_index].clone()
     }
     pub(crate) fn random_unused_account(&mut self) -> AccountId {
-        loop {
+        for i in 0..1000 {
             let account = self.random_account();
             if self.testbed.used_accounts.insert(account.clone()) {
                 return account;
             }
         }
+        panic!("no unused accounts")
     }
     pub(crate) fn random_account_pair(&mut self) -> (AccountId, AccountId) {
         let first = self.random_account();

matklad avatar Sep 07 '21 13:09 matklad

Ok, it took me just about a week to realize I can throw flamegrapsh at it :sweat_smile:

stepping by 11 acconts:

image

stepping by 1:

image

profilnes recoreded with perf record -F 9999 --call-graph dwarf target/release/deps/runtime_params_estimator-0328c9c022713977 v2::slow --exact --nocapture and rendered with https://github.com/KDAB/hotspot

The rocksdb::VerifyBlockChecksum looks very suspicious!

I'll stop at this for the time being.

matklad avatar Sep 07 '21 14:09 matklad

(meta -- does anyone know of a nice way to share per profileing results? The above is me scrennshotting hotspot ui, which doesn's seem ideal)

matklad avatar Sep 07 '21 14:09 matklad

It’s still not clear to me why changing the fast code to use random accounts doesn’t slow it down though. There are some differences how the accounts are set up in both tests so that could be the explanation. In fast test test### accounts are added to genesis and transactions are executed using those. In slow test near_###_### accounts are added by the GenesisBuilder. Though it’s not clear to me why it would make a difference.

mina86 avatar Sep 07 '21 15:09 mina86

In the fast test signers are chosen in sequence from all the available accounts. In the slow tests, signers are chosen at random from unused accounts

@mina86 what are the available accounts? I wonder whether this affects the trie traversal in the sense that if we go over the accounts sequentially, some trie node will be cached between the reads (if not by nearcore then on the OS level the page may be cached) while accessing them randomly makes it harder to cache anything.

bowenwang1996 avatar Sep 07 '21 17:09 bowenwang1996

The tests create 200k accounts and all transactions are signed by one of them. The test uses each account only once. Sequential reads indeed seem like a plausible explanation though at this moment I think this is affecting caching in RocksDB or on OS level rather than in our code. (I still need to just add counters or profile each of the Trie::* and Store::* methods to be sure).

mina86 avatar Sep 07 '21 17:09 mina86

Turns out

diff --git a/genesis-tools/genesis-populate/src/lib.rs b/genesis-tools/genesis-populate/src/lib.rs
index d3a9793d..3eceddfa 100644
--- a/genesis-tools/genesis-populate/src/lib.rs
+++ b/genesis-tools/genesis-populate/src/lib.rs
@@ -29,7 +29,7 @@ use near_store::{
 use nearcore::{get_store_path, NightshadeRuntime};
 
 fn get_account_id(account_index: u64) -> AccountId {
-    AccountId::try_from(format!("near_{}_{}", account_index, account_index)).unwrap()
+    AccountId::try_from(format!("test{}", account_index)).unwrap()
 }
 
 pub type Result<T> = std::result::Result<T, Box<dyn std::error::Error>>;
diff --git a/runtime/runtime-params-estimator/src/testbed_runners.rs b/runtime/runtime-params-estimator/src/testbed_runners.rs
index 2ede73d9..ab316ae6 100644
--- a/runtime/runtime-params-estimator/src/testbed_runners.rs
+++ b/runtime/runtime-params-estimator/src/testbed_runners.rs
@@ -18,7 +18,7 @@ use std::time::Instant;
 
 /// Get account id from its index.
 pub fn get_account_id(account_index: usize) -> AccountId {
-    AccountId::try_from(format!("near_{}_{}", account_index, account_index)).unwrap()
+    AccountId::try_from(format!("test{}", account_index)).unwrap()
 }
 
 /// Total number of transactions that we need to prepare.

makes the test go fast. The inverse (changing all instances of test{} to near_{}_{} in the code base) plus adding random order of accounts makes the fast test slow.

mina86 avatar Sep 17 '21 01:09 mina86

Excellent find! I confirm that the fast test starts going slowly with near_{}_{} accounts, so this removes any param-estimator related weirdness from the equation. I also had to replace test{} with near_{}_{} across the codebase, and not just in a test, which felt odd. Filed https://github.com/near/nearcore/issues/4835

matklad avatar Sep 17 '21 13:09 matklad

It feels that this is indeed related to the shape of the trie. @mina86 is it true that changing the names from test{} to near_{}_{} actually changes the number of database reads we do?

bowenwang1996 avatar Sep 17 '21 16:09 bowenwang1996

There’s only a small difference:

                             test{}      near_{}_{}
Trie::retrieve_raw_bytes   : 1832596  →  1944114
TrieCachingStorage         : 2118923  →  2232747
TrieCachingStorage (cached):       2  →        0
TrieCachingStorage (missed): 2118921  →  2232747
RocksDB::get               : 2322764  →  2436590

mina86 avatar Sep 20 '21 17:09 mina86

That is weird. Do we see any significant difference in the amount of time spent on RocksDB::get?

bowenwang1996 avatar Sep 21 '21 02:09 bowenwang1996

Modernized and simplified the repro based on mina86 refactors:

https://github.com/matklad/nearcore/commit/558b8e21142d14bcf9fd6fca16c03cabed6234b6

To run:

cargo test --release --package runtime-tester --lib \
    --features nearcore/no_cache \
    -- perf_cliff --exact --nocapture

I've also found a better workflow for perf profiles

1. Build the test binary

    cargo test --no-run --release --package runtime-tester --features nearcore/no_cache --lib -- perf_cliff --exact --nocapture

2. Run the binary directly with perf (you can see the name of the file by ommitting --no-run). -F is sampling frequency: higher number, means bigger size of the resulting profile file

    perf record -F 999 --call-graph dwarf,65528 -- target/release/deps/runtime_tester-160c2544f9ae8411 perf_cliff --exact --nocapture

3. The output of the previous command is perf.data file. You need to process it with perf script (I have no idea what this actually does)

    perf script -F +pid > slow.perf

4. Go to https://profiler.firefox.com and drag and drop the slow.perf file
5. Click "Upload local Profile" in the top-right corner
6. Share the URL with colleagues

Here's what I get for slow (near_{}_{}) and fast (test_{}) versions:

  • slow https://profiler.firefox.com/public/azmbf8q65nbrhdqj98752adjnkt18kj9c9pjpvr/flame-graph/?globalTrackOrder=0&localTrackOrderByPid=212964-0wxl&thread=1&v=6
  • fast https://profiler.firefox.com/public/vv11t42jzk8nw9fagjd8s62j6g0q9h5s4d7jk6g/flame-graph/?globalTrackOrder=0&localTrackOrderByPid=213518-0wxo&thread=1&v=6

Note that profile is per thread, and we have a lot of threads. Select perf_cliff thread at the top and Flame Graph view at the bottom to get the most amusing result.

matklad avatar Oct 04 '21 10:10 matklad

@matklad so it appears to be something inside RocksDB? cc @mina86

bowenwang1996 avatar Oct 04 '21 21:10 bowenwang1996

It appears yeah. What I don't understand is why Rocksdb stacks are hanging in the air. It looks as if perf just cant piece shared Rust/C++ stack together

image

image

Note how Scenario::run in both cases take approximately the same 5k amount of samples, but, in the first case, there's just this Rocks stuff out of nowhere.

So I guess next step here is to get an actually useful profile -- we need to understand who calls rocks.

matklad avatar Oct 06 '21 10:10 matklad

Aha, the stack issue is just becaues by default pref has a limit on stack depth. THe following argument fixes that: --call-graph dwarf,65528

Here's much better profiles

  • slow (near_{}_{}) https://share.firefox.dev/2WJxwls
  • fast (test_{}) https://share.firefox.dev/3mswaVh

matklad avatar Oct 06 '21 10:10 matklad

Yeah, looking at those profiles gives so much more to work with. The difference seems to be that in the rocksdb::Version::Get the slow version does everything that the fast version does, plus rocksdb::TableCache::Get. I'll stop my investigation here and go back to wasm stuff.

matklad avatar Oct 06 '21 11:10 matklad

@mina86 please take a look at the new profiling results :)

bowenwang1996 avatar Oct 06 '21 11:10 bowenwang1996

The difference seems to be that in the rocksdb::Version::Get the slow version does everything that the fast version does, plus rocksdb::TableCache::Get.

Yep, this is exactly true. RocksDB has this piece of code in Version::Get:

FilePicker fp(
    storage_info_.files_, user_key, ikey, &storage_info_.level_files_brief_,
    storage_info_.num_non_empty_levels_, &storage_info_.file_indexer_,
    user_comparator(), internal_comparator());
FdWithKeyRange* f = fp.GetNextFile();

while (f != nullptr) {
  // ...
}

and in the fast path f is always null. What FilePicker is I’m not sure at this point, but it’s something more than listing files in the storage directory.

I’ve tried looking at rocksdb.block-cache-usage and rocksdb.block-cache-pinned-usage counters but they remain constant thorough the test and way below rocksdb.block-cache-capacity.

Sadly, the summary is that I’m not much closer to figuring out where the performance drop comes from.

mina86 avatar Oct 12 '21 00:10 mina86

This now I think blocks progress on https://github.com/near/nearcore/pull/4733. I am investigating many-orders-of-magnitude differences between the old estimator and the new one, and it seems that I hit "DB is mysteriously slow" case.

The slow case is what is hit by the old estimator, and it looks like this:

    1.78ms           Runtime::process_receipt
      1.78ms           Runtime::apply_action_receipt
        1.76ms           Runtime::apply_action
          1.76ms           Runtime::action_function_call
            1.76ms           Runtime::execute_function_call
              1.62ms           Runtime::get_code
                110.87µs         retrieve_raw_bytes
                104.19µs         retrieve_raw_bytes
                105.77µs         retrieve_raw_bytes
                105.73µs         retrieve_raw_bytes
                106.31µs         retrieve_raw_bytes
                104.89µs         retrieve_raw_bytes
                104.48µs         retrieve_raw_bytes
                104.49µs         retrieve_raw_bytes
                105.22µs         retrieve_raw_bytes
                114.13µs         retrieve_raw_bytes
                104.68µs         retrieve_raw_bytes
                110.49µs         retrieve_raw_bytes
                110.93µs         retrieve_raw_bytes
                110.50µs         retrieve_raw_bytes
                105.30µs         retrieve_raw_bytes
              138.43µs         Runtime::execute_function_call/1
                137.73µs         Runtime::execute_function_call/2
                  137.15µs         Runtime::execute_function_call/3
                    136.50µs         run_vm

It takes 1.5ms to fetch contract code from the database, which doesn't make any sense. The fast case looks like this:

    210.09µs         Runtime::process_receipt
      209.44µs         Runtime::apply_action_receipt
        195.90µs         Runtime::apply_action
          195.07µs         Runtime::action_function_call
            192.66µs         Runtime::execute_function_call
              48.14µs          Runtime::get_code
                4.04µs           retrieve_raw_bytes
                1.48µs           retrieve_raw_bytes
                1.89µs           retrieve_raw_bytes
                1.87µs           retrieve_raw_bytes
                2.56µs           retrieve_raw_bytes
                1.83µs           retrieve_raw_bytes
                2.08µs           retrieve_raw_bytes
                2.90µs           retrieve_raw_bytes
                4.85µs           retrieve_raw_bytes
                3.55µs           retrieve_raw_bytes
                2.91µs           retrieve_raw_bytes
                2.01µs           retrieve_raw_bytes
                2.28µs           retrieve_raw_bytes
                2.93µs           retrieve_raw_bytes
                3.64µs           retrieve_raw_bytes
              143.66µs         Runtime::execute_function_call/1
                142.99µs         Runtime::execute_function_call/2
                  142.30µs         Runtime::execute_function_call/3
                    141.54µs         run_vm

matklad avatar Nov 10 '21 17:11 matklad

What if rocksdb issues list is helpful here? This one is quite curious: https://github.com/facebook/rocksdb/issues/3961

I use command cargo test --release --package runtime-tester --lib --features nearcore/no_cache -- perf_cliff --exact --nocapture.

When I disable set_cache_index_and_filter_blocks - https://github.com/near/nearcore/compare/558b8e211..bceefb56 - performance become much more stable:

  • disabled: 40-80ms to process a single block in most cases, one spike with 120ms
  • enabled: 40-60ms, then 1s for each block

@ailisp @matklad do you know why we enable it?

Longarithm avatar Nov 12 '21 23:11 Longarithm

@Longarithm We can try to see if it makes difference to try to adjust cache sizes per column. Like in https://github.com/near/nearcore/pull/5212

pmnoxx avatar Nov 13 '21 00:11 pmnoxx

have a lot of threads. Select perf_cliff thread at the top and Flame Graph view at the bottom to get the most amusing result.

It appears that int the slow path we spend a lot of time int rocksdb::BlockFetcher::ReadBlockContents. So RocksDB is spending a lot of time reading blocks in the slow case.

pmnoxx avatar Nov 13 '21 00:11 pmnoxx

retrieve_raw_bytes

I see that we have a cache in front of the database. This explains the spikes in performance. If you want to debug the issue, please try commenting this code out:

impl TrieStorage for TrieCachingStorage {
    fn retrieve_raw_bytes(&self, hash: &CryptoHash) -> Result<Vec<u8>, StorageError> {
        let mut guard = self.cache.0.lock().expect(POISONED_LOCK_ERR);
        /*    if let Some(val) = guard.cache_get(hash) {
            Ok(val.clone())
        } else {*/
        let key = Self::get_key_from_shard_uid_and_hash(self.shard_uid, hash);
        let val = self
            .store
            .get(ColState, key.as_ref())
            .map_err(|_| StorageError::StorageInternalError)?;
        if let Some(val) = val {
            if val.len() < TRIE_LIMIT_CACHED_VALUE_SIZE {
                guard.cache_set(*hash, val.clone());
            }
            Ok(val)
        } else {
            // not StorageError::TrieNodeMissing because it's only for TrieMemoryPartialStorage
            Err(StorageError::StorageInconsistentState("Trie node missing".to_string()))
        }
        //     }
    }

    fn as_caching_storage(&self) -> Option<&TrieCachingStorage> {
        Some(self)
    }
}

This makes reads slower, but this shows you the real time it takes to access the database.

  462.38ms         Process block
    461.43ms         process_state_update
      461.42ms         Runtime::apply
        66.14ms   100    Runtime::process_receipt
          100.10µs  100    get_key
        206.69ms  100    Runtime::process_transaction

  458.03ms         Process block
    457.15ms         process_state_update
      457.14ms         Runtime::apply
        67.32ms   100    Runtime::process_receipt
          94.68µs   100    get_key
        204.97ms  100    Runtime::process_transaction

  508.51ms         Process block
    507.50ms         process_state_update
      507.49ms         Runtime::apply
        68.10ms   100    Runtime::process_receipt
          97.73µs   100    get_key
        204.74ms  100    Runtime::process_transaction

  469.07ms         Process block
    468.15ms         process_state_update
      468.14ms         Runtime::apply
        68.92ms   100    Runtime::process_receipt
          100.43µs  100    get_key
        211.39ms  100    Runtime::process_transaction

  459.90ms         Process block
    459.00ms         process_state_update
      458.99ms         Runtime::apply
        67.22ms   100    Runtime::process_receipt
          95.58µs   100    get_key
        205.08ms  100    Runtime::process_transaction

  464.87ms         Process block
    463.96ms         process_state_update
      463.96ms         Runtime::apply
        68.66ms   100    Runtime::process_receipt
          97.19µs   100    get_key
        207.97ms  100    Runtime::process_transaction

For purpose of debugging database performance issue. I recommend disabling this cache, and debugging only the database access times. It's much easier to analyze performance issues by analyzing one thing at a time.

pmnoxx avatar Nov 13 '21 00:11 pmnoxx

I created a benchmark https://github.com/near/nearcore/pull/5312

It may be useful to try different database configuration.

pmnoxx avatar Nov 17 '21 07:11 pmnoxx

The database is becoming slower and slower as number of keys grows

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured

     Running unittests (/home/pmnoxx/repos/nearcore.master/target/release/deps/ibf-b2f9e4ff4ef48925)

running 1 test
test benchmark_db ... took on avg 7.957µs
test benchmark_db ... took on avg 882ns                                                                                                                                                                                                        
took on avg 1.546µs                                                                                                                                                                                                                            
took on avg 1.647µs                                                                                                                                                                                                                            
took on avg 2.184µs                                                                                                                                                                                                                            
took on avg 2.716µs                                                                                                                                                                                                                            
took on avg 2.26µs
took on avg 3.401µs
took on avg 3.845µs
took on avg 4.428µs
took on avg 4.885µs
took on avg 5.508µs
took on avg 6.004µs
took on avg 6.687µs
took on avg 7.088µs
took on avg 7.866µs
took on avg 8.381µs
took on avg 8.539µs
took on avg 9.438µs
took on avg 9.874µs
took on avg 10.114µs
took on avg 11.359µs
took on avg 10.506µs
took on avg 12.499µs
took on avg 13.506µs
took on avg 12.489µs
took on avg 13.801µs
took on avg 14.562µs
took on avg 16.435µs
took on avg 16.655µs
took on avg 16.774µs


pmnoxx avatar Nov 17 '21 07:11 pmnoxx