
Storage Error: StorageInternalError

Open andrewklau opened this issue 2 years ago • 18 comments

I'm getting a Storage Error: StorageInternalError error after running the indexer, after which it stops indexing. This has only come up today, on recent blocks.

Restarting the indexer solves the issue. It has occurred twice now. Plenty of disk space, CPU and memory are still available.

Apr 19 13:02:10.046 INFO indexer_for_explorer: Block height 63858417
Apr 19 13:02:10.052 ERROR indexer_for_explorer: Error occurred during DatabaseError(ForeignKeyViolation, "insert or update on table "receipts" violates foreign key constraint "tx_receipt_fk""): "Receipts were stored in database" [ Receipt

{ receipt_id: "FgXgsSrNJiRarVwQLDe1AVVp8C1mNi5Q8pLtSVKPhpci", included_in_block_hash: "5UZvFYudotieeiq6MQGLRXHdb4kiJsnbP37wpaLd8zvd", included_in_chunk_hash: "5YK1HDrySW6MwXvYavQEq4kKqmmHqW1o38zrgyWP1Av6", index_in_chunk: 0, included_in_block_timestamp: BigDecimal("1650373326836992552"), predecessor_account_id: "hongtuoi87.near", receiver_account_id: "hongtuoi87.near", receipt_kind: Action, originated_from_transaction_hash: "GpcJ9oxpGuE8hKFa7NMFEAN5ZvUBss7bEF7YqWZcZ27c", }

, Receipt

{ receipt_id: "8Xenoi1jpUf9uZwC4DWsyQ68dMWxLcxUyMpe9PJbwcoC", included_in_block_hash: "5UZvFYudotieeiq6MQGLRXHdb4kiJsnbP37wpaLd8zvd", included_in_chunk_hash: "5YK1HDrySW6MwXvYavQEq4kKqmmHqW1o38zrgyWP1Av6", index_in_chunk: 1, included_in_block_timestamp: BigDecimal("1650373326836992552"), predecessor_account_id: "dangthevinh.near", receiver_account_id: "v2.ref-finance.near", receipt_kind: Action, originated_from_transaction_hash: "2NeqAQ1rVVufc5BGAfh383DZZnE8jDPx5DZ9WxzDRgWs", }

, Receipt

{ receipt_id: "ESdzhR5Ag6rrz92e1FKshKaoWR9saUupV5cbEKYF77vV", included_in_block_hash: "5UZvFYudotieeiq6MQGLRXHdb4kiJsnbP37wpaLd8zvd", included_in_chunk_hash: "5YK1HDrySW6MwXvYavQEq4kKqmmHqW1o38zrgyWP1Av6", index_in_chunk: 2, included_in_block_timestamp: BigDecimal("1650373326836992552"), predecessor_account_id: "hungryapple.near", receiver_account_id: "aurora", receipt_kind: Action, originated_from_transaction_hash: "Gh3N5uXk59v2fryQGphF5uRSTP4maXwqAVjFmTfmy8Pc", }

, ] Retrying in 100 milliseconds...
thread '' panicked at 'Storage Error: StorageInternalError Cause: Unknown', /usr/local/cargo/git/checkouts/nearcore-5bf7818cf2261fd0/4ac008b/nearcore/src/runtime/mod.rs:1376:21
note: run with RUST_BACKTRACE=1 environment variable to display a backtrace

andrewklau avatar Apr 19 '22 14:04 andrewklau

This is weird. Could you please tell us which version of Indexer for Explorer you are using?

khorolets avatar Apr 19 '22 14:04 khorolets

The most suspicious thing here is the commit hash of nearcore from the trace:

nearcore-5bf7818cf2261fd0

That commit hash belongs to an old version of nearcore (1.23.x).

Could you share your Cargo.toml with us, please?

khorolets avatar Apr 19 '22 14:04 khorolets

Here is the latest commit:

commit 14391b4a30c7082cedaa2717a0c2efaf9ebc0cdd (HEAD, tag: 0.10.14, origin/master, origin/HEAD, master)
Author: Bohdan Khorolets <[email protected]>
Date:   Thu Mar 17 16:08:48 2022 +0200

    fix: Fix broken find parent tx hash for receipt logic (#263)

    * fix: Fix broken find parent tx hash for receipt logic

    * bump the version and add changelog record

andrewklau avatar Apr 19 '22 15:04 andrewklau

Cargo.toml

[package]
name = "indexer-explorer"
version = "0.10.14"
authors = ["Near Inc <[email protected]>"]
edition = "2021"

[dependencies]
actix = "=0.11.0-beta.2"
actix-rt = "=2.2.0"  # remove it once actix is upgraded to 0.11+
actix-web = "=4.0.0-beta.6"
actix-http = "=3.0.0-beta.6"
actix-tls = "=3.0.0-beta.5"
actix_derive = "=0.6.0-beta.1"
anyhow = "1.0.51"
base64 = "0.11"
bigdecimal = "=0.1.0"
borsh = "0.7.1"
cached = "0.23.0"
chrono = "0.4.19"
clap = { version = "3.0.0-beta.5", features = ["color", "derive", "env"] }
diesel = { version = "1.4.7", features = ["postgres", "numeric", "serde_json"] }
# Using hacky diesel-derive-enum https://github.com/adwhit/diesel-derive-enum/issues/52
diesel-derive-enum = { git = "https://github.com/khorolets/diesel-derive-enum.git", branch = "lookup-hack", features = ["postgres"] }
dotenv = "0.15.0"
futures = "0.3.5"
hex = "0.4"
itertools = "0.10.3"
# syn version conflict, replace with crates.io version once released
near-sdk = { git = "https://github.com/near/near-sdk-rs", rev="03487c184d37b0382dd9bd41c57466acad58fc1f" }
num-traits = "0.2.11"
openssl-probe = { version = "0.1.2" }
r2d2 = "0.8.8"
serde = { version = "1", features = ["derive"] }
serde_json = "1.0.55"
tokio = { version = "1.1", features = ["sync", "time"] }
tokio-stream = { version = "0.1" }
tracing = "0.1.13"
tracing-subscriber = "0.2.4"
uint = { version = "0.8.3", default-features = false }

actix-diesel = { git = "https://github.com/frol/actix-diesel", branch = "actix-0.11-beta.2" }
near-indexer = { git = "https://github.com/near/nearcore", rev = "4ac008b4a0194ed816f33c1c3a6da5159e25cac1" }
near-crypto = { git = "https://github.com/near/nearcore", rev = "4ac008b4a0194ed816f33c1c3a6da5159e25cac1" }
near-client = { git = "https://github.com/near/nearcore", rev = "4ac008b4a0194ed816f33c1c3a6da5159e25cac1" }
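
The pinned `rev` above can be compared with the checkout path printed in the panic message. The sketch below assumes Cargo's usual git checkout layout, `.../git/checkouts/<repo>-<hash of source URL>/<short rev>/...`, in which the suffix after `nearcore-` identifies the source URL and the next path component is the abbreviated commit. The helper names are hypothetical.

```python
import re

# Assumed layout of a Cargo git checkout path (an assumption, not taken from the thread):
#   .../git/checkouts/<repo>-<hash of source URL>/<short rev>/<path inside the repo>
CHECKOUT_RE = re.compile(
    r"/git/checkouts/(?P<repo>[^/]+)-(?P<dir_hash>[0-9a-f]{16})/(?P<rev>[0-9a-f]+)/"
)


def checkout_rev(panic_path: str) -> str:
    """Return the abbreviated revision embedded in a Cargo checkout path."""
    match = CHECKOUT_RE.search(panic_path)
    if match is None:
        raise ValueError(f"not a Cargo git checkout path: {panic_path}")
    return match.group("rev")


def matches_pinned_rev(panic_path: str, pinned_rev: str) -> bool:
    """True if the checkout in the panic path was built from the pinned rev."""
    return pinned_rev.startswith(checkout_rev(panic_path))


if __name__ == "__main__":
    path = (
        "/usr/local/cargo/git/checkouts/nearcore-5bf7818cf2261fd0/4ac008b/"
        "nearcore/src/runtime/mod.rs:1376:21"
    )
    pinned = "4ac008b4a0194ed816f33c1c3a6da5159e25cac1"
    print(checkout_rev(path), matches_pinned_rev(path, pinned))
```

Under that assumption, the `4ac008b` component of the panic path is consistent with the `rev` pinned in this Cargo.toml.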

andrewklau avatar Apr 19 '22 15:04 andrewklau

This is super weird. Are there any steps that reproduce it reliably?

/cc @frol

khorolets avatar Apr 19 '22 15:04 khorolets

I haven't been able to reproduce it reliably. For reference, I'm using the Dockerfile.

andrewklau avatar Apr 19 '22 15:04 andrewklau

@andrewklau it feels like your node data is corrupted in some way. The panic is inside nearcore and has nothing to do with the indexer, so I recommend you re-download a fresh data backup.

frol avatar Apr 20 '22 19:04 frol

I bumped to the latest version. I noticed this in the logs, but it didn't cause a crash and the node just continued:

Apr 29 10:03:04.849  INFO near_network::peer_manager::peer_manager_actor: Bandwidth stats total_bandwidth_used_by_all_peers=1041215 total_msg_received_count=209 max_max_record_num_messages_in_progress=0
thread 'actix-rt|system:0|arbiter:0' panicked at 'not implemented', /usr/local/cargo/git/checkouts/nearcore-5bf7818cf2261fd0/c7eaf26/core/store/src/trie/trie_storage.rs:114:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

I'm running a third archive node in parallel, restored from a fresh backup, to see if the error occurs there too.

andrewklau avatar Apr 29 '22 10:04 andrewklau

I downloaded a new backup and still seem to get this error:

May 01 12:48:59.236  INFO indexer_for_explorer: Block height 64678126
May 01 12:48:59.306  WARN indexer_for_explorer: Provided event log does not correspond to any of formats defined in NEP. Will ignore this event.
 Error("unknown variant `burrow`, expected `nep141` or `nep171`", line: 1, column: 20)
"EVENT_JSON:{\"standard\":\"burrow\",\"version\":\"1.0.0\",\"event\":\"withdraw_started\",\"data\":[{\"account_id\":\"maxwu.near\",\"amount\":\"5000000000000000000000000\",\"token_id\":\"wrap.near\"}]}"
thread '<unnamed>' panicked at 'Storage Error: StorageInternalError', /usr/local/cargo/git/checkouts/nearcore-5bf7818cf2261fd0/c7eaf26/nearcore/src/runtime/mod.rs:1404:21
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
May 01 12:49:31.234  INFO near_network::peer_manager::peer_manager_actor: Bandwidth stats total_bandwidth_used_by_all_peers=6079593 total_msg_received_count=8891 max_max_record_num_messages_in_progress=158

andrewklau avatar May 01 '22 17:05 andrewklau

The "not implemented" error was related to https://github.com/near/nearcore/issues/6726 and most likely means that either the DB is corrupted or nearcore wasn't upgraded to 1.26. Downloading a fresh backup should resolve the first case.

Could you restart the node with flags RUST_LOG="vm=debug,runtime=debug,near_indexer=debug,indexer=debug" RUST_BACKTRACE=full? It's hard to say what Storage Error: StorageInternalError is related to.
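
For anyone who starts the node from a script, here is a minimal sketch of passing those two variables to a child process. The variable values are the ones requested above; the function names are hypothetical, and `command` stands for whatever command line you already use to start the indexer.

```python
import os
import subprocess

# Values requested in the comment above.
DEBUG_ENV = {
    "RUST_LOG": "vm=debug,runtime=debug,near_indexer=debug,indexer=debug",
    "RUST_BACKTRACE": "full",
}


def debug_environment(base=None):
    """Return a copy of the environment with the debug variables applied."""
    env = dict(os.environ if base is None else base)
    env.update(DEBUG_ENV)
    return env


def run_with_debug_logging(command):
    """Run `command` (a list of arguments) with debug logging and full backtraces."""
    return subprocess.run(command, env=debug_environment(), check=False).returncode
```

The parent environment is copied rather than replaced, so PATH and similar variables still reach the node process.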

cc @mm-near

Longarithm avatar May 02 '22 10:05 Longarithm

@Longarithm

  20:     0x560b7c37b14d - <unknown>
  21:     0x560b7ce06c63 - <unknown>
  22:     0x7f460e4956db - <unknown>
  23:     0x7f460dc1c61f - <unknown>
  24:                0x0 - <unknown>
May 03 14:42:08.938 DEBUG process_state_update:Runtime::apply:Runtime::process_receipt{receipt_id=FvPU1fA2tQn3TeRHJdiRaxgeYcXfx5Ghgidyxa2gv74S node_counter=TrieNodesCount { db_reads: 4, mem_reads: 0 }}: runtime: signer_address Address(0xd2e924053ea27c0bd48366f753723ed4a1b46573)
total_writes_count 30
total_written_bytes 960
May 03 14:42:08.938 DEBUG process_state_update:Runtime::apply:Runtime::process_receipt{receipt_id=FvPU1fA2tQn3TeRHJdiRaxgeYcXfx5Ghgidyxa2gv74S node_counter=TrieNodesCount { db_reads: 4, mem_reads: 0 }}: runtime: node_counter=TrieNodesCount { db_reads: 2039, mem_reads: 0 }
May 03 14:42:08.938 DEBUG process_state_update:Runtime::apply:Runtime::process_receipt{receipt_id=HBA58mAwXpVzY6SCsDBNf24MHuGpe9UY9xJ4q5fD7pXk node_counter=TrieNodesCount { db_reads: 2039, mem_reads: 0 }}: runtime: Calling the contract at account aurora
May 03 14:42:08.939 DEBUG process_state_update:Runtime::apply:Runtime::process_receipt{receipt_id=HBA58mAwXpVzY6SCsDBNf24MHuGpe9UY9xJ4q5fD7pXk node_counter=TrieNodesCount { db_reads: 2039, mem_reads: 0 }}: runtime: node_counter=TrieNodesCount { db_reads: 2057, mem_reads: 0 }
thread '<unnamed>' panicked at 'Storage Error: StorageInternalError', /usr/local/cargo/git/checkouts/nearcore-5bf7818cf2261fd0/c7eaf26/nearcore/src/runtime/mod.rs:1404:21
stack backtrace:
   0:     0x560b7cdfefdd - <unknown>
   1:     0x560b7ce2685c - <unknown>
   2:     0x560b7cdf7008 - <unknown>
   3:     0x560b7ce014b7 - <unknown>
   4:     0x560b7ce01180 - <unknown>

This is also from a fresh import: I downloaded the latest backup and had been running it for a few hours.

andrewklau avatar May 03 '22 17:05 andrewklau

May 03 14:42:08.895 DEBUG process_state_update: runtime: epoch height: 1273, epoch id: EpochId(`B34CLQFi6rVszQWhs73VZ3wdDG76m9iSJQjJ4VmDuu7w`), current_protocol_version: 52, is_first_block_of_version: false
May 03 14:42:08.895 DEBUG process_state_update: runtime: epoch height: 1273, epoch id: EpochId(`B34CLQFi6rVszQWhs73VZ3wdDG76m9iSJQjJ4VmDuu7w`), current_protocol_version: 52, is_first_block_of_version: false
May 03 14:42:08.895 DEBUG process_state_update: runtime: epoch height: 1273, epoch id: EpochId(`B34CLQFi6rVszQWhs73VZ3wdDG76m9iSJQjJ4VmDuu7w`), current_protocol_version: 52, is_first_block_of_version: false
May 03 14:42:08.895 DEBUG process_state_update: runtime: epoch height: 1273, epoch id: EpochId(`B34CLQFi6rVszQWhs73VZ3wdDG76m9iSJQjJ4VmDuu7w`), current_protocol_version: 52, is_first_block_of_version: false
May 03 14:42:08.895 DEBUG process_state_update:Runtime::apply:Runtime::process_receipt{receipt_id=FvPU1fA2tQn3TeRHJdiRaxgeYcXfx5Ghgidyxa2gv74S node_counter=TrieNodesCount { db_reads: 4, mem_reads: 0 }}: runtime: Calling the contract at account aurora
May 03 14:42:08.895 DEBUG process_state_update:Runtime::apply:Runtime::process_receipt{receipt_id=EJd21YUYCoyP1ijVqHniS87NF46p6de8Jjgtv5LuZtLm node_counter=TrieNodesCount { db_reads: 37, mem_reads: 0 }}: runtime: node_counter=TrieNodesCount { db_reads: 79, mem_reads: 0 }
May 03 14:42:08.895 DEBUG process_state_update:Runtime::apply:Runtime::process_receipt{receipt_id=DGSqZNotuZTWzCvgbqYPAEXVTFX2Uow5WWQjXumdK1iZ node_counter=TrieNodesCount { db_reads: 79, mem_reads: 0 }}: runtime: node_counter=TrieNodesCount { db_reads: 105, mem_reads: 0 }
May 03 14:42:08.895 DEBUG process_state_update:Runtime::apply:Runtime::process_receipt{receipt_id=2D66cZrr5kSV8qHGwGg8SLiBgcrvJbGoppUBAsLB17zd node_counter=TrieNodesCount { db_reads: 105, mem_reads: 0 }}: runtime: node_counter=TrieNodesCount { db_reads: 127, mem_reads: 0 }
May 03 14:42:08.896 DEBUG process_state_update:Runtime::apply:Runtime::process_receipt{receipt_id=9YTVHEngHqCiENoYMVdQce6t4FS3W6M6BFbrb8kyAWQ6 node_counter=TrieNodesCount { db_reads: 4, mem_reads: 0 }}: runtime: node_counter=TrieNodesCount { db_reads: 12, mem_reads: 0 }
thread '<unnamed>' panicked at 'Storage Error: StorageInternalError', /usr/local/cargo/git/checkouts/nearcore-5bf7818cf2261fd0/c7eaf26/nearcore/src/runtime/mod.rs:1404:21
stack backtrace:
   0: thread '<unnamed>' panicked at 'Storage Error: StorageInternalError', /usr/local/cargo/git/checkouts/nearcore-5bf7818cf2261fd0/c7eaf26/nearcore/src/runtime/mod.rs:1404:21
    0x560b7cdfefdd - <unknown>
   1:     0x560b7ce2685c - <unknown>
   2:     0x560b7cdf7008 - <unknown>
   3:     0x560b7ce014b7 - <unknown>
   4:     0x560b7ce01180 - <unknown>
   5:     0x560b7ce01c09 - <unknown>
   6:     0x560b7ce018f7 - <unknown>
   7:     0x560b7cdff4a4 - <unknown>
   8:     0x560b7ce01609 - <unknown>
   9:     0x560b7b112e23 - <unknown>
  10:     0x560b7b360428 - <unknown>
  11:     0x560b7b355a31 - <unknown>
  12:     0x560b7b35f4a1 - <unknown>
  13:     0x560b7c1d56d2 - <unknown>
  14:     0x560b7c2b45ae - <unknown>
  15:     0x560b7c2aae70 - <unknown>
  16:     0x560b7c2b4c55 - <unknown>
  17:     0x560b7c2aafc4 - <unknown>
  18:     0x560b7c2657c2 - <unknown>
  19:     0x560b7c23aed1 - <unknown>
  20:     0x560b7b0eb0ff - <unknown>
  21:     0x560b7c37cfb9 - <unknown>
  22:     0x560b7c37f651 - <unknown>
  23:     0x560b7c37b14d - <unknown>
  24:     0x560b7ce06c63 - <unknown>
  25:     0x7f460e4956db - <unknown>
  26:     0x7f460dc1c61f - <unknown>
  27:                0x0 - <unknown>
stack backtrace:
   0:     0x560b7cdfefdd - <unknown>
   1:     0x560b7ce2685c - <unknown>
   2:     0x560b7cdf7008 - <unknown>
   3:     0x560b7ce014b7 - <unknown>
   4:     0x560b7ce01180 - <unknown>
   5:     0x560b7ce01c09 - <unknown>
   6:     0x560b7ce018f7 - <unknown>
   7:     0x560b7cdff4a4 - <unknown>
   8:     0x560b7ce01609 - <unknown>
   9:     0x560b7b112e23 - <unknown>
  10:     0x560b7b360428 - <unknown>
  11:     0x560b7b355a31 - <unknown>
  12:     0x560b7b35f4a1 - <unknown>
  13:     0x560b7c1d56d2 - <unknown>
  14:     0x560b7c2b45ae - <unknown>
  15:     0x560b7c2aae70 - <unknown>
  16:     0x560b7c23abdb - <unknown>
  17:     0x560b7b0eb0ff - <unknown>
  18:     0x560b7c37cfb9 - <unknown>
  19:     0x560b7c37f651 - <unknown>
  20:     0x560b7c37b14d - <unknown>
  21:     0x560b7ce06c63 - <unknown>
  22:     0x7f460e4956db - <unknown>
  23:     0x7f460dc1c61f - <unknown>
  24:                0x0 - <unknown>
May 03 14:42:08.938 DEBUG process_state_update:Runtime::apply:Runtime::process_receipt{receipt_id=FvPU1fA2tQn3TeRHJdiRaxgeYcXfx5Ghgidyxa2gv74S node_counter=TrieNodesCount { db_reads: 4, mem_reads: 0 }}: runtime: signer_address Address(0xd2e924053ea27c0bd48366f753723ed4a1b46573)
total_writes_count 30
total_written_bytes 960
May 03 14:42:08.938 DEBUG process_state_update:Runtime::apply:Runtime::process_receipt{receipt_id=FvPU1fA2tQn3TeRHJdiRaxgeYcXfx5Ghgidyxa2gv74S node_counter=TrieNodesCount { db_reads: 4, mem_reads: 0 }}: runtime: node_counter=TrieNodesCount { db_reads: 2039, mem_reads: 0 }
May 03 14:42:08.938 DEBUG process_state_update:Runtime::apply:Runtime::process_receipt{receipt_id=HBA58mAwXpVzY6SCsDBNf24MHuGpe9UY9xJ4q5fD7pXk node_counter=TrieNodesCount { db_reads: 2039, mem_reads: 0 }}: runtime: Calling the contract at account aurora
May 03 14:42:08.939 DEBUG process_state_update:Runtime::apply:Runtime::process_receipt{receipt_id=HBA58mAwXpVzY6SCsDBNf24MHuGpe9UY9xJ4q5fD7pXk node_counter=TrieNodesCount { db_reads: 2039, mem_reads: 0 }}: runtime: node_counter=TrieNodesCount { db_reads: 2057, mem_reads: 0 }
thread '<unnamed>' panicked at 'Storage Error: StorageInternalError', /usr/local/cargo/git/checkouts/nearcore-5bf7818cf2261fd0/c7eaf26/nearcore/src/runtime/mod.rs:1404:21
stack backtrace:
   0:     0x560b7cdfefdd - <unknown>
   1:     0x560b7ce2685c - <unknown>
   2:     0x560b7cdf7008 - <unknown>
   3:     0x560b7ce014b7 - <unknown>
   4:     0x560b7ce01180 - <unknown>
   5:     0x560b7ce01c09 - <unknown>
   6:     0x560b7ce018f7 - <unknown>
   7:     0x560b7cdff4a4 - <unknown>
   8:     0x560b7ce01609 - <unknown>
   9:     0x560b7b112e23 - <unknown>
  10:     0x560b7b360428 - <unknown>
  11:     0x560b7b355a31 - <unknown>
  12:     0x560b7b35f4a1 - <unknown>
  13:     0x560b7c1d56d2 - <unknown>
  14:     0x560b7c2b45ae - <unknown>
  15:     0x560b7c2aae70 - <unknown>
  16:     0x560b7c23abdb - <unknown>
  17:     0x560b7b0eb0ff - <unknown>
  18:     0x560b7c37cfb9 - <unknown>
  19:     0x560b7c37f651 - <unknown>
  20:     0x560b7c37b14d - <unknown>
  21:     0x560b7ce06c63 - <unknown>
  22:     0x7f460e4956db - <unknown>
  23:     0x7f460dc1c61f - <unknown>
  24:                0x0 - <unknown>
May 03 14:42:09.293 DEBUG indexer: Streaming is about to start from block #64814374 and the latest block is #64814373
May 03 14:42:09.795 DEBUG indexer: Streaming is about to start from block #64814374 and the latest block is #64814373
May 03 14:42:10.297 DEBUG indexer: Streaming is about to start from block #64814374 and the latest block is #64814373
May 03 14:42:10.797 DEBUG indexer: Streaming is about to start from block #64814374 and the latest block is #64814373

andrewklau avatar May 03 '22 17:05 andrewklau

@Longarithm I moved the issue over to the nearcore repo, so you can track it here.

@mm-near I am not sure who I should escalate this issue to so that the chain team can allocate resources to look into it, as it might hit us with the release on mainnet. Could the chain team at least double-check whether we hit the same issue when starting from a fresh backup on mainnet?

frol avatar May 03 '22 19:05 frol

I noticed it complaining about a "too many files open" error, but it would keep going.

I increased the max file open limit and will update again if this issue still persists after that.

andrewklau avatar May 04 '22 05:05 andrewklau

ack - trying to run the node from mainnet now.

mm-near avatar May 05 '22 10:05 mm-near

my node seems to be working fine (at least for now).

@andrewklau - where do you see the "too many files open" error? I didn't see it in your logs above.

In this release, we increased the number of files that we try to open, as it has a large impact on performance (you can see it in the changelog: "Increase default max_open_files RocksDB parameter from 512 to 10k", https://github.com/near/nearcore/pull/6607).

We are increasing the OS limit (using rlimit) - but maybe that didn't work on your machine.

Could you:

  • provide the part of the log where the system complains about 'too many files open'
  • check what your open file limit is
  • try lowering the limit that neard uses (this is controlled via config.json -> 'store' -> 'max_open_files')
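
The last two checks can be scripted. This is a minimal sketch, not an official tool: it assumes a Unix host, and the function names are hypothetical. Only the `store` -> `max_open_files` key is taken from the list above.

```python
import json
import resource
from pathlib import Path


def current_nofile_limit():
    """Return the (soft, hard) open-file limit of this process (Unix only)."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def set_max_open_files(config_path, value):
    """Set store.max_open_files in a config.json and return the previous value."""
    path = Path(config_path)
    config = json.loads(path.read_text())
    store = config.setdefault("store", {})
    previous = store.get("max_open_files")  # None if the key was not set
    store["max_open_files"] = value
    path.write_text(json.dumps(config, indent=2) + "\n")
    return previous


if __name__ == "__main__":
    soft, hard = current_nofile_limit()
    print(f"open-file limit: soft={soft} hard={hard}")
```

The limit reported is the one inherited by the Python process, so run it from the same shell or container that starts neard. Back up config.json before rewriting it. Passing 512 as `value` would restore the default that the changelog entry quoted above says was used before this release.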

mm-near avatar May 06 '22 09:05 mm-near

@mm-near

I think the issue might have been caused by the "too many files open" error.

I am running nearcore inside a Docker container, so rlimit would not have been able to increase the OS limit.

Since I increased the ulimit for open files, there haven't been any errors. I will check back in the next 1-2 weeks to confirm whether the error persists.

I no longer have access to the logs; they were lost when I recreated the container.
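
To confirm that the raised ulimit actually reached the process inside the container, the limit can be read from procfs. The sketch below assumes Linux and the usual `/proc/<pid>/limits` table layout; the function names are hypothetical.

```python
from pathlib import Path


def parse_max_open_files(limits_text):
    """Extract (soft, hard) from the 'Max open files' row of /proc/<pid>/limits.

    Values are returned as strings because either may be 'unlimited'.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            soft, hard = line.split()[3:5]
            return soft, hard
    raise ValueError("no 'Max open files' row found")


def process_nofile_limit(pid="self"):
    """Read the open-file limit of a running process (Linux only)."""
    return parse_max_open_files(Path(f"/proc/{pid}/limits").read_text())
```

Calling `process_nofile_limit(pid)` with the PID of the node process, from inside the container, shows the limit that process is really running with, as opposed to the limit of the shell you are typing in.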

andrewklau avatar May 08 '22 02:05 andrewklau

We are increasing the OS limit (using rlimit) - but maybe that didn't work on your machine.

Opening the database should have failed if we cannot set NOFILE to at least max_open_files+1025. Could it be that indexer bypasses ensure_max_open_files_limit somehow? Though I don’t see how that can happen. Indexer::run calls nearcore::start_with_config which is the same code path as a node.
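
The check described above can be sketched as follows. This is not nearcore's `ensure_max_open_files_limit`: only the `max_open_files + 1025` threshold comes from this comment, and the function names and error handling are assumptions for illustration (Unix only).

```python
import resource

RESERVED_FDS = 1025  # headroom quoted above: NOFILE must be at least max_open_files + 1025


def required_nofile(max_open_files):
    """Minimum NOFILE limit needed for a given max_open_files setting."""
    return max_open_files + RESERVED_FDS


def ensure_nofile_limit(max_open_files):
    """Raise the soft NOFILE limit if needed; fail loudly if that is impossible."""
    required = required_nofile(max_open_files)
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY or soft >= required:
        return soft  # already sufficient
    if hard != resource.RLIM_INFINITY and hard < required:
        # The hard limit caps what an unprivileged process may request.
        raise OSError(f"NOFILE hard limit {hard} is below the required {required}")
    resource.setrlimit(resource.RLIMIT_NOFILE, (required, hard))
    return required
```

With the new default of 10k, `required_nofile(10000)` is 11025, so an environment whose hard limit is below that would be expected to fail at startup under this sketch rather than run and hit "too many files open" later.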

mina86 avatar May 08 '22 15:05 mina86