mina icon indicating copy to clipboard operation
mina copied to clipboard

UMT-upgrade-mechanism-testing archive node does not finish loading genesis ledger

Open jrwashburn opened this issue 1 year ago • 0 comments

Preliminary Checks

  • [X] This issue is not a duplicate. Before opening a new issue, please search existing issues: https://github.com/MinaProtocol/mina/issues
  • [X] This issue is not a question, feature request, RFC, or anything other than a bug report. Please post those things in GitHub Discussions: https://github.com/MinaProtocol/mina/discussions

Description

Started archive node - several hours go by with no additional log messages after this:

Feb 23 15:12:22 nym3 systemd[735]: Started Mina Archive Service.
Feb 23 15:12:28 nym3 mina-archive[1308272]: {"timestamp":"2024-02-23 20:12:28.155412Z","level":"Info","source":{"module":"Genesis_ledger_helper","location":"File \"src/lib/genesis_ledger_helper/genesis_ledger_helper.ml\", line 820, characters 2-13"},"message":"Initializing with runtime configuration. Ledger name: $name","metadata":{"name":"UMT","pid":1308272}}
Feb 23 15:12:28 nym3 mina-archive[1308272]: {"timestamp":"2024-02-23 20:12:28.155521Z","level":"Info","source":{"module":"Genesis_ledger_helper","location":"File \"src/lib/genesis_ledger_helper/genesis_ledger_helper.ml\", line 842, characters 8-19"},"message":"Using the compiled constraint constants","metadata":{"pid":1308272}}
Feb 23 15:12:28 nym3 mina-archive[1308272]: {"timestamp":"2024-02-23 20:12:28.909294Z","level":"Info","source":{"module":"Genesis_ledger_helper","location":"File \"src/lib/genesis_ledger_helper/genesis_ledger_helper.ml\", line 890, characters 2-13"},"message":"Loaded genesis ledger from $ledger_file","metadata":{"ledger_file":"/tmp/coda_cache_dir/genesis_ledger_accounts_81d2056807c18d45d06f542530fcfd5113f313405827f6f6ea3e3c0025574a82.tar.gz","pid":1308272}}

During this time, mina-archive is not listening for connections. Database has a block 1, and daemon is complaining that it cannot connect to the archive.

I restarted the archive process after 2 hours, eventually removing --config-file parameter so that it would listen for the daemon and start getting blocks.

Later, I started a second instance of the archive process with the --config-file parameter and a different server port and let it run overnight. 6 hours in, it threw a deadlock error trying to insert public_keys for an account that is present.

2024-02-24 05:49:46 UTC [Info] Initializing with runtime configuration. Ledger name: "UMT"
2024-02-24 05:49:46 UTC [Info] Using the compiled constraint constants
2024-02-24 05:49:47 UTC [Info] Loaded genesis ledger from $ledger_file
        ledger_file: "/tmp/coda_cache_dir/genesis_ledger_accounts_81d2056807c18d45d06f542530fcfd5113f313405827f6f6ea3e3c0025574a82.tar.gz"
2024-02-24 10:48:23 UTC [Error] Failed to add genesis account: $account, see $error
        account: {
  "public_key": "B62qmATfKHQWCNoGafsmURNmDbaQd2prUNLQeF3G9J1jz2EbAwp2AeA",
  "token_id": "1",
  "token_permissions": [ "Not_owned", { "account_disabled": false } ],
  "balance": "1000000000000",
  "nonce": "0",
  "receipt_chain_hash":
    "2mzbV7WevxLuchs2dAMY4vQBS6XttnCUF8Hvks4XNBQ5qiSGGBQe",
  "delegate": "B62qmATfKHQWCNoGafsmURNmDbaQd2prUNLQeF3G9J1jz2EbAwp2AeA",
  "voting_for": "3NK2tkzqqK5spR2sZ7tujjqPksL45M3UUrcA4WhCkeiPtnugyE2x",
  "timing": [ "Untimed" ],
  "permissions": {
    "stake": true,
    "edit_state": [ "Signature" ],
    "send": [ "Signature" ],
    "receive": [ "None" ],
    "set_delegate": [ "Signature" ],
    "set_permissions": [ "Signature" ],
    "set_verification_key": [ "Signature" ]
  },
  "snapp": null
}
        error: "Request to <postgresql://user:db:port/umt-mainnet?sslmode=require> failed: ERROR:  deadlock detected\nDETAIL:  Process 1679555 waits for ShareLock on transaction 7504992; blocked by process 1529659.\nProcess 1529659 waits for ShareLock on transaction 7503864; blocked by process 1679555.\nHINT:  See server log for query details.\nCONTEXT:  while inserting index tuple (2294,3) in relation \"public_keys_value_key\"\n. Query: \"INSERT INTO public_keys (value) VALUES ($1) RETURNING id\"."
2024-02-24 10:48:23 UTC [Warn] Error in add_genesis_accounts : $error. Retrying...
        error: "Request to <postgresql://user:host:port/umt-mainnet?sslmode=require> failed: ERROR:  deadlock detected\nDETAIL:  Process 1679555 waits for ShareLock on transaction 7504992; blocked by process 1529659.\nProcess 1529659 waits for ShareLock on transaction 7503864; blocked by process 1679555.\nHINT:  See server log for query details.\nCONTEXT:  while inserting index tuple (2294,3) in relation \"public_keys_value_key\"\n. Query: \"INSERT INTO public_keys (value) VALUES ($1) RETURNING id\"."
2024-02-24 10:48:25 UTC [Info] Initializing with runtime configuration. Ledger name: "UMT"
2024-02-24 10:48:25 UTC [Info] Using the compiled constraint constants
2024-02-24 10:48:26 UTC [Info] Loaded genesis ledger from $ledger_file
        ledger_file: "/tmp/coda_cache_dir/genesis_ledger_accounts_81d2056807c18d45d06f542530fcfd5113f313405827f6f6ea3e3c0025574a82.tar.gz"

There are only 58 rows in the public_keys table currently, and there are several large id gaps on the public_keys table e.g. no keys with id after 1 until 110667, and several other large (and small) gaps.

Steps to Reproduce

  1. Create new database
  2. Run schema script
  3. Start archive node with
ExecStart=/usr/local/bin/mina-archive run \
 --postgres-uri $POSTGRES_URI \
 --server-port 3086 \
 --metrics-port 3099 \
 --log-json \
 --log-level DEBUG \
 --config-file /var/lib/coda/config_2025a732.json
  1. Stop archive node after two hours, restart without --config-file
  2. Separately, restart archive node with (so both are running)
mina-archive run --postgres-uri postgresql://user:pass@db:port/umt-mainnet?sslmode=require --config-file /var/lib/coda/config_2025a732.json --server-port 4086
  1. second instance died with deadlock reported after 5 hours; never starts listening for connections (so far.)

Expected Result

Archive will load genesis accounts and start listening for blocks

Actual Result

Archive never finishes loading genesis blocks.

Daemon version

Commit [DIRTY]2025a732df43c6735a576aff907252f74cf2ef35 on branch testing/hard-fork-internal

How frequently do you see this issue?

Always

What is the impact of this issue on your ability to run a node?

Blocker

Status

mina client status
Mina daemon status
-----------------------------------

Global number of accounts:                     201806
Block height:                                  59
Max observed block height:                     81
Max observed unvalidated block height:         81
Local uptime:                                  8h1m2s
Ledger Merkle root:                            jy1ZvdzScUkWJDnMKnhispuohRMViiX4M3NncS4LB6Ed7XH7jeT
Protocol state hash:                           3NKpZWHWGxq1bHC47ZX6h8HGoZE5YQaNpHGNv5jX5W3zZe2xawW2
Chain id:                                      100ef4660dd2f1dc184e5730cc21f5ecfbe2f7b24af237fe049ca425cea6633e
Git SHA-1:                                     [DIRTY]2025a732df43c6735a576aff907252f74cf2ef35
Configuration directory:                       /home/minar/.mina-config
Peers:                                         41
User_commands sent:                            0
SNARK worker:                                  None
SNARK work fee:                                100000000
Sync status:                                   Catchup
Catchup status:                                
        To build breadcrumb:           0
        To initial validate:           0
        Finished:                      70
        To download:                   0
        Waiting for parent to finish:  0
        To verify:                     0

Block producers running:                       0
Coinbase receiver:                             Block producer
Best tip consensus time:                       epoch=0, slot=489
Best tip global slot (across all hard-forks):  489
Consensus time now:                            epoch=0, slot=547
Consensus mechanism:                           proof_of_stake
Consensus configuration:                       
        Delta:                     0
        k:                         290
        Slots per epoch:           7140
        Slot duration:             3m
        Epoch duration:            14d21h
        Chain start timestamp:     2024-02-23 10:00:00.000000Z
        Acceptable network delay:  3m

Addresses and ports:                           
        External IP:    209.134.35.154
        Bind IP:        0.0.0.0
        Libp2p PeerID:  12D3KooWRkA6kt61EZhoTuKvM7rEWzzwRUMbNoSs9BtfYUpLqb8B
        Libp2p port:    8304
        Client port:    8301

Metrics:                                       
        block_production_delay:             7 (0 0 0 0 0 0 0)
        transaction_pool_diff_received:     132
        transaction_pool_diff_broadcasted:  0
        transactions_added_to_pool:         13
        transaction_pool_size:              3

Additional information

There are no log messages after the messages in this post, so not sharing logs. mina-archive seems to just be slowly busy doing something after it reports Loaded genesis ledger from $ledger_file.

jrwashburn avatar Feb 24 '24 13:02 jrwashburn