forest icon indicating copy to clipboard operation
forest copied to clipboard

Messages are getting stuck in mpool

Open elmattic opened this issue 2 years ago • 2 comments

Describe the bug

When running a Forest node on calibnet, the message pool (mpool) get slowly but steadily crowded with messages.

Forest's mpool is supposed to have a maximum size of 5000 messages to avoid DoS attacks, where nodes are spammed and run out of memory.

So if the node can get its whole mpool filled with those stuck messages this could lead Forest to stop processing messages quite simply.

However it turns out that we don't use this max_tx_pool_size or any other protection, exposing us to the aforementioned attack vector.

    /// This is max number of messages in the pool.
    pub max_tx_pool_size: i64,  // <- never used!

We need to address both issues.

To Reproduce

Steps to reproduce the behavior:

  1. Init a new Forest node with calibnet snapshot
  2. Wait for some time (one or two hours) and inspect mpool
  3. See console output

Console output

$ ./target/release/forest-cli mpool pending --cids && ./target/release/forest-cli mpool stat
bafy2bzaceb763nepfmpi5tdf5mejdk5eucoa3s7bbtpmqklshxbc5obfurhoq
bafy2bzacedjmjcotou2r45h4uhz53tqksi5rxyhlv3cc63sgq4rdoj5z5pqbw
bafy2bzaceaehvo2kmd4q3dn7n2l4eou4es2ywwxrp7b3wnw3knxaxwbpw6zlg
bafy2bzaceaxo6cvlmyigjqpscu7yutvkeydfawb7s7jpw7qdxdatf5nsxrvhm
bafy2bzaceaiqabqrfyy7vgbqyfe64iqvbgd7d42jao765zvxetdzouffii2pm
bafy2bzaceareywffz76km3c4be4mg7tev7kmbsujwoqclwzg2iqwudsoi4zi6
bafy2bzacecycefhboujsoigndcsrmoxvtx5kuiayoltwpmtsvqq3fslabvblm
bafy2bzacedoh6uux6iuc466nkfohtfbizmeoxmn26y54yyfstd3t4d2yfvad4
bafy2bzacedwpstszwjm2kbbvtcvjfky7vns6stpsjxabruuoqyz5fbbdd7f46
bafy2bzacecyz7yw6ewngd5bjhfnhywehtwcfa6o7rxs6loazgct6pxgrb2q3a
bafy2bzacecftla6tnomq4ktbe2333y6wavp2dao5sz3kb7hqa5nig5u36x5v4
bafy2bzaceask3ercyoun7abbxybibkceaavpgcrwey35kko4mw6jgyapzoacw
bafy2bzaceaaszuxovhjqip7ybuqwbybmsonx5ob6b6fbdjwn33yjyu52dg57w
bafy2bzacebaah7y7gq5rvrfyubyfcltw4pza6yup2so35wzh3imxwkfy2ri76
bafy2bzacecw4hcwwfolshqfl5nfov4n6uoaf23il5i5vs2gwais6xo6hz6eua
bafy2bzaceakdsshbrifsldjlo3po6o7iepyrsyqwqo6nz3en7i2iawiqpeiji
bafy2bzacecvuq5jcipz7j5d7tuwery2dcacwhhrfjzf33g4vto7qegx5sqa7a
bafy2bzaceazzd3fbwvkvlkn4opjq6reeut33fapiulgtjubwfjbifev2bjtia
bafy2bzacebl6hr4gvobstfdxhkk2jgtghryyi5v4eehnmv27wqq3vbmouuvcc
bafy2bzacealobnaccrr55jtgkk43yubkj7ix2tf4bvoft3xnn252vrzw5aa34
bafy2bzaced3x23qepryljfwt2phvszf77zzccheqvdlc3co6j46aunt2uvn7w
bafy2bzacedrghsadirxkmlppnslzljdkc6nb5rxx6rt4k2zkchl5y72xrv7yc
bafy2bzacea73h4yrh2jyapaob3tuc6nxakhzdn5asjuljpfy2ldob2x5ydzbq
bafy2bzacechqklmz76vrpinrijlvk5txbiwzrmp2amdmb3otjdgsbpvfhmvmc
bafy2bzacedtsokwtd4go4qsqt2wwt7hyw4nkh3bjlm4645cuc27n6bu7jqbcs
bafy2bzacedbv5ev7zr2res5bou7n5gegqgzpdtulry2nau4vc7r3buv5ixvwg
bafy2bzacebfnzytabebqzozje2wwa5zuqwl52i2ougup7swim7vy32ftmmxje
bafy2bzacebz3wqdajfygd6zhfucffigdviv5ybwsbzb33ukat6ccltwobj2yw
bafy2bzacedsmtfwzxhr2xdigewphlcqzip2iaprwvbrf2o7vdkukd4kfpq4gc
bafy2bzacecmu3wdtzsigzkuy6shbyzz6liugo7gbfdy6os4hhm6qygy2v3zeu
bafy2bzaceby46bct45yoyxpixd6iz5wdzyowkzunnh3pzgyh5tndfo42migya
bafy2bzacecr4g2s3b6es5k56j322t3n3sng6iee5bco6vjc7xsasxu7m63hgc
bafy2bzacec544d5j2w7dklcjdy3mfh5t4yu3ndm53qg675fk2lj5caikaih36
bafy2bzacebalr4b54vh5wgezh4tmcnftwbrxxz2yy5qzsgqbaa3xr3evypncw
t1u2htzdz77nhhrial6cfdlekcou5cbhtmtbrzoaq: Nonce past: 3, cur: 0, future: 0; FeeCap cur: 0, min-60: 0, gasLimit: 175728384
t1uiyoi7rdnubtxtnt5swn632fdb3zw36j5xuviwq: Nonce past: 1, cur: 0, future: 0; FeeCap cur: 0, min-60: 0, gasLimit: 930953
t3qeydz3j3ayvjvognxoatc3vahaquz2mph4donx4kdu2mfkgsb2ebq6mslrsabkkqwub2rgtcomc56hri4aga: Nonce past: 1, cur: 0, future: 0; FeeCap cur: 0, min-60: 0, gasLimit: 22589408
t3qn7ask2us7xhhi5yoclb3pmpjofrv7lynnzia6dgdvj5bjdztnoyzobrk3rfmcpinpi55hdxvyiany4knw3q: Nonce past: 10, cur: 0, future: 0; FeeCap cur: 0, min-60: 0, gasLimit: 836985198
t3qupitigb6rownwycx64jge6sa34mjou2rv3xihs2gv2dwp7l4rpmhb2dsxibpabuu2qrlnoobyvmmkysekcq: Nonce past: 4, cur: 0, future: 0; FeeCap cur: 0, min-60: 0, gasLimit: 337096276
t3qxetyfmkkumqfdfxugexhsdzmdnmhrj4yo26sg2633yxqsqlsmddkz5q2urhq7crpa33nqmg6axtwxm2mnvq: Nonce past: 1, cur: 0, future: 0; FeeCap cur: 0, min-60: 0, gasLimit: 25074707
t3s6t7fnk6cjiuyyqeexruzixsca2zbdrd237aewt322ehblslq3cdtwrsb5p3hykewgf3rhv3etola2dtmjwa: Nonce past: 2, cur: 0, future: 0; FeeCap cur: 0, min-60: 0, gasLimit: 79621365
t3tc7bgy4fpmlrqnb62rfoe27b6hxmzkxddvd4igjh56txxculjxkbfds3r46of7oo5qd4rhbxg2fdgb6za4ea: Nonce past: 0, cur: 0, future: 2; FeeCap cur: 0, min-60: 0, gasLimit: 44593869
t3u3ekmkxmnp7vwgc4eifoxztrv6olerla7tsjcpd6376ug2f2pl3hukjkxc754gzsnvuilmtzo3kzxb4mmmua: Nonce past: 1, cur: 0, future: 0; FeeCap cur: 0, min-60: 0, gasLimit: 25241553
t3umcpde3adqvf6edjuxvqw77zibtqn3kpl6u2ewyxvgbq6elcxwaz6u23hzrvsrrg423aep6ho5bje3drt26q: Nonce past: 3, cur: 0, future: 0; FeeCap cur: 0, min-60: 0, gasLimit: 77099350
t3vm6qoze7ajlin3e6x2smphspvfmtrnxi4qp6hq45ljmrdeojn7dini6o6rd2y2yluycsevtx2tm57xhmnafq: Nonce past: 4, cur: 0, future: 0; FeeCap cur: 0, min-60: 0, gasLimit: 284333025
t3wt6c4wla5egncjsgq67lsu4wzu4xtnbeskgupty7udysbiqkr4sw6inqli2nazks2ypwwnmlahtkzd4ghjja: Nonce past: 2, cur: 0, future: 0; FeeCap cur: 0, min-60: 0, gasLimit: 50654328
-----
total: Nonce past: 32, cur: 0, future: 2; FeeCap cur: 0, min-60: 0, gasLimit: 1959948416

A bit later:

...
total: Nonce past: 43, cur: 4, future: 3; FeeCap cur: 0, min-60: 0, gasLimit: 3218952668

Expected behaviour

  1. Messages should not get stuck
  2. We must implement spam protection

Environment

  • OS: macOS
  • Rust version: 1.70.0-nightly
  • Branch/commit: 6050cc6399cf1e50472ec7856cb0003e95b326ad

Other information and links

Lotus never had its mpool exceeding ~10 messages while Forest's one was constantly growing.

MessagePool

TODO comment

elmattic avatar Apr 13 '23 17:04 elmattic

Result of my initial investigation:

Consider following log of a Forest node on calibnet.

2023-05-04T11:32:30.067295Z  INFO forest_chain::store::chain_store: New heaviest tipset! [bafy2bzacedtn7fcy2fwy2oibegccu5bq6try5gzbtjjgwe6wrg4ir7mrk5mxi] (EPOCH = 529119)    
2023-05-04T11:32:30.067707Z  INFO forest_chain::store::chain_store: New heaviest tipset! [bafy2bzacedus57iegr2xpf7xbsirougy5ar7vu2e4b3jpa7mw4nhero5c3ypu, bafy2bzacedtn7fcy2fwy2oibegccu5bq6try5gzbtjjgwe6wrg4ir7mrk5mxi] (EPOCH = 529119)    
2023-05-04T11:32:30.067804Z  INFO forest_chain_sync::tipset_syncer: Successfully synced tipset range: [529118, 529119]    
2023-05-04T11:33:00.048036Z  INFO forest_blocks::tipset: Weight tie left unbroken, default to [bafy2bzacea5ras4kslg5hconbaonujcm6yoyi4m3e6l3egu6ocny4cbub7uya]    
2023-05-04T11:33:00.048620Z  WARN forest_chain_sync::tipset_syncer: Validating block [CID = bafy2bzaceaez7hf4lz4h7dkk2egdbt4pum6dxaf4xtgv5a6pjydtcbxbmxfik] in EPOCH = 529120 failed: Loading tipset parent from the store failed: Key for header not found    
2023-05-04T11:33:00.048646Z  WARN forest_chain_sync::tipset_syncer: Sync messages check state failed for single tipset    
2023-05-04T11:33:00.048659Z ERROR forest_chain_sync::tipset_syncer: Syncing tipset range [529119, 529120] failed: Loading tipset parent from the store failed: Key for header not found    
2023-05-04T11:33:01.507198Z  INFO forest_blocks::tipset: Weight tie broken in favour of [bafy2bzacecfrul5ri2ggwbgmrcmsfbfnc65nomcakileyok7hqbpxo6vyczmi]    
2023-05-04T11:33:01.507670Z  INFO forest_chain_sync::tipset_syncer: Validating tipset: EPOCH = 529120, N blocks = 1    
2023-05-04T11:33:01.507714Z  WARN forest_chain_sync::tipset_syncer: Validating block [CID = bafy2bzaced732r3eipj2fesqt7vt4iksthm2qujqqqd6lzl5hjt2mnaqaw43o] in EPOCH = 529120 failed: Loading tipset parent from the store failed: Key for header not found    
2023-05-04T11:33:01.507732Z  WARN forest_chain_sync::tipset_syncer: Sync messages check state failed for single tipset    
2023-05-04T11:33:01.507741Z ERROR forest_chain_sync::tipset_syncer: Syncing tipset range [529119, 529120] failed: Loading tipset parent from the store failed: Key for header not found    
2023-05-04T11:33:30.701583Z  WARN forest_chain_sync::tipset_syncer: Validating block [CID = bafy2bzacea3zaonwshrt5jbmqctyufamht3mqj3ndgjx2vsghr2t2p4ta4pvc] in EPOCH = 529121 failed: Loading tipset parent from the store failed: Key for header not found    
2023-05-04T11:33:30.701603Z  WARN forest_chain_sync::tipset_syncer: Sync messages check state failed for single tipset    
2023-05-04T11:33:30.701613Z ERROR forest_chain_sync::tipset_syncer: Syncing tipset range [529119, 529121] failed: Loading tipset parent from the store failed: Key for header not found    
2023-05-04T11:33:31.527452Z  INFO forest_blocks::tipset: Weight tie left unbroken, default to [bafy2bzaceckxdkiitpszrn6voswbcjq3ncqmz3ux4s5jhlbuiosr7rvcqsuqq]    
2023-05-04T11:33:31.527463Z  INFO forest_blocks::tipset: Weight tie broken in favour of [bafy2bzaceckxdkiitpszrn6voswbcjq3ncqmz3ux4s5jhlbuiosr7rvcqsuqq]    
2023-05-04T11:33:31.528125Z  INFO forest_chain_sync::tipset_syncer: Validating tipset: EPOCH = 529121, N blocks = 1    
2023-05-04T11:33:31.528142Z  INFO forest_chain_sync::tipset_syncer: Validating tipset: EPOCH = 529121, N blocks = 1    
2023-05-04T11:33:31.528159Z  WARN forest_chain_sync::tipset_syncer: Validating block [CID = bafy2bzacebzeufi7dkpsle6ys2hojn2cogxm7tajqeoqzgbkz3wcdu3j3gzi2] in EPOCH = 529121 failed: Loading tipset parent from the store failed: Key for header not found    
2023-05-04T11:33:31.528179Z  WARN forest_chain_sync::tipset_syncer: Sync messages check state failed for single tipset    
2023-05-04T11:33:31.528188Z ERROR forest_chain_sync::tipset_syncer: Syncing tipset range [529119, 529121] failed: Loading tipset parent from the store failed: Key for header not found    
2023-05-04T11:34:00.907991Z  INFO forest_chain_sync::tipset_syncer: Successfully synced tipset range: [529119, 529122]

The tipset found for the epoch 529119 is [bafy2bzacedus57iegr2xpf7xbsirougy5ar7vu2e4b3jpa7mw4nhero5c3ypu, bafy2bzacedtn7fcy2fwy2oibegccu5bq6try5gzbtjjgwe6wrg4ir7mrk5mxi]

A head change event is triggered with associated tipset:

https://github.com/ChainSafe/forest/blob/main/blockchain/message_pool/src/msgpool/msg_pool.rs#L239

Forest message pool can react to this event and remove all mined messages inside the two blocks of this tipset:

https://github.com/ChainSafe/forest/blob/main/blockchain/message_pool/src/msgpool/mod.rs#L244

(via calls to remove_from_selected_msgs)

Then later, Forest tries to validate tipset at epoch 529120:

Validating tipset: EPOCH = 529120, N blocks = 1

This fails (it's a legit error):

2023-05-04T11:33:01.507741Z ERROR forest_chain_sync::tipset_syncer: Syncing tipset range [529119, 529120] failed: Loading tipset parent from the store failed: Key for header not found
2023-05-04T11:33:00.048646Z  WARN forest_chain_sync::tipset_syncer: Sync messages check state failed for single tipset

But this also means the head change event for this tipset won't be triggered. The next time it will, will be for tipset at epoch 529122 (last line of the log).

As a result, messages mined in blocks at epoch 529120 and 529121 and that have reached this Forest node will never be erased and pile up in the associated MsgSet hash map until forest daemon is restarted.

I will investigate next how Lotus mpool implementation works and can deal will messages removal.

elmattic avatar May 04 '23 12:05 elmattic

Lotus doesn't have this kind of "single tipset" failure.

I think we should understand and fix this issue in the first place.

elmattic avatar May 04 '23 13:05 elmattic