
"Failed to notify enode" - root cause analysis of NullReferenceException

Open kamilchodola opened this issue 3 years ago • 18 comments

Describe the bug: From time to time we see multiple failures in SmokeTests runs with "Failed to notify enode" and a NullReferenceException. We need to investigate and fix the root cause, since it most probably did not appear on the pre-merge version.

This is a minor issue: it does not cause any problems with syncing. [image]

kamilchodola avatar Oct 06 '22 12:10 kamilchodola

@smartprogrammer93 @MarekM25 Any info on this? It still appears on nodes from time to time, and I just wanted to check whether it can cause any issues on a node.

kamilchodola avatar Oct 22 '22 09:10 kamilchodola

I will check on it tomorrow @kamilchodola

smartprogrammer93 avatar Oct 22 '22 09:10 smartprogrammer93

@smartprogrammer93 Great! Thanks

kamilchodola avatar Oct 22 '22 09:10 kamilchodola

https://seq.nethermind.io/#/events?filter=Contains(NodeName,%20'Smoke-Tests-Snap-144-FastSync-goerli')%20and%20not%20%22Big%20Snappy%20messag%22%20and%20not%20%22block%20producer%20%26%20sealer%22&from=2022-10-21T23:50:00.000Z&to=2022-10-22T00:00:00.000Z

Appeared there on goerli on current smoke tests

kamilchodola avatar Oct 22 '22 09:10 kamilchodola

Hey @kamilchodola ,

Let me know if you still see this issue from a version after https://github.com/NethermindEth/nethermind/pull/4874 is merged.

smartprogrammer93 avatar Nov 28 '22 14:11 smartprogrammer93

@smartprogrammer93 I can see it again, but only once, on one node. https://seq.nethermind.io/#/events?filter=Contains(NodeName,%20'Smoke-Tests-master125v2-goerli-lighthouse')%20and%20%22failed%20to%22 [image]

kamilchodola avatar Dec 06 '22 09:12 kamilchodola

@smartprogrammer93 Got spammed with those messages again. [image] Unfortunately, I was also flooded with Debug logs, and the app removed the log file for this specific situation, but it seems to be happening more often now.

kamilchodola avatar Dec 14 '22 20:12 kamilchodola

Unfortunately, I am still not sure why this exception is being thrown. I have already investigated it intensively but found no solution. I will try to dig deeper into it tomorrow.

smartprogrammer93 avatar Dec 14 '22 20:12 smartprogrammer93

@smartprogrammer93 I will try to reproduce it, or at least be more careful with the logs and try to capture debug logs for you - maybe that would help.

kamilchodola avatar Dec 14 '22 20:12 kamilchodola

@kamilchodola are we still seeing these?

smartprogrammer93 avatar Jan 18 '23 19:01 smartprogrammer93

@kamilchodola let me know whether you are still seeing these exceptions. If not, I will close this one.

smartprogrammer93 avatar Mar 24 '23 08:03 smartprogrammer93

It still happens from time to time, especially on goerli nodes, but very rarely... So apart from the normal logs, it is hard to catch debug logs because those may already have been overwritten.

kamilchodola avatar Mar 24 '23 12:03 kamilchodola

@smartprogrammer93 It just happened again... I am wondering what problems it may cause for us or for the network in such a case. This time it was on a mainnet-lighthouse pair. [image]

kamilchodola avatar Aug 01 '23 08:08 kamilchodola

@kamilchodola same stack trace (exception details)?

smartprogrammer93 avatar Aug 01 '23 10:08 smartprogrammer93

@smartprogrammer93 Yeah - looks slightly different: image

kamilchodola avatar Aug 01 '23 12:08 kamilchodola

@smartprogrammer93 still happening

System.NullReferenceException: Object reference not set to an instance of an object.
   at Nethermind.Network.P2P.ProtocolHandlers.SyncPeerProtocolHandlerBase.TxsToSendAndMarkAsNotified(IEnumerable`1 txs, Boolean sendFullTx)+MoveNext() in /_/src/Nethermind/Nethermind.Network/P2P/ProtocolHandlers/SyncPeerProtocolHandlerBase.cs:line 229
   at Nethermind.Network.P2P.Subprotocols.Eth.V65.Eth65ProtocolHandler.SendNewTransactionsCore(IEnumerable`1 txs, Boolean sendFullTx) in /_/src/Nethermind/Nethermind.Network/P2P/Subprotocols/Eth/V65/Eth65ProtocolHandler.cs:line 167
   at Nethermind.TxPool.TxBroadcaster.Notify(ITxPoolPeer peer, IEnumerable`1 txs, Boolean sendFullTx) in /_/src/Nethermind/Nethermind.TxPool/TxBroadcaster.cs:line 310

MarekM25 avatar Jan 23 '24 08:01 MarekM25

Yup, @MarekM25, I can't figure out the reason. I will try to investigate again once possible.

smartprogrammer93 avatar Jan 23 '24 08:01 smartprogrammer93

I spent a couple more hours on this and can't find a way for a null Tx reference to reach this point in the code. I traced it all the way back to TxPool. The only solution (workaround) I see is to check whether the Tx is null before trying to read its hash. @MarekM25 let me know if you want me to follow this approach, or take any other action.

smartprogrammer93 avatar Jan 25 '24 06:01 smartprogrammer93
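
For illustration only, the null-check workaround proposed above could look roughly like the sketch below. It is not Nethermind's actual code: `Tx`, `TxNotifier`, and the `notified` set are hypothetical stand-ins, and only the method name is borrowed from the stack trace. Because the method is an iterator (the trace shows `+MoveNext()`), the guard has to sit inside the loop, where the hash is read. Note that this hides the symptom and does not explain how a null entry gets there.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a pooled transaction; NOT Nethermind's Transaction type.
public sealed class Tx
{
    public Tx(string hash) => Hash = hash;
    public string Hash { get; }
}

// Hypothetical helper; the real logic lives in SyncPeerProtocolHandlerBase.
public static class TxNotifier
{
    // Lazily yields transactions not yet sent to this peer and records their
    // hashes in `notified`. Null entries are skipped instead of dereferenced.
    public static IEnumerable<Tx> TxsToSendAndMarkAsNotified(
        IEnumerable<Tx?> txs, ISet<string> notified)
    {
        foreach (Tx? tx in txs)
        {
            if (tx is null)
            {
                // The workaround: without this check, tx.Hash below would
                // throw NullReferenceException during enumeration.
                continue;
            }

            // ISet<T>.Add returns false if the hash was already recorded,
            // so each transaction is yielded at most once per peer.
            if (notified.Add(tx.Hash))
            {
                yield return tx;
            }
        }
    }
}
```

With this guard, a sequence containing a null in the middle yields only the non-null entries that were not already marked as notified. In a real fix it would probably be worth logging the skipped null so the underlying cause can still be tracked down.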