hydra
hydra copied to clipboard
Hydra node crashed after a fork
Context & versions
My hydra node was running a version around commit 979a5610
Steps to reproduce
I can't reproduce right now. Here are the steps that I followed:
- Start my node
- let it run for several days
- observer that it has crashed
Actual behavior
The node crashed with the following message:
hydra-node: QueryException "AcquireFailurePointNotOnChain"
Expected behavior
To not crash
Extra Information
Looking at previous logs, it might be the case that the layer 1 switch to a fork after hydra has taken some block into consideration. Then, querying layer 1 led to an exception. Here are some, probably significant logs:
# new tip taken into account by hydra-node:
Oct 14 15:49:41 ip-172-31-45-238 bc6522f81da8[34927]: [bc6522f8:cardano.node.ChainDB:Notice:66] [2022-10-14 15:49:41.10 UTC] Chain extended, new tip: 787b2e7bb45f5a77740c6679ca2a35332420eb14a795a1bedcff237126aa84ea at slot 5759381
Oct 14 15:49:41 ip-172-31-45-238 bc6522f81da8[34927]: {"timestamp":"2022-10-14T15:49:41.143911009Z","threadId":49,"namespace":"HydraNode-1","message":{"directChain":{"contents":{"newUTxO":{"3b54e9e04dd621d99b1b5c40d32d9d2c50f19c520ca37b788629d6bfce27a0f6#TxIx 0":{"address":"60edc8831194f67c170d90474a067347b167817ce04e8b6be991e828b5","datum":null,"referenceScript":null,"value":{"lovelace":100000000,"policies":{}}},"e9c45dd07cdafd96a399b52c9c73f5d9886bbd283b09b7eae054092e5b926b58#TxIx 1":{"address":"60edc8831194f67c170d90474a067347b167817ce04e8b6be991e828b5","datum":"a654fb60d21c1fed48db2c320aa6df9737ec0204c0ba53b9b94a09fb40e757f3","referenceScript":null,"value":{"lovelace":3654400,"policies":{}}}},"tag":"ApplyBlock"},"tag":"Wallet"},"tag":"DirectChain"}}
Oct 14 15:49:41 ip-172-31-45-238 bc6522f81da8[34927]: {"timestamp":"2022-10-14T15:49:41.143915519Z","threadId":49,"namespace":"HydraNode-1","message":{"directChain":{"point":"At (Block {blockPointSlot = SlotNo 5759381, blockPointHash = 787b2e7bb45f5a77740c6679ca2a35332420eb14a795a1bedcff237126aa84ea})","tag":"RolledForward"},"tag":"DirectChain"}}
# layer 1 swithes to a fork
Oct 14 15:49:41 ip-172-31-45-238 bc6522f81da8[34927]: [bc6522f8:cardano.node.ChainDB:Info:66] [2022-10-14 15:49:41.22 UTC] Block fits onto some fork: 8adda8ee035fb15aac00c0adbec662c2197cc9c1fbd4282486f6f6513c7700f5 at slot 5759381
Oct 14 15:49:41 ip-172-31-45-238 bc6522f81da8[34927]: [bc6522f8:cardano.node.ChainDB:Notice:66] [2022-10-14 15:49:41.22 UTC] Switched to a fork, new tip: 8adda8ee035fb15aac00c0adbec662c2197cc9c1fbd4282486f6f6513c7700f5 at slot 5759381
# ignoring some "closing connections" logs
# hydra node crashes with an exception:
Oct 14 15:49:41 ip-172-31-45-238 bc6522f81da8[34927]: hydra-node: QueryException "AcquireFailurePointNotOnChain"
The exception thrown, I think, would correspond to CardanoClient:295 in the current version of code.
@Quantumplation this might be similar to what you have seen? Any logs would be helpful.
Had a new crash last Friday. Was running hydra version 0.8.0. Not sure what happened and if it was really related to this same problem. I attach the logs here in case it could help.
Had a new crash last nigh. Was still running version 0.8.0. I attache the logs here in case it could help. I've noticed this message from the hydra node:
hydra-node: QueryAcquireException AcquireFailurePointNotOnChain
Again today. 2022-11-14.logs.txt
Might have been solved by #621, closing this until it re-appears.
Nodes are still crashing because of hydra-node: QueryAcquireException AFPointNotOnChain
Maybe the queryTimeHandle
could be the issue here. We are (still) requesting and rebuilding a time handle on each RollForward
, which could fail by querying at a specific point .. if we happen to switch the change just in between:
modified hydra-node/src/Hydra/Chain/Direct/TimeHandle.hs
@@ -16,7 +16,7 @@ import Hydra.Cardano.Api (
)
import Hydra.Cardano.Api.Prelude (ChainPoint (ChainPoint, ChainPointAtGenesis))
import Hydra.Chain.CardanoClient (
- QueryPoint (QueryAt),
+ QueryPoint (..),
queryEraHistory,
querySystemStart,
queryTip,
@@ -107,8 +107,8 @@ mkTimeHandle currentSlotNo systemStart eraHistory = do
queryTimeHandle :: NetworkId -> FilePath -> IO TimeHandle
queryTimeHandle networkId socketPath = do
tip <- queryTip networkId socketPath
- systemStart <- querySystemStart networkId socketPath (QueryAt tip)
- eraHistory <- queryEraHistory networkId socketPath (QueryAt tip)
+ systemStart <- querySystemStart networkId socketPath QueryTip
+ eraHistory <- queryEraHistory networkId socketPath QueryTip
currentTipSlot <-
case tip of
ChainPointAtGenesis -> pure $ SlotNo 0
{"timestamp":"2023-08-29T19:38:43.440470208Z","threadId":95,"namespace":"HydraNode-\"Dan\"","message":{"node":{"by":{"vkey":"0f23a3124d401d89e8f6cfb724eb00ac073c74edc93a4db9b0889c999f2415fe"},"event":{"chainEvent":{"chainSlot":37654723,"chainTime":"2023-08-29T19:38:43Z","tag":"Tick"},"tag":"OnChainEvent"},"eventId":622,"tag":"BeginEvent"},"tag":"Node"}}
{"timestamp":"2023-08-29T19:38:43.440489848Z","threadId":95,"namespace":"HydraNode-\"Dan\"","message":{"node":{"by":{"vkey":"0f23a3124d401d89e8f6cfb724eb00ac073c74edc93a4db9b0889c999f2415fe"},"outcome":{"contents":{"chainSlot":37654723,"tag":"TickObserved"},"tag":"StateChanged"},"tag":"LogicOutcome"},"tag":"Node"}}
{"timestamp":"2023-08-29T19:38:43.440635265Z","threadId":95,"namespace":"HydraNode-\"Dan\"","message":{"node":{"by":{"vkey":"0f23a3124d401d89e8f6cfb724eb00ac073c74edc93a4db9b0889c999f2415fe"},"eventId":622,"tag":"EndEvent"},"tag":"Node"}}
{"timestamp":"2023-08-29T19:38:43.769747589Z","threadId":67,"namespace":"HydraNode-\"Dan\"","message":{"directChain":{"contents":{"tag":"BeginInitialize"},"tag":"Wallet"},"tag":"DirectChain"}}
{"timestamp":"2023-08-29T19:38:44.462112583Z","threadId":101,"namespace":"HydraNode-\"Dan\"","message":{"network":{"data":{"subscriptions":{"dsts":"[13.39.121.84:5001,13.37.150.125:5001,13.37.15.211:5001]","event":"Application Exception: 13.39.121.84:5001 SubscriberError {seType = SubscriberWorkerCancelled, seMessage = \"SubscriptionWorker exiting\", seStack = []}","src":"LocalAddresses {laIpv4 = Just 0.0.0.0:0, laIpv6 = Nothing, laUnix = Nothing}","tag":"WithIPList"},"tag":"TraceSubscriptions"},"host":{"hostname":"0.0.0.0","port":5005}},"tag":"Network"}}
{"timestamp":"2023-08-29T19:38:44.462252522Z","threadId":101,"namespace":"HydraNode-\"Dan\"","message":{"network":{"data":{"subscriptions":{"dsts":"[13.39.121.84:5001,13.37.150.125:5001,13.37.15.211:5001]","event":"Closed socket to 13.39.121.84:5001","src":"LocalAddresses {laIpv4 = Just 0.0.0.0:0, laIpv6 = Nothing, laUnix = Nothing}","tag":"WithIPList"},"tag":"TraceSubscriptions"},"host":{"hostname":"0.0.0.0","port":5005}},"tag":"Network"}}
{"timestamp":"2023-08-29T19:38:44.462682122Z","threadId":102,"namespace":"HydraNode-\"Dan\"","message":{"network":{"data":{"subscriptions":{"dsts":"[13.39.121.84:5001,13.37.150.125:5001,13.37.15.211:5001]","event":"Application Exception: 13.37.150.125:5001 SubscriberError {seType = SubscriberWorkerCancelled, seMessage = \"SubscriptionWorker exiting\", seStack = []}","src":"LocalAddresses {laIpv4 = Just 0.0.0.0:0, laIpv6 = Nothing, laUnix = Nothing}","tag":"WithIPList"},"tag":"TraceSubscriptions"},"host":{"hostname":"0.0.0.0","port":5005}},"tag":"Network"}}
{"timestamp":"2023-08-29T19:38:44.462786868Z","threadId":102,"namespace":"HydraNode-\"Dan\"","message":{"network":{"data":{"subscriptions":{"dsts":"[13.39.121.84:5001,13.37.150.125:5001,13.37.15.211:5001]","event":"Closed socket to 13.37.150.125:5001","src":"LocalAddresses {laIpv4 = Just 0.0.0.0:0, laIpv6 = Nothing, laUnix = Nothing}","tag":"WithIPList"},"tag":"TraceSubscriptions"},"host":{"hostname":"0.0.0.0","port":5005}},"tag":"Network"}}
{"timestamp":"2023-08-29T19:38:44.463957252Z","threadId":107,"namespace":"HydraNode-\"Dan\"","message":{"network":{"data":{"subscriptions":{"dsts":"[13.39.121.84:5001,13.37.150.125:5001,13.37.15.211:5001]","event":"Application Exception: 13.37.15.211:5001 SubscriberError {seType = SubscriberWorkerCancelled, seMessage = \"SubscriptionWorker exiting\", seStack = []}","src":"LocalAddresses {laIpv4 = Just 0.0.0.0:0, laIpv6 = Nothing, laUnix = Nothing}","tag":"WithIPList"},"tag":"TraceSubscriptions"},"host":{"hostname":"0.0.0.0","port":5005}},"tag":"Network"}}
{"timestamp":"2023-08-29T19:38:44.464034863Z","threadId":107,"namespace":"HydraNode-\"Dan\"","message":{"network":{"data":{"subscriptions":{"dsts":"[13.39.121.84:5001,13.37.150.125:5001,13.37.15.211:5001]","event":"Closed socket to 13.37.15.211:5001","src":"LocalAddresses {laIpv4 = Just 0.0.0.0:0, laIpv6 = Nothing, laUnix = Nothing}","tag":"WithIPList"},"tag":"TraceSubscriptions"},"host":{"hostname":"0.0.0.0","port":5005}},"tag":"Network"}}
hydra-node: QueryAcquireException AFPointNotOnChain
running on 9754dcc304eba53a6804acd394ced29efcd88546