[BUG] - cardano-testnet sometimes hangs indefinitely

Open carbolymer opened this issue 10 months ago • 5 comments

Internal/External: Internal (if an IOHK staff member).

Area: Other (any other topic: Delegation, Ranking, ...).

Summary: Sometimes a cardano-testnet test suite hangs indefinitely. It looks as though the nodes are taking longer than usual to produce blocks. This may be related to what @james-iohk described here: https://github.com/IntersectMBO/cardano-node/pull/5679#issuecomment-1959419678

The issue is more visible on slower machines, such as the macOS runner in GHA or Darwin cross-compilation in Hydra.

Steps to reproduce:

  1. Rerun the cardano-testnet test suite multiple times. Some of the tests should either get stuck or fail on a condition check in byDeadlineM.

The issue appears to occur more frequently when testnet test suites run in parallel.

[!NOTE] Testnet tests can be executed in parallel by setting the PARALLEL_TESTNETS=1 environment variable or by passing --test-options '--num-threads 8' to the cabal test cardano-testnet invocation (after that PR gets merged).
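The reproduction steps can be scripted roughly as follows. This is only a sketch: rerun_until_failure is a hypothetical helper, not part of cardano-testnet, and it assumes the timeout utility from GNU coreutils is installed (it is not available by default on macOS). The attempt count and the per-run time limit in the example invocations are arbitrary.

```shell
# Hypothetical helper (not part of cardano-testnet): rerun a command until it
# fails or exceeds a per-run time limit given in seconds.
# Assumes `timeout` from GNU coreutils, which exits with 124 on a timeout.
rerun_until_failure() {
  attempts=$1; limit=$2; shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    timeout "$limit" "$@"
    status=$?
    if [ "$status" -eq 124 ]; then
      echo "attempt $i: no result within ${limit}s, treating it as a hang" >&2
      return 124
    elif [ "$status" -ne 0 ]; then
      echo "attempt $i: failed with exit code $status" >&2
      return "$status"
    fi
    i=$((i + 1))
  done
  echo "all $attempts attempts passed"
}

# Example invocations using the two parallel-run options from the note
# (20 attempts and 1800 s are arbitrary; the --num-threads form depends on
# the PR mentioned in the note):
#   rerun_until_failure 20 1800 env PARALLEL_TESTNETS=1 cabal test cardano-testnet
#   rerun_until_failure 20 1800 cabal test cardano-testnet --test-options '--num-threads 8'
```

An exit code of 124 then separates a run that got stuck from an ordinary test failure, which keeps its original exit code.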

Sample log of a failure: babbagetransaction.txt (taken from: https://github.com/IntersectMBO/cardano-node/pull/5695/checks?check_run_id=22357754517)

Expected behavior: cardano-testnet does not hang. It either retries or reports the failure with a message explaining what happened.
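To illustrate the kind of behavior meant here, the following is a minimal shell sketch of a deadline-bounded wait that retries a check and then fails with an explanatory message instead of blocking forever. wait_until is a hypothetical helper written for this illustration. It is not how cardano-testnet or byDeadlineM is implemented.

```shell
# Hypothetical helper: poll a command until it succeeds or a deadline
# (in seconds) passes, then fail with a message instead of waiting forever.
wait_until() {
  deadline=$1; interval=$2; shift 2
  start=$(date +%s)
  while ! "$@"; do
    now=$(date +%s)
    if [ $((now - start)) -ge "$deadline" ]; then
      echo "condition '$*' not met within ${deadline}s, giving up" >&2
      return 1
    fi
    sleep "$interval"
  done
}

# Example: wait up to 10 s, checking once per second, for a file to appear
# (marker_file is a placeholder variable):
#   wait_until 10 1 test -e "$marker_file"
```

The deadline is only checked between attempts, so this bounds the waiting but not a check that itself blocks. Such a check would need its own time limit.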

carbolymer avatar Apr 08 '24 18:04 carbolymer

Initially, the use of byDeadlineM was considered the problem here, and it was partially removed in https://github.com/IntersectMBO/cardano-node/pull/5707. But instead of test failures, we started getting cardano-testnet freezes. The suspicion is that the test network is not advancing, that is, new blocks are not being produced.

carbolymer avatar Apr 08 '24 18:04 carbolymer

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 120 days.

github-actions[bot] avatar May 11 '24 01:05 github-actions[bot]

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 120 days.

github-actions[bot] avatar Jun 16 '24 01:06 github-actions[bot]

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 120 days.

github-actions[bot] avatar Jul 20 '24 01:07 github-actions[bot]

Some stability window discussion (internal link): https://docs.google.com/document/d/1B8BNMx8jVWRjYiUBOaI3jfZ7dQNvNTSDODvT5iOuYCU/edit#heading=h.qh2zcajmu6hm

Consensus docs: https://ouroboros-consensus.cardano.intersectmbo.org/docs/for-developers/Glossary#epoch-structure

carbolymer avatar Sep 16 '24 19:09 carbolymer