cardano-node icon indicating copy to clipboard operation
cardano-node copied to clipboard

[Windows] [vasil-testnet-v1] the node stops responding to `query tip` and the socket is not found anymore

Open dorin100 opened this issue 3 years ago • 12 comments

Internal/External Internal if an IOHK staff member.

Area node syncing on windows

Summary When using tag vasil-testnet-v1, only on Windows, in 3 out of 4 cases, the node stops responding to query tip and the socket is not found anymore.

Steps to reproduce Steps to reproduce the behavior:

  1. Start node on windows using tag vasil-testnet-v1
  2. query the tip of the node each 10s until the node is fully synced (syncProgress = 100)
  • https://github.com/input-output-hk/cardano-node-tests/runs/6544279877?check_suite_focus=true
- waiting for the node to sync
actual_era  : byron  - actual_epoch: 0  - actual_block: 309  - actual_slot : 335  - syncProgress: 0
actual_era  : shelley  - actual_epoch: 282  - actual_block: 101157  - actual_slot : 2022[49](https://github.com/input-output-hk/cardano-node-tests/runs/6544279877?check_suite_focus=true#step:5:50)9  - syncProgress: 3
actual_era  : shelley  - actual_epoch: 583  - actual_block: 206081  - actual_slot : 418[50](https://github.com/input-output-hk/cardano-node-tests/runs/6544279877?check_suite_focus=true#step:5:51)25  - syncProgress: 7
actual_era  : shelley  - actual_epoch: 897  - actual_block: 316433  - actual_slot : 6447080  - syncProgress: 11
actual_era  : shelley  - actual_epoch: 1225  - actual_block: 423988  - actual_slot : 8806800  - syncProgress: 15
actual_era  : allegra  - actual_epoch: 1539  - actual_block: 532758  - actual_slot : 11071320  - syncProgress: 19
actual_era  : allegra  - actual_epoch: 1907  - actual_block: 639587  - actual_slot : 13720235  - syncProgress: 24
actual_era  : mary  - actual_epoch: 2206  - actual_block: 746268  - actual_slot : 15871297  - syncProgress: 28
actual_era  : mary  - actual_epoch: 2556  - actual_block: 8[52](https://github.com/input-output-hk/cardano-node-tests/runs/6544279877?check_suite_focus=true#step:5:53)273  - actual_slot : 18391115  - syncProgress: 33
actual_era  : mary  - actual_epoch: 2858  - actual_block: 957501  - actual_slot : 20564587  - syncProgress: 37
actual_era  : mary  - actual_epoch: 3155  - actual_block: 10632[53](https://github.com/input-output-hk/cardano-node-tests/runs/6544279877?check_suite_focus=true#step:5:54)  - actual_slot : 22707147  - syncProgress: 40
 === Waiting [60](https://github.com/input-output-hk/cardano-node-tests/runs/6544279877?check_suite_focus=true#step:5:61)s before retrying to get the tip again - 0
     !!!ERROR: command cardano-cli.exe query tip --testnet-magic 3 return with error (code 1): b'MuxError (MuxIOException withIODataPtr (readHandle): resource vanished (The pipe has been ended.)) "(readHandle errored)"\r\n\r\ncardano-cli.exe: MuxIOException withIODataPtr (readHandle): resource vanished (The pipe has been ended.) "(readHandle errored)"'
 === Waiting 60s before retrying to get the tip again - 1
     !!!ERROR: command cardano-cli.exe query tip --testnet-magic 3 return with error (code 1): b'\r\ncardano-cli.exe: CreateFile "\\\\\\\\.\\\\pipe\\\\cardano-node": does not exist (The system cannot find the file specified.)'
 === Waiting 60s before retrying to get the tip again - 2
     !!!ERROR: command cardano-cli.exe query tip --testnet-magic 3 return with error (code 1): b'\r\ncardano-cli.exe: CreateFile "\\\\\\\\.\\\\pipe\\\\cardano-node": does not exist (The system cannot find the file specified.)'

Expected behavior The node should successfully sync and query tip should not return errors.

System info (please complete the following information):

  • OS Name: Windows
  • Node version vasil-testnet-v1
  • CLI version vasil-testnet-v1

dorin100 avatar May 25 '22 07:05 dorin100

Hi, I can see this is assigned to @coot in the JIRA ticket, can someone please assign to him on GH too - I dont have the permissions

tbrahmbhatt avatar May 27 '22 09:05 tbrahmbhatt

@dorin100 does the node stops? Could make sure the following traces are turn on:

  • TraceLocalHandshake
  • TraceErrorPolicy
  • TraceLocalErrorPolicy

if they are where can I find a the log file?

coot avatar May 30 '22 15:05 coot

@coot the node does not seem to stop.

The GH jobs that ran these tests can be found here (both are using node tag vasil-testnet-v1):

  • first run - 2 out of 2 windows jobs failed - https://github.com/input-output-hk/cardano-node-tests/actions/runs/2366734246
  • second run - 1 out of 2 windows jobs failed - https://github.com/input-output-hk/cardano-node-tests/actions/runs/2367102172

The logs for each job can be found at the above links in the Artifacts section.

All the jobs are run as default with:

TraceLocalHandshake = false
TraceErrorPolicy = true
TraceLocalErrorPolicy = true

Here all windows jobs are ok but we are using tag 1.34.1 (last released one) - https://github.com/input-output-hk/cardano-node-tests/actions/runs/2367105131

FYI: the below flags were not set for any of those sync tests (but only the windows ones are failing)

"TestEnableDevelopmentNetworkProtocols": true,
"TestEnableDevelopmentHardForkEras": true,

dorin100 avatar Jun 01 '22 06:06 dorin100

~@dorin100 when I look at the recent logs node_logs_shelley_qa_windows-latest from this run. I can only see failing Plutus scripts~

I looked at the wrong logs.

coot avatar Jun 02 '22 10:06 coot

@dorin100 my Windows node synced, and I wasn't able to reproduce the issue: every 10s a cardano-cli was able to query the tip of the node. Could you please enable the following tracers and configure them with the given minimal severities:

Tracer Severity
TraceLocalMux Info
TraceLocalHandshake Info
TraceLocalErrorPolicy Debug
TraceErrorPolicy Debug

coot avatar Jun 13 '22 14:06 coot

@coot - the Window job failed again but this time I used the above tracers. The results can be found here - https://github.com/input-output-hk/cardano-node-tests/actions/runs/2495501357 (look for the [node_logs_staging_windows-latest](https://github.com/input-output-hk/cardano-node-tests/suites/6926637872/artifacts/269779719))

dorin100 avatar Jun 15 '22 09:06 dorin100

this was reproducible in 1 out of 2 runs also with 1.35.0-rc3 --> job: https://github.com/input-output-hk/cardano-node-tests/actions/runs/2501998632

dorin100 avatar Jun 15 '22 17:06 dorin100

just for completeness sake: by reading the logs I didn't found anything wrong with the node. A few seconds before cardano-node shutdown the it accepted the last local connection, while cardano-cli was failing to connect very 60s. We don't have time stamps when cardano-cli was trying to create these connections, but it seems as it's trying to do that after cardano-node terminated.

coot avatar Jun 20 '22 05:06 coot

We discovered:

DiffusionErrored (ExceptionInLinkedThread "ThreadId 34" <stderr>: hPutChar: resource vanished (Broken pipe)

which indicates that there's something wrong with the logging system.

coot avatar Jun 22 '22 08:06 coot

The issue is not reproducing when we run the node with "TurnOnLogging": false

PS: the above error might indicate that the io error was propagated to the top level and the node shutdown (and it should not shut down).

The above error can be found in the logs here:

dorin100 avatar Jun 23 '22 06:06 dorin100

So, this is legacy tracing system.

We can fix this, by migrating Daedalus to the new tracing system -- which I believe is ready to accomodate Daedalus.

deepfire avatar Sep 14 '22 12:09 deepfire

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 120 days.

github-actions[bot] avatar Oct 13 '25 02:10 github-actions[bot]