Random failures at the accept-hostkeys step
Hello, I've been trying to run the benchmark on a 16-node SBC cluster, and it almost succeeds. However, I've run into random failures like this one:
Failed: [worker16] (item=10.42.0.101) => changed=true
  ansible_loop_var: item
  cmd:
  - ssh
  - [email protected]
  - -o
  - StrictHostKeyChecking=accept-new
  - date
  delta: '0:00:00.029161'
  end: '2025-03-17 13:04:07.606093'
  item: 10.42.0.101
  msg: non-zero return code
  rc: 255
  start: '2025-03-17 13:04:07.576932'
  stderr: |-
    kex_exchange_identification: read: Connection reset by peer
    Connection reset by 10.42.0.101 port 22
  stderr_lines:
  stdout: ''
  stdout_lines:
The nodes are all connected via Gigabit Ethernet, and I haven't encountered connection issues otherwise. The failures happen on different nodes each time I run the benchmark.
Any leads on the possible causes? Is there any specific data or logs that could help?
So is this during the initial connection? Or is it happening during a specific task, or when you hit the 'Run the benchmark' task?
It happens during the "Accept hostkeys for each host on each host." task. If I SSH between the mentioned nodes, I get this kind of message:
The authenticity of host 'worker01 (10.42.0.101)' can't be established.
ED25519 key fingerprint is SHA256:hash.
This host key is known by the following other names/addresses:
  ~/.ssh/known_hosts:1: [hashed name]
  ~/.ssh/known_hosts:13: [hashed name]
  ~/.ssh/known_hosts:14: [hashed name]
  ~/.ssh/known_hosts:15: [hashed name]
  ~/.ssh/known_hosts:16: [hashed name]
Are you sure you want to continue connecting (yes/no/[fingerprint])?
I'm unsure how it impacts the benchmark.
PS: I tried clearing the known_hosts file to be sure, and it had no effect.
@jere19 Ah... sometimes on first connection, Ansible gets picky about that. I usually sit there and enter "yes" and hit Return a bunch of times if I see that message, and eventually it accepts the hostkeys and proceeds.
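The workaround of retrying until the connection goes through could be automated. Below is a minimal sketch, assuming the failures are transient (they hit different nodes on each run) and that key-based login is already set up, so ssh never stops to ask for a password. It re-runs the same ssh arguments as the `cmd` in the failing task, retries only when ssh itself exits with 255 (the code ssh uses for its own errors, as in the `rc: 255` above), and backs off exponentially between tries. `accept_hostkey`, `hostkey_command`, and the `attempts`/`delay` values are hypothetical names and defaults for illustration, not part of the benchmark's playbook.

```python
import subprocess
import time
from typing import Callable, List, Optional, Sequence

SSH_ERROR = 255  # ssh exits with 255 when ssh itself fails (e.g. connection reset)


def hostkey_command(host: str, user: str) -> List[str]:
    # Same arguments as the failing task; `user` is a placeholder.
    return ["ssh", f"{user}@{host}", "-o", "StrictHostKeyChecking=accept-new", "date"]


def _run_ssh(cmd: Sequence[str]) -> int:
    # Run ssh, discard its output, and return the exit code.
    return subprocess.run(list(cmd), capture_output=True).returncode


def accept_hostkey(
    host: str,
    user: str,
    attempts: int = 5,
    delay: float = 1.0,
    run: Optional[Callable[[Sequence[str]], int]] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Retry the ssh call on connection errors; return the attempt number that succeeded."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    run = run or _run_ssh
    rc = None
    attempt = 0
    for attempt in range(1, attempts + 1):
        rc = run(hostkey_command(host, user))
        if rc == 0:
            return attempt
        if rc != SSH_ERROR:
            # The remote command failed, not the connection: retrying will not help.
            break
        if attempt < attempts:
            sleep(delay * 2 ** (attempt - 1))  # exponential backoff between tries
    raise RuntimeError(f"{host}: ssh failed after {attempt} attempt(s), last rc={rc}")
```

It would be called once per source/target pair, for example `accept_hostkey("10.42.0.101", "<user>")` on worker16. Inside the playbook itself, Ansible's `until`/`retries`/`delay` task keywords would likely be the more natural way to get the same retry behavior.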
This issue has been marked 'stale' due to lack of recent activity. If there is no further activity, the issue will be closed in another 30 days. Thank you for your contribution!
Please read this blog post to see the reasons why I mark issues as stale.
This issue has been closed due to inactivity. If you feel this is in error, please reopen the issue or file a new issue with the relevant details.