
Errors in ZOS for mycelium processes

Open • delandtj opened this issue 2 months ago • 4 comments

Log entries like

	2025-10-22 16:07:54.447  [-] mycelium-34uRzqqPcpt7a: error: a value is required for '--peers <STATIC_PEERS>...' but none was supplied
	2025-10-22 16:07:52.442  [-] mycelium-34uRzqqPcpt7a: For more information, try '--help'.
	2025-10-22 16:07:52.442  [-] mycelium-34uRzqqPcpt7a: 
	2025-10-22 16:07:52.442  [-] mycelium-34uRzqqPcpt7a: error: a value is required for '--peers <STATIC_PEERS>...' but none was supplied
	2025-10-22 16:07:50.435  [-] mycelium-34uRzqqPcpt7a: For more information, try '--help'.
	2025-10-22 16:07:50.435  [-] mycelium-34uRzqqPcpt7a: 
	2025-10-22 16:07:50.435  [-] mycelium-34uRzqqPcpt7a: error: a value is required for '--peers <STATIC_PEERS>...' but none was supplied
	2025-10-22 16:07:48.429  [-] mycelium-34uRzqqPcpt7a: For more information, try '--help'.
	2025-10-22 16:07:48.429  [-] mycelium-34uRzqqPcpt7a: 
	2025-10-22 16:07:48.429  [-] mycelium-34uRzqqPcpt7a: error: a value is required for '--peers <STATIC_PEERS>...' but none was supplied

give the impression (euphemism) that there are no peers defined for an NR, so I guess this is the origin of many of the errors we are seeing.

I don't know where the code instantiates mycelium, but these errors are certainly worth looking into.
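The error means mycelium was started with a `--peers` flag followed by no values. The following is a minimal Go sketch, not the zos code (which this issue does not show): `myceliumArgsNaive` and `myceliumArgs` are hypothetical helpers that assume the NR's mycelium command line is built from a slice of peer URIs, and the peer address is a placeholder. It shows how an empty list turns into exactly this invocation, and one way a builder could refuse it instead of spawning a process that exits immediately.

```go
package main

import (
	"errors"
	"fmt"
)

// myceliumArgsNaive (hypothetical) always emits --peers, even for an empty
// peer list. With no peers the result is a bare "--peers", which mycelium's
// argument parser rejects with
// "a value is required for '--peers <STATIC_PEERS>...' but none was supplied".
func myceliumArgsNaive(peers []string) []string {
	return append([]string{"--peers"}, peers...)
}

// myceliumArgs (hypothetical) refuses to build a command line without peers,
// so the caller gets one clear error instead of a process that exits and is
// restarted every couple of seconds, as in the log above.
func myceliumArgs(peers []string) ([]string, error) {
	if len(peers) == 0 {
		return nil, errors.New("mycelium: no peers available for network resource")
	}
	return append([]string{"--peers"}, peers...), nil
}

func main() {
	fmt.Println(myceliumArgsNaive(nil)) // [--peers]

	if _, err := myceliumArgs(nil); err != nil {
		fmt.Println("error:", err)
	}

	args, _ := myceliumArgs([]string{"tcp://192.0.2.1:9651"}) // placeholder peer
	fmt.Println(args) // [--peers tcp://192.0.2.1:9651]
}
```

Failing early only makes the symptom clearer; an NR mycelium with no peers would be isolated anyway, so the underlying question is still why the peer list was empty.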

delandtj • Oct 22 '25 15:10

From what I can tell, it happens when a node with an active NR and workload is restarted.

https://mon.grid.tf/explore?schemaVersion=1&panes=%7B%224ec%22:%7B%22datasource%22:%226hjPLZSHz%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22expr%22:%22%7Bnetwork%3D%5C%22production%5C%22,%20node%3D%5C%225HQYDnuUvLJ2MuSFogSzwoHmhNU9HTtwvDdRjfJQEfVXsHf7%5C%22%7D%22,%22queryType%22:%22range%22,%22datasource%22:%7B%22type%22:%22loki%22,%22uid%22:%226hjPLZSHz%22%7D,%22editorMode%22:%22builder%22,%22direction%22:%22backward%22%7D%5D,%22range%22:%7B%22from%22:%22now-1h%22,%22to%22:%22now%22%7D%7D%7D&orgId=1

delandtj • Oct 22 '25 15:10

  • Noticed this while debugging https://git.ourworld.tf/tfgrid_internal/circle_tfgrid_ops/issues/538#issuecomment-22404. Other NRs on the same node work fine and survived node restarts.

  • Each NR’s mycelium gets the host as its only peer, but in this case it failed to get the host's IP even though networkd runs much earlier. Still, this is the only way this could happen.

  • https://github.com/threefoldtech/zosbase/pull/81 adds a retry of up to a minute to give the node breathing room to get its network ready; if it still can't, it falls back to the public peers.

Omarabdul3ziz • Oct 22 '25 19:10

Hi @Omarabdul3ziz, how soon can we get the fix released?

scottyeager • Oct 28 '25 15:10

Released today on devnet; for mainnet it will be part of the upcoming [email protected] release.

Omarabdul3ziz • Oct 29 '25 08:10