Cloud Platform Networking Support - Peer Discovery
Peer discovery currently relies on UDP broadcast, which most cloud platforms do not support. As a result, peer nodes cannot find each other even when they are in the same virtual network with no firewall rules enforced.
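For readers unfamiliar with the mechanism, here is a minimal sketch of UDP broadcast presence announcement. It is not exo's actual implementation: the port number and the helper names (`build_discovery_message`, `parse_discovery_message`, `make_broadcast_socket`, `announce`) are assumptions made for illustration, and the payload keeps only the `type`, `node_id` and `grpc_port` fields visible in the log below.

```python
import json
import socket

BROADCAST_ADDR = "255.255.255.255"  # IPv4 limited broadcast address
DISCOVERY_PORT = 5678  # hypothetical port, chosen only for this sketch


def build_discovery_message(node_id: str, grpc_port: int) -> bytes:
    """Encode a presence announcement shaped like the one in the log."""
    message = {"type": "discovery", "node_id": node_id, "grpc_port": grpc_port}
    return json.dumps(message).encode("utf-8")


def parse_discovery_message(data: bytes):
    """Return (node_id, grpc_port) for a valid discovery message, else None."""
    try:
        message = json.loads(data.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return None
    if not isinstance(message, dict) or message.get("type") != "discovery":
        return None
    if "node_id" not in message or "grpc_port" not in message:
        return None
    return message["node_id"], message["grpc_port"]


def make_broadcast_socket() -> socket.socket:
    """Create a UDP socket that is allowed to send to a broadcast address."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
    return sock


def announce(sock: socket.socket, payload: bytes) -> None:
    """Send one presence datagram to every host on the local segment."""
    sock.sendto(payload, (BROADCAST_ADDR, DISCOVERY_PORT))


if __name__ == "__main__":
    sock = make_broadcast_socket()
    try:
        announce(sock, build_discovery_message("example-node", 60439))
    finally:
        sock.close()
```

On a physical LAN, every host listening on the port receives the datagram. On a virtual network that does not forward broadcast traffic, the send can succeed locally while no peer ever sees the packet, which would match the symptom reported here: presence is broadcast repeatedly and the peer list stays empty.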
To reproduce, I set up two VM instances on GCP with a Debian image. This is what appears in the log when running DEBUG_DISCOVERY=9 DEBUG=9 python3 main.py:
Detected system: Linux
Using inference engine: TinygradDynamicShardInferenceEngine with shard downloader: HFShardDownloader
Trying to find available port port=60439
[55750, 53702, 58236, 60403, 56807, 55655, 56261, 63656, 64548, 52168, 53646, 49910, 61021, 63850, 65285, 60222, 56650, 56276, 57157]
Using available port: 60439
Retrieved existing node ID: fff5b2ef-1d4d-4170-93ab-8f748d777492
Chat interface started:
ChatGPT API endpoint served at:
tinygrad Device.DEFAULT='CLANG'
Server started, listening on
tinygrad Device.DEFAULT='CLANG'
Starting peer discovery process...
Current number of known peers: 0. Waiting 5 seconds to discover more...
No new peers discovered in the last grace period. Ending discovery process.
Collecting topology max_depth=4 visited=set()
Collected topology: Topology(Nodes: {fff5b2ef-1d4d-4170-93ab-8f748d777492: Model: Linux Box (Device: CLANG).
Chip: Unknown Chip (Device: CLANG). Memory: 15990MB. Flops: fp32: 0.00 TFLOPS, fp16: 0.00 TFLOPS, int8: 0.00
TFLOPS}, Edges: {})
Peer statuses: {}
Broadcast presence: b'{"type": "discovery", "node_id": "fff5b2ef-1d4d-4170-93ab-8f748d777492", "grpc_port":
60439, "device_capabilities": {"model": "Linux Box (Device: CLANG)", "chip": "Unknown Chip (Device: CLANG)",
"memory": 15990, "flops": {"fp32": 0, "fp16": 0, "int8": 0}}}'
Peer statuses: {}
Broadcast presence: b'{"type": "discovery", "node_id": "fff5b2ef-1d4d-4170-93ab-8f748d777492", "grpc_port":
60439, "device_capabilities": {"model": "Linux Box (Device: CLANG)", "chip": "Unknown Chip (Device: CLANG)",