
ConnectionRefusedError: [Errno 111] Connection refused

Open DouPiChen opened this issue 1 year ago • 1 comments

❓ The question

When I run torchrun --nproc_per_node=4 scripts/train.py configs/official/OLMo-1B.yaml, I get a ConnectionRefusedError, as shown in these screenshots: [screenshot 2024-02-07 095742] [screenshot 2024-02-07 095827]

I have changed the URL in OLMo-1B.yaml as shown here: [screenshot 2024-02-07 100159]

DouPiChen avatar Feb 07 '24 02:02 DouPiChen

It's hard to tell what the error is from just the call stack, which is almost entirely inside Python's urllib3. Based on the last INFO line in the first image, the error appears to come from the Weights & Biases SDK. If you do not need Weights & Biases to keep track of run metrics, you can set --wandb to null. That is, run torchrun --nproc_per_node=4 scripts/train.py configs/official/OLMo-1B.yaml --wandb=null.
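To check whether the training machine can reach a remote service at all, a quick TCP probe can help separate a network problem from a configuration problem. The following is only a minimal sketch: the helper name can_connect is hypothetical, and the host api.wandb.ai on port 443 mentioned afterwards is an assumption about the default Weights & Biases cloud endpoint, not something confirmed by the screenshots in this issue.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) can be opened.

    Hypothetical diagnostic helper, not part of OLMo or the W&B SDK.
    """
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # ConnectionRefusedError (Errno 111), timeouts, and DNS failures
        # are all subclasses of OSError.
        return False
```

For example, calling can_connect("api.wandb.ai", 443) on the training node and getting False would suggest that outbound access is blocked (firewall, proxy, or an offline cluster), in which case running with --wandb=null as described above avoids the network call altogether.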

2015aroras avatar Feb 08 '24 17:02 2015aroras

I apologize for our delay in response. In order to help surface current, unresolved issues, we are closing tickets prior to February 29. Please reopen your ticket if you are continuing to experience this issue. Thank you!

dumitrac avatar Apr 30 '24 18:04 dumitrac