awsome-distributed-training

FSDP Example ReadTimeoutError

Open: nghtm opened this issue 4 months ago • 1 comment

 7: [rank80]: urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='huggingface.co', port=443): Read timed out. (read timeout=10)

Running the FSDP example on 16 p5 nodes fails with the ReadTimeoutError above. The same example worked with 8 nodes.
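The report does not identify a root cause. If the timeouts come from many ranks contacting huggingface.co at the same moment (an assumption, not something confirmed here), one possible mitigation is to wrap whatever call fetches from the Hub in a retry with exponential backoff and jitter. The sketch below uses only the standard library; `retry_with_backoff` and its parameters are hypothetical names for illustration and are not part of the FSDP example or of any Hugging Face library.

```python
import random
import time


def retry_with_backoff(fn, max_attempts=5, base_delay=1.0, max_delay=30.0,
                       retry_on=(Exception,), sleep=time.sleep):
    """Call fn() until it succeeds or max_attempts is reached.

    Hypothetical helper for illustration only. `fn` is any zero-argument
    callable, for example a lambda around the call that downloads a
    tokenizer or config.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                # Out of attempts: surface the original error.
                raise
            # Exponential backoff capped at max_delay, with full jitter so
            # ranks that failed together do not all retry at the same instant.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, delay))
```

To try it, wrap the call that raised the error, e.g. `retry_with_backoff(lambda: load_fn(), retry_on=(SomeTimeoutError,))`, where `load_fn` and `SomeTimeoutError` stand in for the actual loading call and the exception type seen in the traceback.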

nghtm, Oct 02 '24 23:10