KG_RAG
OSError: [Errno 101] Network is unreachable
When I run the SFT script (fine-tuning a large language model), the error below occurs. I don't know how to fix it and would appreciate any help.
[2025-03-29 11:31:28,625] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2025-03-29 11:31:28,625] [INFO] [comm.py:616:init_distributed] cdb=None
[2025-03-29 11:31:28,625] [INFO] [comm.py:643:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
[rank0]: Traceback (most recent call last):
[rank0]:   File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 169, in _new_conn
[rank0]:     conn = connection.create_connection(
[rank0]:   File "/usr/lib/python3/dist-packages/urllib3/util/connection.py", line 96, in create_connection
[rank0]:     raise err
[rank0]:   File "/usr/lib/python3/dist-packages/urllib3/util/connection.py", line 86, in create_connection
[rank0]:     sock.connect(sa)
[rank0]: OSError: [Errno 101] Network is unreachable
[rank0]: During handling of the above exception, another exception occurred:
[rank0]: Traceback (most recent call last):
[rank0]:   File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 700, in urlopen
[rank0]:     httplib_response = self._make_request(
[rank0]:   File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 383, in _make_request
[rank0]:     self._validate_conn(conn)
[rank0]:   File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 1017, in _validate_conn
[rank0]:     conn.connect()
[rank0]:   File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 353, in connect
[rank0]:     conn = self._new_conn()
[rank0]:   File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 181, in _new_conn
[rank0]:     raise NewConnectionError(
[rank0]: urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPSConnection object at 0x7fdc363a0c70>: Failed to establish a new connection: [Errno 101] Network is unreachable
[rank0]: During handling of the above exception, another exception occurred:
[rank0]: Traceback (most recent call last):
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/requests/adapters.py", line 667, in send
[rank0]:     resp = conn.urlopen(
[rank0]:   File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 756, in urlopen
[rank0]:     retries = retries.increment(
[rank0]:   File "/usr/lib/python3/dist-packages/urllib3/util/retry.py", line 576, in increment
[rank0]:     raise MaxRetryError(_pool, url, error or ResponseError(cause))
[rank0]: urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /Qwen1.5-14B-Chat/resolve/main/config.json (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fdc363a0c70>: Failed to establish a new connection: [Errno 101] Network is unreachable'))
[rank0]: During handling of the above exception, another exception occurred:
[rank0]: Traceback (most recent call last):
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py", line 1376, in _get_metadata_or_catch_error
[rank0]:     metadata = get_hf_file_metadata(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
[rank0]:     return fn(*args, **kwargs)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py", line 1296, in get_hf_file_metadata
[rank0]:     r = _request_wrapper(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py", line 280, in _request_wrapper
[rank0]:     response = _request_wrapper(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py", line 303, in _request_wrapper
[rank0]:     response = get_session().request(method=method, url=url, **params)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/requests/sessions.py", line 589, in request
[rank0]:     resp = self.send(prep, **send_kwargs)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/requests/sessions.py", line 703, in send
[rank0]:     r = adapter.send(request, **kwargs)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_http.py", line 96, in send
[rank0]:     return super().send(request, *args, **kwargs)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/requests/adapters.py", line 700, in send
[rank0]:     raise ConnectionError(e, request=request)
[rank0]: requests.exceptions.ConnectionError: (MaxRetryError("HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /Qwen1.5-14B-Chat/resolve/main/config.json (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fdc363a0c70>: Failed to establish a new connection: [Errno 101] Network is unreachable'))"), '(Request ID: 34b892b5-cb11-49f0-9b0e-6d5eedc96f0f)')
[rank0]: The above exception was the direct cause of the following exception:
[rank0]: Traceback (most recent call last):
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/transformers/utils/hub.py", line 402, in cached_file
[rank0]:     resolved_file = hf_hub_download(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
[rank0]:     return fn(*args, **kwargs)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py", line 862, in hf_hub_download
[rank0]:     return _hf_hub_download_to_cache_dir(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py", line 969, in _hf_hub_download_to_cache_dir
[rank0]:     _raise_on_head_call_error(head_call_error, force_download, local_files_only)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py", line 1489, in _raise_on_head_call_error
[rank0]:     raise LocalEntryNotFoundError(
[rank0]: huggingface_hub.errors.LocalEntryNotFoundError: An error happened while trying to locate the file on the Hub and we cannot find the requested files in the local cache. Please check your connection and try again or make sure your Internet connection is on.
[rank0]: The above exception was the direct cause of the following exception:
[rank0]: Traceback (most recent call last):
[rank0]: File "/usr/LLMOPT/LLMOPT/./sft/sft.py", line 419, in