wandb: Network error (ConnectionError), entering retry loop.

Open akashAD98 opened this issue 2 years ago • 4 comments

I'm using an NVIDIA A100 40GB GPU to train my object detection model on an Ubuntu 20.04 server, with the https://github.com/WongKinYiu/yolov7 repo, but I'm not able to train the model because of a wandb issue.

I'm using a batch script to launch my job.

error log

YOLOR 🚀 v0.1-43-g8b72ac7 torch 1.9.0+cu111 CUDA:0 (A100-SXM4-40GB, 40537.1875MB)

Retry attempt failed: Traceback (most recent call last): File "/nlsasfs/home/reflexion/chandnip/Conda/envs/yolov7/lib/python3.8/site-packages/urllib3/connection.py", line 174, in _new_conn conn = connection.create_connection( File "/nlsasfs/home/reflexion/chandnip/Conda/envs/yolov7/lib/python3.8/site-packages/urllib3/util/connection.py", line 72, in create_connection for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM): File "/nlsasfs/home/reflexion/chandnip/Conda/envs/yolov7/lib/python3.8/socket.py", line 918, in getaddrinfo for res in _socket.getaddrinfo(host, port, family, type, proto, flags): socket.gaierror: [Errno -2] Name or service not known

During handling of the above exception, another exception occurred:

Traceback (most recent call last): File "/nlsasfs/home/reflexion/chandnip/Conda/envs/yolov7/lib/python3.8/site-packages/urllib3/connectionpool.py", line 703, in urlopen httplib_response = self._make_request( File "/nlsasfs/home/reflexion/chandnip/Conda/envs/yolov7/lib/python3.8/site-packages/urllib3/connectionpool.py", line 386, in _make_request self._validate_conn(conn) File "/nlsasfs/home/reflexion/chandnip/Conda/envs/yolov7/lib/python3.8/site-packages/urllib3/connectionpool.py", line 1042, in _validate_conn conn.connect() File "/nlsasfs/home/reflexion/chandnip/Conda/envs/yolov7/lib/python3.8/site-packages/urllib3/connection.py", line 358, in connect self.sock = conn = self._new_conn() File "/nlsasfs/home/reflexion/chandnip/Conda/envs/yolov7/lib/python3.8/site-packages/urllib3/connection.py", line 186, in _new_conn raise NewConnectionError( urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPSConnection object at 0x7f93b64ab520>: Failed to establish a new connection: [Errno -2] Name or service not known

During handling of the above exception, another exception occurred:

Traceback (most recent call last): File "/nlsasfs/home/reflexion/chandnip/.local/lib/python3.8/site-packages/requests/adapters.py", line 489, in send resp = conn.urlopen( File "/nlsasfs/home/reflexion/chandnip/Conda/envs/yolov7/lib/python3.8/site-packages/urllib3/connectionpool.py", line 787, in urlopen retries = retries.increment( File "/nlsasfs/home/reflexion/chandnip/Conda/envs/yolov7/lib/python3.8/site-packages/urllib3/util/retry.py", line 592, in increment raise MaxRetryError(_pool, url, error or ResponseError(cause)) urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='api.wandb.ai', port=443): Max retries exceeded with url: /graphql (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f93b64ab520>: Failed to establish a new connection: [Errno -2] Name or service not known'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last): File "/nlsasfs/home/reflexion/chandnip/Conda/envs/yolov7/lib/python3.8/site-packages/wandb/sdk/lib/retry.py", line 108, in call result = self._call_fn(*args, **kwargs) File "/nlsasfs/home/reflexion/chandnip/Conda/envs/yolov7/lib/python3.8/site-packages/wandb/sdk/internal/internal_api.py", line 158, in execute return self.client.execute(*args, **kwargs) File "/nlsasfs/home/reflexion/chandnip/Conda/envs/yolov7/lib/python3.8/site-packages/wandb/vendor/gql-0.2.0/wandb_gql/client.py", line 52, in execute result = self._get_result(document, *args, **kwargs) File "/nlsasfs/home/reflexion/chandnip/Conda/envs/yolov7/lib/python3.8/site-packages/wandb/vendor/gql-0.2.0/wandb_gql/client.py", line 60, in _get_result return self.transport.execute(document, *args, **kwargs) File "/nlsasfs/home/reflexion/chandnip/Conda/envs/yolov7/lib/python3.8/site-packages/wandb/vendor/gql-0.2.0/wandb_gql/transport/requests.py", line 38, in execute request = requests.post(self.url, **post_args) File "/nlsasfs/home/reflexion/chandnip/.local/lib/python3.8/site-packages/requests/api.py", line 115, in post return request("post", url, data=data, json=json, **kwargs) File "/nlsasfs/home/reflexion/chandnip/.local/lib/python3.8/site-packages/requests/api.py", line 59, in request return session.request(method=method, url=url, **kwargs) File "/nlsasfs/home/reflexion/chandnip/.local/lib/python3.8/site-packages/requests/sessions.py", line 587, in request resp = self.send(prep, **send_kwargs) File "/nlsasfs/home/reflexion/chandnip/.local/lib/python3.8/site-packages/requests/sessions.py", line 701, in send r = adapter.send(request, **kwargs) File "/nlsasfs/home/reflexion/chandnip/.local/lib/python3.8/site-packages/requests/adapters.py", line 565, in send raise ConnectionError(e, request=request) requests.exceptions.ConnectionError: HTTPSConnectionPool(host='api.wandb.ai', port=443): Max retries exceeded with url: /graphql (Caused by 
NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f93b64ab520>: Failed to establish a new connection: [Errno -2] Name or service not known'))
wandb: Network error (ConnectionError), entering retry loop.
wandb: W&B API key is configured. Use wandb login --relogin to force relogin
wandb: Network error (ConnectionError), entering retry loop.
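
The innermost error in the log above is socket.gaierror: [Errno -2] Name or service not known, raised from socket.getaddrinfo, which suggests the compute node cannot resolve api.wandb.ai at all (a DNS or proxy problem rather than a wandb login problem). A minimal sketch for checking this from the same environment the job runs in; the helper name can_resolve is hypothetical and is not part of wandb:

```python
import socket


def can_resolve(host: str, port: int = 443) -> bool:
    """Return True if a DNS lookup for `host` succeeds, False on socket.gaierror.

    Hypothetical helper: it repeats the getaddrinfo call that fails in the
    traceback, without going through urllib3, requests, or wandb.
    """
    try:
        socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return False
    return True


# Example use inside the batch job, before training starts:
#     print("api.wandb.ai resolves:", can_resolve("api.wandb.ai"))
```

If this reports False on the compute node, direct lookups are not available there. That is expected on nodes that only reach the internet through a proxy, so a failed direct lookup does not by itself mean a proxied connection would also fail.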

akashAD98 avatar Jul 22 '22 07:07 akashAD98

https://github.com/wandb/local/issues/34 describes the same issue, but I was not able to solve it from there.

akashAD98 avatar Jul 22 '22 07:07 akashAD98

Solved the issue by adding the following lines to my .sh file.

Kindly use the proxy address below for the wandb connection.

export http_proxy=http://dgx-proxy-mn.mgmt.siddhi.param:9090/

export ftp_proxy=http://dgx-proxy-mn.mgmt.siddhi.param:9090/

export https_proxy=http://dgx-proxy-mn.mgmt.siddhi.param:9090/
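
For anyone who cannot edit the launch script, the same variables can probably be set from Python before wandb is imported. This is a sketch under assumptions: the helper name set_proxy_env is hypothetical, the proxy URL is specific to the cluster above and must be replaced with your own, and it relies on requests (which appears in the traceback) picking up proxy settings from the environment.

```python
import os
import urllib.request


def set_proxy_env(proxy_url: str) -> dict:
    """Set the lowercase proxy variables and return what urllib now detects.

    Hypothetical helper: mirrors the three `export` lines from the .sh file.
    """
    for name in ("http_proxy", "https_proxy", "ftp_proxy"):
        os.environ[name] = proxy_url
    # getproxies() reads the *_proxy environment variables, so the returned
    # mapping is a quick way to confirm the settings are visible to Python.
    return urllib.request.getproxies()


# Example use at the very top of the training script, before `import wandb`:
#     set_proxy_env("http://dgx-proxy-mn.mgmt.siddhi.param:9090/")
```

Setting the variables in the .sh file, as above, is still the simpler option because every process started by the job inherits them.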

akashAD98 avatar Jul 22 '22 16:07 akashAD98

Hello, I encountered a similar problem (please see the picture attached below) and tried your suggested method, but it does not seem to work for me. How can I fix this? My setup: the local OS is Windows 10 Education, and the remote server runs Ubuntu 20.04.2 LTS. [image]

Vita112 avatar Jun 21 '23 07:06 Vita112