wandb: Network error (ConnectionError), entering retry loop.
I'm using an NVIDIA A100 40GB GPU to train my object detection model on an Ubuntu 20.04 server, with the https://github.com/WongKinYiu/yolov7 repo, but I'm not able to train the model because of a wandb issue.
I'm using a batch script to launch my job.
Error log:
YOLOR 🚀 v0.1-43-g8b72ac7 torch 1.9.0+cu111 CUDA:0 (A100-SXM4-40GB, 40537.1875MB)
Retry attempt failed:
Traceback (most recent call last):
File "/nlsasfs/home/reflexion/chandnip/Conda/envs/yolov7/lib/python3.8/site-packages/urllib3/connection.py", line 174, in _new_conn
conn = connection.create_connection(
File "/nlsasfs/home/reflexion/chandnip/Conda/envs/yolov7/lib/python3.8/site-packages/urllib3/util/connection.py", line 72, in create_connection
for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
File "/nlsasfs/home/reflexion/chandnip/Conda/envs/yolov7/lib/python3.8/socket.py", line 918, in getaddrinfo
for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -2] Name or service not known
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/nlsasfs/home/reflexion/chandnip/Conda/envs/yolov7/lib/python3.8/site-packages/urllib3/connectionpool.py", line 703, in urlopen
httplib_response = self._make_request(
File "/nlsasfs/home/reflexion/chandnip/Conda/envs/yolov7/lib/python3.8/site-packages/urllib3/connectionpool.py", line 386, in _make_request
self._validate_conn(conn)
File "/nlsasfs/home/reflexion/chandnip/Conda/envs/yolov7/lib/python3.8/site-packages/urllib3/connectionpool.py", line 1042, in _validate_conn
conn.connect()
File "/nlsasfs/home/reflexion/chandnip/Conda/envs/yolov7/lib/python3.8/site-packages/urllib3/connection.py", line 358, in connect
self.sock = conn = self._new_conn()
File "/nlsasfs/home/reflexion/chandnip/Conda/envs/yolov7/lib/python3.8/site-packages/urllib3/connection.py", line 186, in _new_conn
raise NewConnectionError(
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPSConnection object at 0x7f93b64ab520>: Failed to establish a new connection: [Errno -2] Name or service not known
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/nlsasfs/home/reflexion/chandnip/.local/lib/python3.8/site-packages/requests/adapters.py", line 489, in send
resp = conn.urlopen(
File "/nlsasfs/home/reflexion/chandnip/Conda/envs/yolov7/lib/python3.8/site-packages/urllib3/connectionpool.py", line 787, in urlopen
retries = retries.increment(
File "/nlsasfs/home/reflexion/chandnip/Conda/envs/yolov7/lib/python3.8/site-packages/urllib3/util/retry.py", line 592, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='api.wandb.ai', port=443): Max retries exceeded with url: /graphql (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f93b64ab520>: Failed to establish a new connection: [Errno -2] Name or service not known'))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/nlsasfs/home/reflexion/chandnip/Conda/envs/yolov7/lib/python3.8/site-packages/wandb/sdk/lib/retry.py", line 108, in call
result = self._call_fn(*args, **kwargs)
File "/nlsasfs/home/reflexion/chandnip/Conda/envs/yolov7/lib/python3.8/site-packages/wandb/sdk/internal/internal_api.py", line 158, in execute
return self.client.execute(*args, **kwargs)
File "/nlsasfs/home/reflexion/chandnip/Conda/envs/yolov7/lib/python3.8/site-packages/wandb/vendor/gql-0.2.0/wandb_gql/client.py", line 52, in execute
result = self._get_result(document, *args, **kwargs)
File "/nlsasfs/home/reflexion/chandnip/Conda/envs/yolov7/lib/python3.8/site-packages/wandb/vendor/gql-0.2.0/wandb_gql/client.py", line 60, in _get_result
return self.transport.execute(document, *args, **kwargs)
File "/nlsasfs/home/reflexion/chandnip/Conda/envs/yolov7/lib/python3.8/site-packages/wandb/vendor/gql-0.2.0/wandb_gql/transport/requests.py", line 38, in execute
request = requests.post(self.url, **post_args)
File "/nlsasfs/home/reflexion/chandnip/.local/lib/python3.8/site-packages/requests/api.py", line 115, in post
return request("post", url, data=data, json=json, **kwargs)
File "/nlsasfs/home/reflexion/chandnip/.local/lib/python3.8/site-packages/requests/api.py", line 59, in request
return session.request(method=method, url=url, **kwargs)
File "/nlsasfs/home/reflexion/chandnip/.local/lib/python3.8/site-packages/requests/sessions.py", line 587, in request
resp = self.send(prep, **send_kwargs)
File "/nlsasfs/home/reflexion/chandnip/.local/lib/python3.8/site-packages/requests/sessions.py", line 701, in send
r = adapter.send(request, **kwargs)
File "/nlsasfs/home/reflexion/chandnip/.local/lib/python3.8/site-packages/requests/adapters.py", line 565, in send
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='api.wandb.ai', port=443): Max retries exceeded with url: /graphql (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f93b64ab520>: Failed to establish a new connection: [Errno -2] Name or service not known'))
wandb: Network error (ConnectionError), entering retry loop.
wandb: W&B API key is configured. Use wandb login --relogin to force relogin
wandb: Network error (ConnectionError), entering retry loop.
The same issue is reported at https://github.com/wandb/local/issues/34, but I was not able to solve it from there.
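For anyone debugging the same log: the innermost error is socket.gaierror: [Errno -2] Name or service not known, which means the host name api.wandb.ai could not be resolved from the compute node. That points to a DNS/network reachability problem rather than an API key problem (the log itself says the key is configured). A minimal shell sketch for checking this inside the job script and falling back to offline logging might look like the following. wandb_mode_for and RESOLVE_CMD are hypothetical names made up for this sketch; getent ahosts assumes a glibc-based system such as the Ubuntu 20.04 server mentioned above, and WANDB_MODE=offline is wandb's documented offline setting.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of wandb or YOLOv7.
# Prints "online" if the given host name resolves, otherwise "offline".
# RESOLVE_CMD is an assumed override hook so the lookup command can be swapped.
wandb_mode_for() {
    local host="$1"
    if ${RESOLVE_CMD:-getent ahosts} "$host" >/dev/null 2>&1; then
        echo "online"
    else
        echo "offline"
    fi
}
```

In a job script this could be used as export WANDB_MODE="$(wandb_mode_for api.wandb.ai)" before the training command; runs logged offline can later be uploaded with wandb sync from a machine that has network access. Note that the check only makes sense when no proxy is in use: with an HTTP(S) proxy the name is resolved by the proxy, so local resolution can fail while wandb still connects.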
I solved the issue by adding the lines below to my .sh file.
Kindly use the proxy address below for the wandb connection:
export http_proxy=http://dgx-proxy-mn.mgmt.siddhi.param:9090/
export ftp_proxy=http://dgx-proxy-mn.mgmt.siddhi.param:9090/
export https_proxy=http://dgx-proxy-mn.mgmt.siddhi.param:9090/
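The three exports above can be wrapped in a small helper so the proxy URL is written only once. This is just a sketch: set_proxy is a hypothetical function name, the proxy host shown is specific to the poster's cluster and will differ elsewhere, and also setting the uppercase variants is an assumption on my part, since some tools read only one spelling.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of wandb or YOLOv7): export one proxy URL
# under both the lowercase and uppercase variable names.
set_proxy() {
    local url="$1"
    export http_proxy="$url" https_proxy="$url" ftp_proxy="$url"
    export HTTP_PROXY="$url" HTTPS_PROXY="$url" FTP_PROXY="$url"
}

# Proxy address from the post above; replace it with your own cluster's proxy.
set_proxy "http://dgx-proxy-mn.mgmt.siddhi.param:9090/"

# The training command would follow here (it is not shown in the thread).
```

The call has to come before the training command in the .sh file, because wandb only sees variables that are already exported when the Python process starts.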
Hello, I encountered a similar problem (please see the picture attached below) and tried your suggested method. However, it does not seem to work for me. How can I fix this problem?
My working setup: the local OS is Windows 10 Education, and the remote server OS is Ubuntu 20.04.2 LTS.