dbx
dbx sync does not work behind proxy
Expected Behavior
dbx sync repo should use the proxy specified by the environment variables http_proxy/https_proxy.
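As a quick sanity check (a sketch, not part of dbx), the Python standard library can report which proxy settings it picks up from the environment. The proxy URL below is a made-up placeholder; substitute your own.

```python
import os
import urllib.request

# Hypothetical proxy URL, for illustration only.
os.environ["https_proxy"] = "http://proxy.example.com:8080"

# getproxies_environment() scans the environment for <scheme>_proxy
# variables and returns a mapping of scheme -> proxy URL.
proxies = urllib.request.getproxies_environment()
print(proxies.get("https"))
```

If the value printed here is the proxy you expect, the variable is visible to Python, and the remaining question is whether the HTTP client in use consults it.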
Current Behavior
Using dbx sync repo -d yyyyy behind a proxy with the https_proxy environment variable set causes the following error:
[dbx][2022-06-10 11:00:06.297] Syncing from C:\Users\xxxxx\projects\yyyyy
[dbx][2022-06-10 11:00:06.328] Target base path: /Repos/xxxxx/yyyyy
[dbx][2022-06-10 11:00:06.328] Ignoring patterns from C:\Users\xxxxx\projects\yyyyy\.gitignore
[dbx][2022-06-10 11:00:06.328] Starting initial copy
[dbx][2022-06-10 11:00:13.331] Checking if any unmatched files/directories would be deleted
[dbx][2022-06-10 11:00:13.346] Creating /Repos/xxxxx/yyyyy/conf
[dbx][2022-06-10 11:00:13.384] Creating /Repos/xxxxx/yyyyy/zzzzzz
[dbx][2022-06-10 11:00:13.384] Creating /Repos/xxxxx/yyyyy/notebooks
[dbx][2022-06-10 11:00:13.384] Creating /Repos/xxxxx/yyyyy/tests
[dbx][2022-06-10 11:00:13.400] Creating /Repos/xxxxx/yyyyy/venv
[dbx][2022-06-10 11:00:13.423] Creating /Repos/xxxxx/yyyyy/temp
Traceback (most recent call last):
  File "c:\users\xxxxx\projects\yyyyy\venv\lib\site-packages\aiohttp\connector.py", line 986, in _wrap_create_connection
    return await self._loop.create_connection(*args, **kwargs) # type: ignore[return-value] # noqa
  File "C:\Users\xxxxx\Python\Python38\lib\asyncio\base_events.py", line 1025, in create_connection
    raise exceptions[0]
  File "C:\Users\xxxxx\Python\Python38\lib\asyncio\base_events.py", line 1010, in create_connection
    sock = await self._connect_sock(
  File "C:\Users\xxxxx\Python\Python38\lib\asyncio\base_events.py", line 924, in _connect_sock
    await self.sock_connect(sock, address)
  File "C:\Users\xxxxx\Python\Python38\lib\asyncio\selector_events.py", line 496, in sock_connect
    return await fut
  File "C:\Users\xxxxx\Python\Python38\lib\asyncio\selector_events.py", line 528, in _sock_connect_cb
    raise OSError(err, f'Connect call failed {address}')
TimeoutError: [Errno 10060] Connect call failed ('123.345.456.567', 443)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\xxxxx\Python\Python38\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\xxxxx\Python\Python38\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\xxxxx\projects\yyyyy\venv\Scripts\dbx.exe\__main__.py", line 7, in <module>
  File "c:\users\xxxxx\projects\yyyyy\venv\lib\site-packages\click\core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "c:\users\xxxxx\projects\yyyyy\venv\lib\site-packages\click\core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "c:\users\xxxxx\projects\yyyyy\venv\lib\site-packages\click\core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "c:\users\xxxxx\projects\yyyyy\venv\lib\site-packages\click\core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "c:\users\xxxxx\projects\yyyyy\venv\lib\site-packages\click\core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "c:\users\xxxxx\projects\yyyyy\venv\lib\site-packages\click\core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "c:\users\xxxxx\projects\yyyyy\venv\lib\site-packages\dbx\commands\sync.py", line 581, in repo
    main_loop(
  File "c:\users\xxxxx\projects\yyyyy\venv\lib\site-packages\dbx\commands\sync.py", line 167, in main_loop
    op_count = syncer.incremental_copy()
  File "c:\users\xxxxx\projects\yyyyy\venv\lib\site-packages\dbx\sync\__init__.py", line 454, in incremental_copy
    op_count = asyncio.run(self._apply_snapshot_diff(diff))
  File "C:\Users\xxxxx\Python\Python38\lib\asyncio\runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "C:\Users\xxxxx\Python\Python38\lib\asyncio\base_events.py", line 616, in run_until_complete
    return future.result()
  File "c:\users\xxxxx\projects\yyyyy\venv\lib\site-packages\dbx\sync\__init__.py", line 245, in _apply_snapshot_diff
    op_count += await self._apply_dirs_created(diff, session)
  File "c:\users\xxxxx\projects\yyyyy\venv\lib\site-packages\dbx\sync\__init__.py", line 163, in _apply_dirs_created
    await asyncio.gather(*tasks)
  File "c:\users\xxxxx\projects\yyyyy\venv\lib\site-packages\dbx\sync\clients.py", line 243, in mkdirs
    await self._api_mkdirs(
  File "c:\users\xxxxx\projects\yyyyy\venv\lib\site-packages\dbx\sync\clients.py", line 135, in _api_mkdirs
    await self._api(url=f"{api_base_path}/mkdirs", path=path, session=session, ssl=ssl, api_token=api_token)
  File "c:\users\xxxxx\projects\yyyyy\venv\lib\site-packages\dbx\sync\clients.py", line 98, in _api
    async with session.post(url=url, json=json_data, headers=headers, **more_opts) as resp:
  File "c:\users\xxxxx\projects\yyyyy\venv\lib\site-packages\aiohttp\client.py", line 1138, in __aenter__
    self._resp = await self._coro
  File "c:\users\xxxxx\projects\yyyyy\venv\lib\site-packages\aiohttp\client.py", line 535, in _request
  File "c:\users\xxxxx\projects\yyyyy\venv\lib\site-packages\aiohttp\connector.py", line 542, in connect
    proto = await self._create_connection(req, traces, timeout)
  File "c:\users\xxxxx\projects\yyyyy\venv\lib\site-packages\aiohttp\connector.py", line 907, in _create_connection
    _, proto = await self._create_direct_connection(req, traces, timeout)
  File "c:\users\xxxxx\projects\yyyyy\venv\lib\site-packages\aiohttp\connector.py", line 1206, in _create_direct_connection
    raise last_exc
  File "c:\users\xxxxx\projects\yyyyy\venv\lib\site-packages\aiohttp\connector.py", line 1175, in _create_direct_connection
    transp, proto = await self._wrap_create_connection(
  File "c:\users\xxxxx\projects\yyyyy\venv\lib\site-packages\aiohttp\connector.py", line 992, in _wrap_create_connection
    raise client_error(req.connection_key, exc) from exc
aiohttp.client_exceptions.ClientConnectorError: Cannot connect to host adb-1234567890123456789.1.azuredatabricks.net:443 ssl:default [Connect call failed ('123.345.456.567', 443)]
Context
Other commands like dbx execute work fine.
Your Environment
- dbx version used: 0.5.0
- Databricks Runtime version: 10.5
@matthayes could you please take a look at this? I'm not sure how to configure proxies properly for the async HTTP client.
From https://docs.aiohttp.org/en/stable/client_advanced.html?highlight=proxy#proxy-support:
Contrary to the requests library, it won’t read environment variables by default. But you can do so by passing trust_env=True into aiohttp.ClientSession constructor for extracting proxy configuration from HTTP_PROXY, HTTPS_PROXY, WS_PROXY or WSS_PROXY environment variables (all are case insensitive):
async with aiohttp.ClientSession(trust_env=True) as session:
    async with session.get("http://python.org") as resp:
        print(resp.status)
Unless there is a reason not to, I think we would only need to pass trust_env=True to every ClientSession?
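A minimal sketch of that idea, assuming aiohttp is installed. `new_session` is a hypothetical helper name used only for illustration, not dbx's actual code; it shows `trust_env=True` applied as the default for every session that gets created.

```python
import asyncio

import aiohttp


def new_session(**kwargs) -> aiohttp.ClientSession:
    """Create a ClientSession that honors proxy environment variables.

    Hypothetical helper for illustration; call it from inside a running
    event loop (i.e. from a coroutine).
    """
    kwargs.setdefault("trust_env", True)
    return aiohttp.ClientSession(**kwargs)


async def main() -> bool:
    async with new_session() as session:
        # Requests made through this session would pick up HTTP_PROXY /
        # HTTPS_PROXY from the environment, if they are set.
        return session.trust_env


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Routing session creation through one helper would keep the setting in a single place, while a caller could still pass `trust_env=False` explicitly if needed.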
This sounds reasonable to me.
Hi @pspeter, thanks a lot for raising the issue. I've prepared the fix, and it will be released as part of 0.7.0.