
dbx sync does not work behind proxy

Open pspeter opened this issue 3 years ago • 3 comments

Expected Behavior

dbx sync repo should use the proxy specified by the environment variables http_proxy/https_proxy.

Current Behavior

Using dbx sync repo -d yyyyy behind a proxy with the https_proxy environment variable set causes the following error:

[dbx][2022-06-10 11:00:06.297] Syncing from C:\Users\xxxxx\projects\yyyyy
[dbx][2022-06-10 11:00:06.328] Target base path: /Repos/xxxxx/yyyyy
[dbx][2022-06-10 11:00:06.328] Ignoring patterns from C:\Users\xxxxx\projects\yyyyy\.gitignore
[dbx][2022-06-10 11:00:06.328] Starting initial copy
[dbx][2022-06-10 11:00:13.331] Checking if any unmatched files/directories would be deleted
[dbx][2022-06-10 11:00:13.346] Creating /Repos/xxxxx/yyyyy/conf
[dbx][2022-06-10 11:00:13.384] Creating /Repos/xxxxx/yyyyy/zzzzzz
[dbx][2022-06-10 11:00:13.384] Creating /Repos/xxxxx/yyyyy/notebooks
[dbx][2022-06-10 11:00:13.384] Creating /Repos/xxxxx/yyyyy/tests    
[dbx][2022-06-10 11:00:13.400] Creating /Repos/xxxxx/yyyyy/venv 
[dbx][2022-06-10 11:00:13.423] Creating /Repos/xxxxx/yyyyy/temp
Traceback (most recent call last):
  File "c:\users\xxxxx\projects\yyyyy\venv\lib\site-packages\aiohttp\connector.py", line 986, in _wrap_create_connection
    return await self._loop.create_connection(*args, **kwargs)  # type: ignore[return-value]  # noqa
  File "C:\Users\xxxxx\Python\Python38\lib\asyncio\base_events.py", line 1025, in create_connection
    raise exceptions[0]
  File "C:\Users\xxxxx\Python\Python38\lib\asyncio\base_events.py", line 1010, in create_connection
    sock = await self._connect_sock(
  File "C:\Users\xxxxx\Python\Python38\lib\asyncio\base_events.py", line 924, in _connect_sock
    await self.sock_connect(sock, address)
  File "C:\Users\xxxxx\Python\Python38\lib\asyncio\selector_events.py", line 496, in sock_connect
    return await fut
  File "C:\Users\xxxxx\Python\Python38\lib\asyncio\selector_events.py", line 528, in _sock_connect_cb
    raise OSError(err, f'Connect call failed {address}')
TimeoutError: [Errno 10060] Connect call failed ('123.345.456.567', 443)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\xxxxx\Python\Python38\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\xxxxx\Python\Python38\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\xxxxx\projects\yyyyy\venv\Scripts\dbx.exe\__main__.py", line 7, in <module>
  File "c:\users\xxxxx\projects\yyyyy\venv\lib\site-packages\click\core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "c:\users\xxxxx\projects\yyyyy\venv\lib\site-packages\click\core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "c:\users\xxxxx\projects\yyyyy\venv\lib\site-packages\click\core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "c:\users\xxxxx\projects\yyyyy\venv\lib\site-packages\click\core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "c:\users\xxxxx\projects\yyyyy\venv\lib\site-packages\click\core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "c:\users\xxxxx\projects\yyyyy\venv\lib\site-packages\click\core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "c:\users\xxxxx\projects\yyyyy\venv\lib\site-packages\dbx\commands\sync.py", line 581, in repo
    main_loop(
  File "c:\users\xxxxx\projects\yyyyy\venv\lib\site-packages\dbx\commands\sync.py", line 167, in main_loop
    op_count = syncer.incremental_copy()
  File "c:\users\xxxxx\projects\yyyyy\venv\lib\site-packages\dbx\sync\__init__.py", line 454, in incremental_copy
    op_count = asyncio.run(self._apply_snapshot_diff(diff))
  File "C:\Users\xxxxx\Python\Python38\lib\asyncio\runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "C:\Users\xxxxx\Python\Python38\lib\asyncio\base_events.py", line 616, in run_until_complete
    return future.result()
  File "c:\users\xxxxx\projects\yyyyy\venv\lib\site-packages\dbx\sync\__init__.py", line 245, in _apply_snapshot_diff
    op_count += await self._apply_dirs_created(diff, session)
  File "c:\users\xxxxx\projects\yyyyy\venv\lib\site-packages\dbx\sync\__init__.py", line 163, in _apply_dirs_created
    await asyncio.gather(*tasks)
  File "c:\users\xxxxx\projects\yyyyy\venv\lib\site-packages\dbx\sync\clients.py", line 243, in mkdirs
    await self._api_mkdirs(
  File "c:\users\xxxxx\projects\yyyyy\venv\lib\site-packages\dbx\sync\clients.py", line 135, in _api_mkdirs
    await self._api(url=f"{api_base_path}/mkdirs", path=path, session=session, ssl=ssl, api_token=api_token)
  File "c:\users\xxxxx\projects\yyyyy\venv\lib\site-packages\dbx\sync\clients.py", line 98, in _api
    async with session.post(url=url, json=json_data, headers=headers, **more_opts) as resp:
  File "c:\users\xxxxx\projects\yyyyy\venv\lib\site-packages\aiohttp\client.py", line 1138, in __aenter__
    self._resp = await self._coro
  File "c:\users\xxxxx\projects\yyyyy\venv\lib\site-packages\aiohttp\client.py", line 535, in _request
  File "c:\users\xxxxx\projects\yyyyy\venv\lib\site-packages\aiohttp\connector.py", line 542, in connect
    proto = await self._create_connection(req, traces, timeout)
  File "c:\users\xxxxx\projects\yyyyy\venv\lib\site-packages\aiohttp\connector.py", line 907, in _create_connection
    _, proto = await self._create_direct_connection(req, traces, timeout)
  File "c:\users\xxxxx\projects\yyyyy\venv\lib\site-packages\aiohttp\connector.py", line 1206, in _create_direct_connection
    raise last_exc
  File "c:\users\xxxxx\projects\yyyyy\venv\lib\site-packages\aiohttp\connector.py", line 1175, in _create_direct_connection
    transp, proto = await self._wrap_create_connection(
  File "c:\users\xxxxx\projects\yyyyy\venv\lib\site-packages\aiohttp\connector.py", line 992, in _wrap_create_connection
    raise client_error(req.connection_key, exc) from exc
aiohttp.client_exceptions.ClientConnectorError: Cannot connect to host adb-1234567890123456789.1.azuredatabricks.net:443 ssl:default [Connect call failed ('123.345.456.567', 443)]

Context

Other commands like dbx execute work fine.

Your Environment

  • dbx version used: 0.5.0
  • Databricks Runtime version: 10.5

pspeter avatar Jun 10 '22 11:06 pspeter

@matthayes could you please take a look at this? I'm not sure how to configure proxies properly for the async HTTP client.

renardeinside avatar Jun 21 '22 11:06 renardeinside

From https://docs.aiohttp.org/en/stable/client_advanced.html?highlight=proxy#proxy-support:

Contrary to the requests library, it won’t read environment variables by default. But you can do so by passing trust_env=True into aiohttp.ClientSession constructor for extracting proxy configuration from HTTP_PROXY, HTTPS_PROXY, WS_PROXY or WSS_PROXY environment variables (all are case insensitive):

async with aiohttp.ClientSession(trust_env=True) as session:
    async with session.get("http://python.org") as resp:
        print(resp.status)

Unless there is a reason not to, I think we would only need to set trust_env=True on every ClientSession?
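Before changing any session code, it can help to confirm which proxy Python actually picks up from the environment. The following is a minimal standard-library sketch, not part of dbx: the helper name proxy_from_env and the proxy address are hypothetical, chosen only for illustration. To my understanding aiohttp resolves these variables in a similar way when trust_env=True is set, but treat that as an assumption rather than a statement about its internals.

```python
import os
import urllib.request
from urllib.parse import urlsplit


def proxy_from_env(url):
    """Return the proxy URL the environment specifies for `url`, or None.

    Hypothetical helper for diagnostics; it only inspects configuration
    and never opens a network connection.
    """
    scheme = urlsplit(url).scheme
    # getproxies() reads <scheme>_proxy variables (e.g. https_proxy);
    # on some platforms it falls back to system settings if none are set.
    return urllib.request.getproxies().get(scheme)


# Hypothetical proxy address, set here only so the example is self-contained.
os.environ["https_proxy"] = "http://proxy.example.com:8080"
print(proxy_from_env("https://example.azuredatabricks.net/api/2.0/workspace/mkdirs"))
```

If this prints None in the shell where dbx sync runs, the variable is not visible to the process, and setting trust_env=True alone would not be enough.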

pspeter avatar Jun 22 '22 11:06 pspeter

This sounds reasonable to me.

matthayes avatar Jun 23 '22 00:06 matthayes

Hi @pspeter, thanks a lot for raising the issue. I've prepared the fix, and it will be released as part of 0.7.0.

renardeinside avatar Aug 16 '22 22:08 renardeinside