
spilo failing if socket error occurs on 169.254.169.254 on Docker Desktop (WSL)

Open lc-guy opened this issue 2 years ago • 0 comments

https://github.com/zalando/spilo/blob/24a62c5814887c84bafe87dc2cf1fb19ff264172/postgres-appliance/scripts/configure_spilo.py#L388

The except block in get_provider() catches ConnectionErrors raised by the request to 169.254.169.254, an address Windows treats as non-routable because it belongs to the link-local (APIPA) block. On Docker Desktop, however, the resulting Windows socket error does not surface as a connection error inside the container, so the except block never runs. Instead, the error arrives as an HTTP 403 Forbidden reply, as seen from inside the spilo container:

root@<spilo_container_id>:/home/postgres# curl -v http://169.254.169.254/
*   Trying 169.254.169.254:80...
* Connected to 169.254.169.254 (169.254.169.254) port 80 (#0)
> GET / HTTP/1.1
> Host: 169.254.169.254
> User-Agent: curl/7.81.0
> Accept: */*
>
* Mark bundle as not supporting multiuse
* HTTP 1.0, assume close after body
< HTTP/1.0 403 connecting to 169.254.169.254:80: connecting to 169.254.169.254:80: dial tcp 169.254.169.254:80: connectex: A socket operation was attempted to an unreachable network.
< Connection: close
<
* Closing connection 0

The configuration script then fails silently in several places (because the provider resolves to PROVIDER_UNSUPPORTED), and starting a spilo image eventually crashes with this obscure error message:

Traceback (most recent call last):
2023-02-16T16:37:37.455847500Z   File "/usr/local/bin/patroni", line 33, in <module>
2023-02-16T16:37:37.455857700Z     sys.exit(load_entry_point('patroni==1.6.5', 'console_scripts', 'patroni')())
2023-02-16T16:37:37.455865000Z   File "/usr/local/lib/python3.6/dist-packages/patroni/__init__.py", line 235, in main
2023-02-16T16:37:37.455871100Z     return patroni_main()
2023-02-16T16:37:37.455896800Z   File "/usr/local/lib/python3.6/dist-packages/patroni/__init__.py", line 197, in patroni_main
2023-02-16T16:37:37.455903800Z     patroni = Patroni(conf)
2023-02-16T16:37:37.455909600Z   File "/usr/local/lib/python3.6/dist-packages/patroni/__init__.py", line 32, in __init__
2023-02-16T16:37:37.455915600Z     self.dcs = get_dcs(self.config)
2023-02-16T16:37:37.455921100Z   File "/usr/local/lib/python3.6/dist-packages/patroni/dcs/__init__.py", line 106, in get_dcs
2023-02-16T16:37:37.455927200Z     Available implementations: """ + ', '.join(sorted(set(available_implementations))))
2023-02-16T16:37:37.455933600Z patroni.exceptions.PatroniException: 'Can not find suitable configuration of distributed configuration store\nAvailable implementations: consul, etcd, exhibitor, kubernetes, zookeeper'
2023-02-16T16:37:37.557672800Z /run/service/patroni: finished with code=1 signal=0
2023-02-16T16:37:37.558436700Z /run/service/patroni: sleeping 30 seconds

This issue breaks spilo on all recent versions of Docker Desktop for Windows (at least on our corporate machines). A temporary workaround is to set SPILO_PROVIDER=local in the docker-compose environment variables. It would be better not to assume that a 403 response necessarily means the container is running on a cloud provider.
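As a rough illustration of that suggestion, here is a minimal sketch of a detection routine that only reports a cloud provider on positive evidence and otherwise falls back to local. It is not the code in configure_spilo.py: detect_provider, http_fetch, the fetch callable and the provider strings are hypothetical names chosen for this example, and only two providers are probed. It also ignores token-based metadata access.

```python
import urllib.error
import urllib.request
from typing import Callable, Mapping, Tuple

METADATA_URL = "http://169.254.169.254"

# Hypothetical labels for this sketch; the constants in configure_spilo.py may differ.
PROVIDER_LOCAL = "local"
PROVIDER_GOOGLE = "google"
PROVIDER_AWS = "aws"

# A fetcher takes a URL and returns (status code, headers), or raises OSError
# (ConnectionError, timeouts, urllib's URLError) when nothing answers at all.
Fetch = Callable[[str], Tuple[int, Mapping[str, str]]]


def http_fetch(url: str, timeout: float = 2.0) -> Tuple[int, Mapping[str, str]]:
    """Real fetcher: returns 4xx/5xx replies as values instead of raising."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, dict(resp.headers.items())
    except urllib.error.HTTPError as err:
        return err.code, dict(err.headers.items()) if err.headers else {}


def detect_provider(fetch: Fetch) -> str:
    """Return a cloud provider only when its metadata service positively answers."""
    try:
        status, headers = fetch(METADATA_URL + "/")
        lowered = {k.lower(): v for k, v in headers.items()}
        if status == 200 and lowered.get("metadata-flavor") == "Google":
            return PROVIDER_GOOGLE
        status, _ = fetch(METADATA_URL + "/latest/meta-data/ami-id")
        if status == 200:
            return PROVIDER_AWS
    except OSError:
        # Nothing listening on the link-local address: plain local Docker.
        pass
    # Any other reply, such as the proxy-generated 403 above, proves nothing.
    return PROVIDER_LOCAL
```

Calling detect_provider(http_fetch) would run the probe against the real address. With this shape, the 403 produced on Docker Desktop ends up as "local" rather than as an unsupported provider, and an explicit SPILO_PROVIDER setting can still be used to override detection.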

This issue shows a similar enough message, which suggests the behavior could come from VPNKit, the translation layer between the VM and the host, answering such a critically failing request with an actual HTTP reply.

lc-guy, Mar 09 '23 10:03