FileNotFoundError when connecting to postgres if fd is closed and then reopened
I recently upgraded from Python 3.6 to 3.8 and encountered a strange bug:
  File ".../venv/lib/python3.8/site-packages/aiopg/connection.py", line 151, in _ready
    self._loop.add_writer(self._fileno, self._ready, weak_self)
  File "/usr/lib/python3.8/asyncio/selector_events.py", line 337, in add_writer
    return self._add_writer(fd, callback, *args)
  File "/usr/lib/python3.8/asyncio/selector_events.py", line 296, in _add_writer
    self._selector.modify(fd, mask | selectors.EVENT_WRITE,
  File "/usr/lib/python3.8/selectors.py", line 389, in modify
    self._selector.modify(key.fd, selector_events)
FileNotFoundError: [Errno 2] No such file or directory
The crash occurred when connecting to a database with await aiopg.sa.create_engine(...). I was only able to reproduce it under certain circumstances: for example if I disabled SSL it would not happen, and if I used a .pgpass file rather than passing a password to create_engine() it would not happen.
Here's what the strace looked like at the time of the crash:
socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 10
setsockopt(10, SOL_TCP, TCP_NODELAY, [1], 4) = 0
fcntl(10, F_GETFL) = 0x2 (flags O_RDWR)
fcntl(10, F_SETFL, O_RDWR|O_NONBLOCK) = 0
fcntl(10, F_SETFD, FD_CLOEXEC) = 0
setsockopt(10, SOL_SOCKET, SO_KEEPALIVE, [1], 4) = 0
connect(10, {sa_family=AF_INET, sin_port=htons(5432), sin_addr=inet_addr("X.X.X.X")}, 16) = -1 EINPROGRESS (Operation now in progress)
epoll_ctl(6, EPOLL_CTL_MOD, 10, {EPOLLIN|EPOLLOUT, {u32=10, u64=10}}) = -1 ENOENT (No such file or directory)
I discovered that under certain conditions libpq closes and then reopens the socket, so the fd underlying the aiopg connection refers to a new socket that has the same fd number. It turns out the libpq documentation has a disclaimer saying it is allowed to do this:
Use PQsocket(conn) to obtain the descriptor of the socket underlying the database connection. (Caution: do not assume that the socket remains the same across PQconnectPoll calls.)
In Python 3.6, modify() on the epoll selector called unregister() and then register(). Python 3.7 added a patch that changed this: _PollLikeSelector.modify now uses epoll.modify(). Before the change, if the socket had been replaced with a new one with the same fd number, the unregister/register pair would still work. Now, because closing the old socket removed it from the epoll set and the new socket was never added, modify issues an EPOLL_CTL_MOD without a preceding EPOLL_CTL_ADD, which returns ENOENT.
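The kernel-level behaviour can be reproduced without libpq. Here is a minimal sketch, assuming Linux (select.epoll does not exist elsewhere). The function name modify_after_reopen is made up for illustration. It registers a socket, closes it, opens a new socket on the same fd number, and then calls epoll.modify():

```python
import errno
import os
import select
import socket


def modify_after_reopen():
    """Return True if epoll.modify() fails with ENOENT after the fd is reopened."""
    ep = select.epoll()
    old = socket.socket()
    fd = old.fileno()
    ep.register(fd, select.EPOLLIN)

    # Closing the socket also removes it from the epoll interest list.
    old.close()

    # Open a new socket and make sure it ends up on the same fd number.
    new = socket.socket()
    if new.fileno() != fd:
        os.dup2(new.fileno(), fd)
        new.close()
        new = socket.socket(fileno=fd)

    try:
        # EPOLL_CTL_MOD on an fd that was never EPOLL_CTL_ADDed.
        ep.modify(fd, select.EPOLLIN | select.EPOLLOUT)
        return False
    except FileNotFoundError as exc:
        return exc.errno == errno.ENOENT
    finally:
        new.close()
        ep.close()
```

This only imitates what libpq does to the socket. It does not go through psycopg2 or aiopg at all.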
The bottom line is that libpq thinks that it's ok to replace the socket silently, and python doesn't. It seems that the best place to resolve this contradiction might be in aiopg. A possible workaround might be to detect that the socket has been replaced, and to remove the fd from the event loop and re-add it.
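A rough sketch of that fallback at the selectors level, not aiopg's actual code. The helper name modify_or_reregister is hypothetical, and a real fix would probably have to go through the event loop's remove_reader/remove_writer and add_reader/add_writer rather than touching the selector directly:

```python
import selectors


def modify_or_reregister(selector: selectors.BaseSelector, fd, events, data=None):
    """Try modify(); if the kernel no longer knows the fd, register it again."""
    try:
        return selector.modify(fd, events, data)
    except FileNotFoundError:
        # The fd number now refers to a different socket. Drop any stale
        # bookkeeping (modify() may already have removed it) and re-add.
        try:
            selector.unregister(fd)
        except KeyError:
            pass
        return selector.register(fd, events, data)
```

The KeyError guard is there because the selector may already have forgotten the fd by the time the exception reaches the caller.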
I have a similar situation. Found on Arch Linux with PostgreSQL stopped (not installed).
# -*- coding: utf-8 -*-
__requires__ = ['aiopg[sa]==1.2.1', 'cffi==1.14.5']

import sys, os
import asyncio
import aiopg.sa
import aiopg.connection

# The PostgreSQL server is not running!
DATABASE_URL = 'postgresql://localhost:12345/fake'

# Print the environment; newlines and double spaces are collapsed.
print('* uname:', ' '.join(os.uname()))
print('* python:', sys.version.replace('\n', ' ').replace('  ', ' ').strip())
print('* asiopg:', aiopg.version.replace('\n', ' ').replace('  ', ' ').strip())
print('* psycopg2:', aiopg.connection.psycopg2.__version__)

try:
    import cffi
except ImportError:
    print('* glibc: CFFI required!')
else:
    # Ask glibc for its version through CFFI.
    ffi = cffi.FFI()
    ffi.cdef('const char *gnu_get_libc_version(void);')
    C = ffi.dlopen(None)
    print('* glibc:', ffi.string(C.gnu_get_libc_version()).decode())


async def test(url):
    print('Connection:', url)
    async with aiopg.sa.create_engine(url) as engine:
        print('Engine:', engine)


loop = asyncio.get_event_loop()
loop.run_until_complete(test(DATABASE_URL))
As I understand it, the problem occurs on Python >= 3.7 with glibc 2.33.
Arch Linux
* uname: Linux a13 5.12.9-arch1-1 #1 SMP PREEMPT Thu, 03 Jun 2021 11:36:13 +0000 x86_64
* python: 3.9.5 (default, May 24 2021, 12:50:35) [GCC 11.1.0]
* asiopg: 1.2.1, Python 3.9.5 (default, May 24 2021, 12:50:35) [GCC 11.1.0]
* psycopg2: 2.8.6 (dt dec pq3 ext lo64)
* glibc: 2.33
Connection: postgresql://localhost:12345/fake
Traceback (most recent call last):
  File "/home/bw/.local/lib/python3.9/site-packages/aiopg/connection.py", line 104, in _ready
    state = self._conn.poll()
psycopg2.OperationalError: could not connect to server: Connection refused
    Is the server running on host "localhost" (::1) and accepting
    TCP/IP connections on port 12345?
could not connect to server: Connection refused
    Is the server running on host "localhost" (127.0.0.1) and accepting
    TCP/IP connections on port 12345?
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  ...
  File "/home/bw/.local/lib/python3.9/site-packages/aiopg/connection.py", line 124, in _ready
    self._loop.remove_writer(self._fileno)
  File "/usr/lib/python3.9/asyncio/selector_events.py", line 351, in remove_writer
    return self._remove_writer(fd)
  File "/usr/lib/python3.9/asyncio/selector_events.py", line 325, in _remove_writer
    self._selector.modify(fd, mask, (reader, None))
  File "/usr/lib/python3.9/selectors.py", line 390, in modify
    self._selector.modify(key.fd, selector_events)
FileNotFoundError: [Errno 2] No such file or directory
* uname: Linux a13 5.12.9-arch1-1 #1 SMP PREEMPT Thu, 03 Jun 2021 11:36:13 +0000 x86_64
* python: 3.7.10 (default, May 14 2021, 23:54:07) [GCC 10.2.0]
* asiopg: 1.2.1, Python 3.7.10 (default, May 14 2021, 23:54:07) [GCC 10.2.0]
* psycopg2: 2.8.6 (dt dec pq3 ext lo64)
* glibc: 2.33
...
Is the server running on host "localhost" (127.0.0.1) and accepting
TCP/IP connections on port 12345?
...
FileNotFoundError: [Errno 2] No such file or directory
* uname: Linux a13 5.12.9-arch1-1 #1 SMP PREEMPT Thu, 03 Jun 2021 11:36:13 +0000 x86_64
* python: 3.6.13 (default, Jun 7 2021, 17:51:57) [GCC 11.1.0]
* asiopg: 1.2.1, Python 3.6.13 (default, Jun 7 2021, 17:51:57) [GCC 11.1.0]
* psycopg2: 2.8.6 (dt dec pq3 ext lo64)
* glibc: 2.33
...
  File "/home/bw/src/.venv-3.6/lib/python3.6/site-packages/aiopg/connection.py", line 104, in _ready
    state = self._conn.poll()
psycopg2.OperationalError: could not connect to server: Connection refused
    Is the server running on host "localhost" (::1) and accepting
    TCP/IP connections on port 12345?
could not connect to server: Connection refused
    Is the server running on host "localhost" (127.0.0.1) and accepting
    TCP/IP connections on port 12345?
Devuan GNU/Linux 3 (beowulf)
* uname: Linux d69 5.3.18-lp152.57-default #1 SMP Fri Dec 4 07:27:58 UTC 2020 (7be5551) x86_64
* python: 3.7.3 (default, Jul 25 2020, 13:03:44) [GCC 8.3.0]
* asiopg: 1.2.1, Python 3.7.3 (default, Jul 25 2020, 13:03:44) [GCC 8.3.0]
* psycopg2: 2.8.6 (dt dec pq3 ext lo64)
* glibc: 2.28
...
  File "/home/bw/.local/lib/python3.7/site-packages/aiopg/connection.py", line 104, in _ready
    state = self._conn.poll()
psycopg2.OperationalError: could not connect to server: Connection refused
    Is the server running on host "localhost" (127.0.0.1) and accepting
    TCP/IP connections on port 12345?
CRUX 3.6.1
* uname: Linux crux 5.4.80-bw2 #2 SMP Sat Jun 5 18:11:50 UTC 2021 x86_64
* python: 3.9.0 (default, Dec 6 2020, 03:55:43) [GCC 10.2.0]
* asiopg: 1.2.1, Python 3.9.0 (default, Dec 6 2020, 03:55:43) [GCC 10.2.0]
* psycopg2: 2.8.6 (dt dec pq3 ext lo64)
* glibc: 2.32
...
  File "/home/bw/.local/lib/python3.9/site-packages/aiopg/connection.py", line 104, in _ready
    state = self._conn.poll()
psycopg2.OperationalError: could not connect to server: Connection refused
    Is the server running on host "localhost" (127.0.0.1) and accepting
    TCP/IP connections on port 12345?
Etc.
One workaround is to monkey-patch selectors 🐒
# This import should be at the top of the file because we need to apply monkey patch
# before executing any other code.
# We want to revert this change: https://github.com/python/cpython/pull/1030
# Additional context is here: https://github.com/aio-libs/aiopg/issues/837
import selectors # isort:skip # noqa: F401
selectors._PollLikeSelector.modify = (  # type: ignore
    selectors._BaseSelectorImpl.modify  # type: ignore
)  # noqa: E402
FWIW, I also bumped into this and, while experimenting, wrote a draft patch that works: https://github.com/arssher/aiopg/tree/handle_changed_socket. It is not decent enough to be proposed as a PR, though.
Any progress on this? This completely breaks support for hot standby scenarios.
Reliably reproducible by setting up localhost:5431 as a read-only replica of localhost:5432 and using the following minimal example:
import aiopg
import asyncio

connstr = 'postgres://postgres@localhost:5431,localhost:5432/?target_session_attrs=primary'


async def go():
    await aiopg.connect(connstr)


asyncio.run(go())
I was getting this same error:
    self._context.run(self._callback, *self._args)
  File "/srv/reference/venv/lib/python3.9/site-packages/aiopg/connection.py", line 837, in _ready
    self._loop.add_writer(
  File "/usr/local/lib/python3.9/asyncio/selector_events.py", line 341, in add_writer
    self._add_writer(fd, callback, *args)
  File "/usr/local/lib/python3.9/asyncio/selector_events.py", line 299, in _add_writer
    self._selector.modify(fd, mask | selectors.EVENT_WRITE,
  File "/usr/local/lib/python3.9/selectors.py", line 390, in modify
    self._selector.modify(key.fd, selector_events)
FileNotFoundError: [Errno 2] No such file or directory
Since this issue is the first Google result for this error, I thought it best to describe my case here, as it may help others. I had changed the DB password but forgot to update my .env file.
Because the error message says FileNotFoundError, it took me a few hours to realize that I was trying to connect with the wrong password.
@valderman Have you found any solution to this? Failover does not work for me either.