aiopg

Using "localhost" as host rather than "127.0.0.1" blocks connections

Open mikn opened this issue 8 years ago • 24 comments

Using the example code from the documentation:

import asyncio
import aiopg

dsn = 'dbname=aiopg user=aiopg password=passwd host=localhost'

async def go():
    async with aiopg.create_pool(dsn) as pool:
        async with pool.acquire() as conn:
            async with conn.cursor() as cur:
                await cur.execute("SELECT 1")
                ret = []
                async for row in cur:
                    ret.append(row)
                assert ret == [(1,)]

loop = asyncio.get_event_loop()
loop.run_until_complete(go())

Results in this:

$ time python pgtest.py 
Traceback (most recent call last):
  File "pgtest.py", line 17, in <module>
    loop.run_until_complete(go())
  File "/usr/lib/python3.5/asyncio/base_events.py", line 466, in run_until_complete
    return future.result()
  File "/usr/lib/python3.5/asyncio/futures.py", line 293, in result
    raise self._exception
  File "/usr/lib/python3.5/asyncio/tasks.py", line 241, in _step
    result = coro.throw(exc)
  File "pgtest.py", line 7, in go
    async with aiopg.create_pool(dsn) as pool:
  File "/home/mikaelk/venvs/csc-backend/lib/python3.5/site-packages/aiopg/utils.py", line 77, in __aenter__
    self._obj = yield from self._coro
  File "/home/mikaelk/venvs/csc-backend/lib/python3.5/site-packages/aiopg/pool.py", line 46, in _create_pool
    yield from pool._fill_free_pool(False)
  File "/home/mikaelk/venvs/csc-backend/lib/python3.5/site-packages/aiopg/pool.py", line 203, in _fill_free_pool
    **self._conn_kwargs)
  File "/home/mikaelk/venvs/csc-backend/lib/python3.5/site-packages/aiopg/utils.py", line 67, in __iter__
    resp = yield from self._coro
  File "/home/mikaelk/venvs/csc-backend/lib/python3.5/site-packages/aiopg/connection.py", line 74, in _connect
    yield from conn._poll(waiter, timeout)
  File "/home/mikaelk/venvs/csc-backend/lib/python3.5/site-packages/aiopg/connection.py", line 239, in _poll
    yield from asyncio.shield(cancel(), loop=self._loop)
  File "/usr/lib/python3.5/asyncio/futures.py", line 380, in __iter__
    yield self  # This tells Task to wait for completion.
  File "/usr/lib/python3.5/asyncio/tasks.py", line 304, in _wakeup
    future.result()
  File "/usr/lib/python3.5/asyncio/futures.py", line 293, in result
    raise self._exception
  File "/usr/lib/python3.5/asyncio/tasks.py", line 239, in _step
    result = coro.send(None)
  File "/home/mikaelk/venvs/csc-backend/lib/python3.5/site-packages/aiopg/connection.py", line 225, in cancel
    self._conn.cancel()
psycopg2.OperationalError: asynchronous connection attempt underway

real	1m0.133s
user	0m0.104s
sys	0m0.012s

If I change it to "127.0.0.1" it works perfectly, however.

mikn avatar Feb 10 '17 12:02 mikn

I don't know why DNS resolution doesn't work on your box. Honestly, I don't want to change the example: localhost should work even on an IPv6 system.

asvetlov avatar Feb 13 '17 03:02 asvetlov

I have the same problem. My setup:

  • VirtualBox runs on macOS 10.12.6.
  • Ubuntu 17.10 runs in a VirtualBox VM.
  • The PostgreSQL server runs inside the virtual machine with nearly default settings; the only changes are:
    • postgresql.conf: listen_addresses = '*';
    • pg_hba.conf: added line host all all 10.0.2.0/24 md5.
  • Port forwarding: host:5432 -> guest:5432.

What I see:

  • Using your example code with 'localhost' fails after the default timeout (60.0 seconds) with the same exception that @mikn reported.
  • Changing host to 127.0.0.1 resolves the problem.
  • Connecting to the PostgreSQL server on the virtual machine from the PyCharm database client succeeds (host is localhost, see screenshot below).

[Screenshot: PyCharm database client connection with host localhost]

This seems to be a bug rather than a reason to update the documentation.

bryzgaloff avatar Sep 09 '18 10:09 bryzgaloff

Check your /etc/hosts and maybe dig localhost

webknjaz avatar Sep 09 '18 16:09 webknjaz

Same problem. Steps to reproduce

host localhost
localhost has address 127.0.0.1
localhost has IPv6 address ::1

Start a listener on IPv4 only: ncat -4 -v -v -l 127.0.0.1

Then call aiopg.connect with host=localhost, port=<listening_port>. After the timeout, psycopg2.OperationalError: asynchronous connection attempt underway is raised.

errx avatar Jun 06 '19 14:06 errx

Did you try to connect using bare psycopg2? I suspect you'll get the same result as with aiopg: aiopg has no connection code of its own, the library calls psycopg2 to do the job.

asvetlov avatar Jun 06 '19 14:06 asvetlov

If I add and call the "wait" function from the psycopg async example (http://initd.org/psycopg/docs/advanced.html#asynchronous-support), everything works OK.
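
For reference, the wait loop from that documentation page looks roughly like the sketch below. It is adapted rather than copied, under two assumptions made for self-containment: the POLL_* constants fall back to local values when psycopg2 is not importable, and the error branch raises RuntimeError where the documented example raises psycopg2.OperationalError. One property worth noting is that it asks for conn.fileno() on every pass instead of holding on to a descriptor seen earlier.

```python
import select

try:
    from psycopg2.extensions import POLL_OK, POLL_READ, POLL_WRITE
except ImportError:
    # Assumed fallback values so the sketch runs without psycopg2 installed.
    POLL_OK, POLL_READ, POLL_WRITE = 0, 1, 2


def wait(conn):
    """Block until an asynchronous connection reports POLL_OK."""
    while True:
        state = conn.poll()
        if state == POLL_OK:
            return
        # fileno() is looked up on every pass, so a socket that was replaced
        # between two poll() calls is picked up automatically.
        if state == POLL_WRITE:
            select.select([], [conn.fileno()], [])
        elif state == POLL_READ:
            select.select([conn.fileno()], [], [])
        else:
            raise RuntimeError("poll() returned %s" % state)
```

With psycopg2 installed, it would typically be used as conn = psycopg2.connect(dsn, async_=1) followed by wait(conn).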

errx avatar Jun 06 '19 14:06 errx

Interesting

asvetlov avatar Jun 06 '19 14:06 asvetlov

@errx and what if you use ::1?

webknjaz avatar Jun 06 '19 15:06 webknjaz

listening on ::1 and connecting to localhost? works

errx avatar Jun 06 '19 15:06 errx

minimal example

asyncio.get_event_loop().run_until_complete(aiopg.connect("host=localhost port=12345"))

errx avatar Jun 06 '19 15:06 errx

Always removing the old writer and adding a new one when state == POLL_WRITE fixes this problem.

errx avatar Jun 06 '19 16:06 errx

@errx I guess that remove-and-re-add is required only if fileno() has changed. Anyway, thank you very much! Would you provide a PR? It should be easy now, after your investigation.

It would be really helpful if you wrote a PR and checked it yourself. I have no idea yet how to reproduce it in the test suite.

asvetlov avatar Jun 06 '19 17:06 asvetlov

I guess the key to reproducing this issue is to have both 127.0.0.1 localhost and ::1 localhost in /etc/hosts (this is the default on macOS and Debian, for example, but not on Ubuntu 18.04).

I'll try to make a PR as soon as I understand what's really going on here :)
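
Whether a machine meets the /etc/hosts condition described above can also be checked from Python. The sketch below lists the addresses the system resolver returns for a host name, in the order it returns them. The name resolved_addresses is a hypothetical helper, not part of aiopg or psycopg2.

```python
import socket


def resolved_addresses(host, port):
    """Return the distinct IP addresses the resolver yields for host, in order."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    addresses = []
    for _family, _type, _proto, _canonname, sockaddr in infos:
        address = sockaddr[0]  # sockaddr is (ip, port) or (ip, port, flow, scope)
        if address not in addresses:
            addresses.append(address)
    return addresses
```

Calling resolved_addresses("localhost", 5432) on an affected machine would be expected to return both ::1 and 127.0.0.1, although the order depends on the local resolver configuration.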

errx avatar Jun 06 '19 20:06 errx

My hypothesis (based on tcpdump and strace) is that after the first failed IPv6 connection the socket is closed and then a new socket is created with the same fd (probably inside libpq? I didn't check yet), and according to the epoll man page closed descriptors are removed from interest lists.

But I'm not sure what a good solution for this problem would be.
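
The kernel behaviour this hypothesis relies on can be reproduced in isolation with the standard library. The sketch below is Linux-only (it uses select.epoll) and does not involve libpq or aiopg at all. It only shows that a closed descriptor silently leaves the epoll interest list, and that a new socket reusing the same number gets no events until it is registered again. The function name is hypothetical.

```python
import select
import socket


def epoll_after_fd_reuse():
    """Show that epoll forgets a closed fd even when its number is reused."""
    ep = select.epoll()
    first, first_peer = socket.socketpair()
    old_fd = first.fileno()
    ep.register(old_fd, select.EPOLLOUT)
    before_close = ep.poll(0)  # the fresh socket is writable: one event

    first.close()  # the kernel drops old_fd from the interest list
    second, second_peer = socket.socketpair()
    new_fd = second.fileno()  # lowest free number, so it reuses old_fd

    after_reuse = ep.poll(0)  # empty: the new socket was never registered
    ep.register(new_fd, select.EPOLLOUT)  # re-adding restores notifications
    after_readd = ep.poll(0)

    for sock in (first_peer, second, second_peer):
        sock.close()
    ep.close()
    return old_fd, new_fd, before_close, after_reuse, after_readd
```

If the same thing happens inside an event loop, a writer callback registered for the old socket would never fire for the new one, which would match the hang followed by a timeout reported in this issue.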

errx avatar Jun 06 '19 23:06 errx

SO_REUSEADDR?

webknjaz avatar Jun 06 '19 23:06 webknjaz

I'm sorry but I'm not sure how this flag will help, care to elaborate?

errx avatar Jun 07 '19 05:06 errx

Sometimes after an attempt to listen to some port, it becomes unbindable for a while until that times out in the kernel.

webknjaz avatar Jun 07 '19 22:06 webknjaz

@errx can you confirm that fd is not changed?

asvetlov avatar Jun 07 '19 22:06 asvetlov

Yes, fd didn't change but I guess this is not always true. I can show strace logs but unfortunately not today

errx avatar Jun 08 '19 05:06 errx

@asvetlov can you explain a bit why you split the reading and writing logic here (https://github.com/aio-libs/aiopg/commit/1ef6f9402325550d526795d54dc45932cc9530b5)? Is it for optimization?

I guess one possible solution is: update fileno if it changed, and always remove/add the reader and writer after each _conn.poll() call (maybe this is not optimal, but we can assume that the connection phase is short).
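
A minimal sketch of that idea, for illustration only: it is neither the code in aiopg nor the code in any PR, and wait_connected is a hypothetical name. It assumes a Unix selector event loop (where loop.add_reader and loop.add_writer are available), Python 3.7 or later, and a connection object exposing poll() and fileno() like an asynchronous psycopg2 connection.

```python
import asyncio

# Values assumed to mirror psycopg2.extensions.POLL_OK / POLL_READ / POLL_WRITE.
POLL_OK, POLL_READ, POLL_WRITE = 0, 1, 2


async def wait_connected(conn):
    """Drive conn.poll() to POLL_OK, re-registering the fd after every call."""
    loop = asyncio.get_running_loop()
    while True:
        state = conn.poll()
        if state == POLL_OK:
            return
        if state not in (POLL_READ, POLL_WRITE):
            raise RuntimeError("poll() returned %s" % state)

        fd = conn.fileno()  # re-read each time; the socket may have been replaced
        ready = loop.create_future()

        def on_ready(fut=ready):
            # The callback can fire more than once before it is removed.
            if not fut.done():
                fut.set_result(None)

        if state == POLL_WRITE:
            add, remove = loop.add_writer, loop.remove_writer
        else:
            add, remove = loop.add_reader, loop.remove_reader
        add(fd, on_ready)
        try:
            await ready
        finally:
            remove(fd)  # always drop the registration before the next poll()
```

Registering and unregistering on every pass costs a few extra system calls, but as noted above the connection phase is short, so the overhead is probably acceptable.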

By the way, I've found a similar issue with sslmode: https://stackoverflow.com/questions/35184780/postgresql-connect-asynchronously-with-epoll-wait

errx avatar Jun 10 '19 11:06 errx

Please look at my PR #579.

It fixes issues when:

  • you have multiple IPs in DNS and the first host is not responding
  • you have multiple hosts in your connection string and there is an unavailable replica before your target host
  • the replica before your target host is not responding at all, not even to a SYN packet, and you get stuck in poll

r-dmv avatar Jul 11 '19 21:07 r-dmv

> Would you provide a PR?

> Please look at my PR #579

Hi! Could someone please look at the PR?

oleksandr-kuzmenko avatar Nov 05 '20 12:11 oleksandr-kuzmenko

Friendly reminder about #579. Merging it could be a great step forward. Do you guys need any help with it?

At work we ran into an issue where aiopg cannot connect to a PostgreSQL cluster when the replica (read-only) comes before the leader (read-write) in the DSN with target_session_attrs=read-write. It just gets stuck and then times out. Pure psycopg2 works just fine in the same situation, and the issue is easily reproducible. It seems that the logic for refreshing the underlying fd in #579 fixes the issue.

and-semakin avatar Jul 08 '21 17:07 and-semakin