Slow introspection when using multiple custom types in a query

Open tristan opened this issue 5 years ago • 14 comments

  • asyncpg version: 0.20.1
  • PostgreSQL version: 12.1
  • Do you use a PostgreSQL SaaS? If so, which? Can you reproduce the issue with a local PostgreSQL install?: PostgreSQL is installed locally on Arch Linux
  • Python version: 3.7.6 and 3.8.1
  • Platform: Linux 5.4.13-arch1-1
  • Do you use pgbouncer?: no
  • Did you install asyncpg with pip?: yes
  • If you built asyncpg locally, which version of Cython did you use?: N/A
  • Can the issue be reproduced under both asyncio and uvloop?: yes

I have a few custom types (CREATE TYPE ...) in my database, and I'm running into issues where asyncpg's introspection stage for queries that use two or more of these types takes more than a second to complete.

For example:

Add log_min_duration_statement = 500 to the default postgresql.conf.

Create a database test with the following schema:

CREATE TYPE FOURBIGINTS AS (i0 BIGINT, i1 BIGINT, i2 BIGINT, i3 BIGINT);
CREATE TYPE NUMBERS_AS_WORDS AS ENUM (
    'zero', 'one', 'two', 'three', 'four'
);
CREATE TABLE bigthings (
    thing_id BIGSERIAL PRIMARY KEY,
    num FOURBIGINTS,
    words NUMBERS_AS_WORDS
);

and run the following code:

import asyncio
import asyncpg

async def main():
    con_args = {
        'dsn': 'postgres:///test',
        'ssl': None,
        'min_size': 10,
        'max_size': 10
    }
    pool = await asyncpg.create_pool(**con_args)

    async with pool.acquire() as con:
        await con.execute(
            """\
        INSERT INTO bigthings
        (num, words)
        VALUES ($1::FOURBIGINTS, $2::NUMBERS_AS_WORDS)
        """,
            (0, 0, 0, 0), 'one')

asyncio.run(main())

Watching the logs (on my system, sudo journalctl -u postgresql -f) will show something like:

2020-01-21 14:45:49.066 CET [118544] LOG:  duration: 1593.511 ms  execute __asyncpg_stmt_2__: WITH RECURSIVE typeinfo_tree(
            oid, ns, name, kind, basetype, has_bin_io, elemtype, elemdelim,
            range_subtype, elem_has_bin_io, attrtypoids, attrnames, depth)
        AS (
            SELECT
...
2020-01-21 14:45:49.066 CET [118544] DETAIL:  parameters: $1 = '{16584,16582}'

I've traced this back to the call to _introspect_types in connection.py.

From a bit of testing, it only happens if there are multiple custom types used in the query. For example, if I change the query to simply INSERT INTO bigthings (num) VALUES ($1::FOURBIGINTS), everything is nice and fast as expected, and if I change the bigthings.words column to a VARCHAR, there is no problem. But as soon as I include two or more custom types (e.g. two enum types, two composite types, or a mix), I see the slowdowns.

Is there anything I can do to remove the need for this introspection (e.g. by giving asyncpg some hints about these types), or am I doing something wrong that I can correct?

tristan avatar Jan 21 '20 14:01 tristan

This is most likely PostgreSQL's new JIT being slow on that query. Try turning it off (jit = off).

elprans avatar Jan 21 '20 16:01 elprans

Thanks @elprans, that does seem to be the issue.

Adding jit = off to postgresql.conf made everything fast again.

Since I can't change postgresql.conf in my deployed environment right now, I've worked around the issue in my own code by adding jit: off to the server_settings when I create my pool:

await asyncpg.create_pool(..., server_settings={'jit': 'off'})

tristan avatar Jan 22 '20 13:01 tristan

I've also been bitten by this bug. Executing a simple query with name = ANY($1::varchar[]) triggers asyncpg to execute that huge introspection query. In some random cases, the introspection query took seconds and even caused a command timeout. I've been using this library without a pool, connecting and disconnecting as needed.
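
For reference, the shape of the call is roughly this; the table and column names are placeholders, not my real schema:

async def lookup(con, names):
    # con is an asyncpg connection; the varchar[] parameter is what
    # triggers the type introspection the first time it is used.
    return await con.fetch(
        'SELECT * FROM users WHERE name = ANY($1::varchar[])', names)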

PostgreSQL 10 and 9.4; I experienced timeouts only on 9.4.

One more thing: it seems asyncpg doesn't use prepared queries for

SELECT
    t.oid,
    t.typelem     AS elemtype,
    t.typtype     AS kind
FROM
    pg_catalog.pg_type AS t
    INNER JOIN pg_catalog.pg_namespace ns ON (ns.oid = t.typnamespace)
WHERE
    t.typname = $1 AND ns.nspname = $2

I see this repeated in the logs for each type encoder/decoder registration.

dmig-alarstudios avatar Jan 30 '20 09:01 dmig-alarstudios

Hi, @elprans

I would like to know the purpose for which the WITH RECURSIVE typeinfo_tree(... query is being issued. Could you tell me?

I've been having trouble with this query because it's too slow, and turning off the JIT as you suggested did make it faster. But what can I do to stop the query from being issued at all? What part of my SQL is causing it to be issued?

otsuka avatar May 15 '20 15:05 otsuka

I would like to know the purpose for which the query is being issued. Could you tell me?

Postgres does not send enough information to describe query output at the protocol level. In order to decode complex data types, like composite types, ranges, arrays, etc, asyncpg needs to introspect those types (recursively) to determine what they are. Normally, the introspection is done once per connection, so if you use a pool you only pay the cost once.
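
To illustrate the per-connection caching, a rough sketch against the bigthings schema from the reproduction above (timings will vary; only the first query on the connection should pay the introspection cost):

import asyncio
import time

import asyncpg

async def main():
    pool = await asyncpg.create_pool(dsn='postgres:///test', min_size=1, max_size=1)
    async with pool.acquire() as con:
        for i in range(3):
            start = time.perf_counter()
            await con.fetchrow(
                'SELECT num, words FROM bigthings WHERE thing_id = $1', 1)
            # Only the first iteration triggers the typeinfo_tree
            # introspection; later ones reuse the connection's type cache.
            print(f'query {i}: {time.perf_counter() - start:.3f}s')
    await pool.close()

asyncio.run(main())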

elprans avatar May 15 '20 16:05 elprans

I see this repeated in the logs for each type encoder/decoder registration.

That shouldn't be the case. That query is executed using regular fetchrow() calls. The fact that you see the query in the log does not mean that a prepared statement wasn't used.

elprans avatar May 15 '20 16:05 elprans

Thank you very much for your answer, @elprans

otsuka avatar May 16 '20 09:05 otsuka

I have this issue with a table containing a single ENUM column:

CREATE TYPE index_type_t AS ENUM ('mark', 'index');

Would it be possible to cache the introspection information at the whole pool level, and not on each individual connection?

2-5 avatar Nov 06 '20 21:11 2-5

To turn off the JIT with SQLAlchemy 1.4.0b and asyncpg:

         self.engine = create_async_engine(
-            connection_string, pool_size=pool_size, **kwargs
+            connection_string, pool_size=pool_size, connect_args={'server_settings':{'jit': 'off'}}, **kwargs
         )

commonism avatar Dec 28 '20 18:12 commonism

Or, if you're using GINO:

db = Gino(
    dsn=config.DB_DSN,
    pool_min_size=config.DB_POOL_MIN_SIZE,
    pool_max_size=config.DB_POOL_MAX_SIZE,
    echo=config.DB_ECHO,
    ssl=config.DB_SSL,
    use_connection_for_request=config.DB_USE_CONNECTION_FOR_REQUEST,
    retry_limit=config.DB_RETRY_LIMIT,
    retry_interval=config.DB_RETRY_INTERVAL,
    kwargs={'server_settings': {'jit': 'off'}}
)

liorchen avatar Dec 28 '20 21:12 liorchen

  1. If I use connect_args={'server_settings': {'jit': 'off'}} with a connection pool, will the first query still generate WITH RECURSIVE typeinfo_tree( oid, ns, name, kind, basetype, has_bin_io, elemtype, elemdelim, range_subtype, elem_has_bin_io, attrtypoids, attrnames, depth) AS ( SELECT ...?

  2. I tried disabling JIT on the Postgres side, but my query still executes only after WITH RECURSIVE typeinfo_tree( oid, ns, name, kind, basetype, has_bin_io, elemtype, elemdelim, range_subtype, elem_has_bin_io, attrtypoids, attrnames, depth) AS ( SELECT .... How do I completely disable JIT?

mukhamux avatar Sep 14 '22 11:09 mukhamux

How do I completely disable JIT?

connect(..., server_settings={"jit": "off"})

elprans avatar Sep 14 '22 17:09 elprans

@elprans this doesn't help; on every first query I still get WITH RECURSIVE typeinfo_tree( oid, ns, name, kind, basetype, has_bin_io, elemtype, elemdelim, range_subtype, elem_has_bin_io, attrtypoids, attrnames, depth) AS ( SELECT .... I use it in SQLAlchemy with:

        engine = create_async_engine(
            url,
            json_serializer=simplejson.dumps,
            json_deserializer=json_deserializer,
            connect_args={'server_settings': {'jit': 'off'}},
            **kwargs,
        )

mukhamux avatar Sep 15 '22 14:09 mukhamux

It looks like it doesn't work in any case unless you run await conn.run_sync(Base.metadata.create_all) after creating the async engine. Can someone tell me whether that's true?

mukhamux avatar Sep 18 '22 11:09 mukhamux

Did somebody report this bug to the PG mailing lists?

stalkerg avatar Oct 28 '22 06:10 stalkerg

Is it possible to perform the introspection manually, without waiting for a query to be executed? Right now we need to manually execute a query in order to cache the introspection. jit = off does help with the performance of the introspection queries, but there is still a significant delay on the initial query.
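
What we do now, roughly, is a warm-up in the pool's init hook; this is just a sketch, and the DSN and warm-up query below are placeholders for whatever touches your custom types:

import asyncpg

async def warm_up(con):
    # Touch the custom types once so the typeinfo_tree introspection runs
    # here, at connection setup, instead of on the first real request.
    await con.fetchrow('SELECT num, words FROM bigthings LIMIT 1')

async def make_pool():
    return await asyncpg.create_pool('postgres:///test', init=warm_up)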

stevanmilic avatar Feb 06 '23 16:02 stevanmilic

@stevanmilic agreed, it would be good to have a way to provide a static schema to asyncpg to avoid this runtime overhead.

stalkerg avatar Feb 07 '23 03:02 stalkerg

@elprans is it possible to execute an SQL query without going through the prepared-statement flow? It's rather a blocker for high-frequency apps to use asyncpg with enums if you must go through introspection, which is really slow.

Something like the execute function on the connection, but one that returns results.

stevanmilic avatar Feb 09 '23 17:02 stevanmilic

I worked around this problem by registering all enums on connection creation:

ENUM_TYPE = (
    ("public", "order_type_t"),
    ("orders", "user_type_t"),
)

async def register_enum_types(conn):
    for schema, typename in ENUM_TYPE:
        await conn.set_builtin_type_codec(typename, schema=schema, codec_name="text")

pool = await asyncpg.create_pool(..., init=register_enum_types)

2-5 avatar Feb 09 '23 18:02 2-5

I have a WIP patch to automatically disable JIT for the introspection query.

is it possible to execute SQL query without going through the prepare statement flow?

Yes, via execute() and without any arguments. Not sure if useful in your use case.
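
For illustration, a minimal sketch of that approach against the bigthings table from the original report (values are inlined as SQL literals, and note that execute() returns a status string, not rows):

import asyncio
import asyncpg

async def main():
    con = await asyncpg.connect('postgres:///test')
    # No query arguments, so asyncpg uses the simple query protocol and
    # skips the prepared-statement / type-introspection path.
    status = await con.execute(
        "INSERT INTO bigthings (num, words) "
        "VALUES (ROW(0, 0, 0, 0)::FOURBIGINTS, 'one'::NUMBERS_AS_WORDS)")
    print(status)  # e.g. INSERT 0 1
    await con.close()

asyncio.run(main())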

elprans avatar Feb 13 '23 20:02 elprans

@elprans, I see that you added disabling of the JIT in the _introspect_types function in PR #1082. Shouldn't it be done for the function below it as well?

saikumar1752 avatar Feb 16 '24 10:02 saikumar1752

No, introspect_type is a very simple query and is not affected by the JIT pathology.

elprans avatar Feb 16 '24 16:02 elprans