
CockroachDB extremely long query due to Array(UUID())

Open Jacky56 opened this issue 1 year ago • 8 comments

hello all,

Assume we have a simple Post table:

from piccolo.table import Table
from piccolo.columns import *

class Post(Table):
    uuid = UUID(primary_key=True)
    user = ForeignKey(User, null=False)  # User table defined elsewhere
    description = Text()
    users_mentioned = Array(UUID())

When I perform Post.objects().run_sync() or Post.select(Post.all_columns()).run_sync(), the query is extremely slow.

But when I leave out the users_mentioned = Array(UUID()) column, e.g. Post.select(Post.uuid).run_sync(), the query is as fast as expected (the table is empty).

Looking at my cluster, a very complex query (10s+) runs each time I request the Array(UUID()) datatype:

WITH RECURSIVE typeinfo_tree (oid, ns, name, kind, basetype, elemtype, elemdelim, range_subtype, attrtypoids,
attrnames, depth) AS (SELECT ti.oid, ti.ns, ti.name, ti.kind, ti.basetype, ti.elemtype, ti.elemdelim,
ti.range_subtype, ti.attrtypoids, ti.attrnames, _
 FROM (SELECT t.oid AS oid, ns.nspname AS ns, t.typname AS name, t.typtype AS kind, (CASE WHEN t.typtype = '_' THEN (WITH RECURSIVE typebases (oid,
depth) AS (SELECT t2.typbasetype AS oid, _ AS depth
 FROM pg_type AS t2
   WHERE t2.oid = t.oid UNION ALL SELECT t2.typbasetype AS oid, tb.depth + _ AS depth
 FROM pg_type AS t2, typebases AS tb
   WHERE (tb.oid = t2.oid)
    AND (t2.typbasetype != _)) SELECT oid
 FROM typebases
 ORDER BY depth DESC
 LIMIT _) ELSE _ END) AS basetype, t.typelem AS elemtype, elem_t.typdelim AS elemdelim, range_t.rngsubtype AS range_subtype,
(CASE WHEN t.typtype = '_' THEN (SELECT array_agg(ia.atttypid
 ORDER BY ia.attnum)
 FROM pg_attribute AS ia INNER
   JOIN pg_class AS c
    ON (ia.attrelid = c.oid)
   WHERE ((ia.attnum > _)
    AND (NOT ia.attisdropped))
    AND (c.reltype = t.oid)) ELSE _ END) AS attrtypoids, (CASE WHEN t.typtype = '_' THEN (SELECT array_agg(ia.attname::STRING
 ORDER BY ia.attnum)
 FROM pg_attribute AS ia INNER
   JOIN pg_class AS c
    ON (ia.attrelid = c.oid)
   WHERE ((ia.attnum > _)
    AND (NOT ia.attisdropped))
    AND (c.reltype = t.oid)) ELSE _ END) AS attrnames
 FROM pg_catalog.pg_type AS t INNER
   JOIN pg_catalog.pg_namespace AS ns
    ON (ns.oid = t.typnamespace) LEFT
   JOIN pg_type AS elem_t
    ON (((t.typlen = _)
    AND (t.typelem != _))
    AND (t.typelem = elem_t.oid)) LEFT
   JOIN pg_range AS range_t
    ON (t.oid = range_t.rngtypid)) AS ti
   WHERE ti.oid = ANY ($1::OID[]) UNION ALL SELECT ti.oid, ti.ns, ti.name, ti.kind, ti.basetype, ti.elemtype,
ti.elemdelim, ti.range_subtype, ti.attrtypoids, ti.attrnames, tt.depth + _
 FROM (SELECT t.oid AS oid, ns.nspname AS ns, t.typname AS name, t.typtype AS kind, (CASE WHEN t.typtype = '_' THEN (WITH RECURSIVE typebases (oid,
depth) AS (SELECT t2.typbasetype AS oid, _ AS depth
 FROM pg_type AS t2
   WHERE t2.oid = t.oid UNION ALL SELECT t2.typbasetype AS oid, tb.depth + _ AS depth
 FROM pg_type AS t2, typebases AS tb
   WHERE (tb.oid = t2.oid)
    AND (t2.typbasetype != _)) SELECT oid
 FROM typebases
 ORDER BY depth DESC
 LIMIT _) ELSE _ END) AS basetype, t.typelem AS elemtype, elem_t.typdelim AS elemdelim, range_t.rngsubtype AS range_subtype,
(CASE WHEN t.typtype = '_' THEN (SELECT array_agg(ia.atttypid
 ORDER BY ia.attnum)
 FROM pg_attribute AS ia INNER
   JOIN pg_class AS c
    ON (ia.attrelid = c.oid)
   WHERE ((ia.attnum > _)
    AND (NOT ia.attisdropped))
    AND (c.reltype = t.oid)) ELSE _ END) AS attrtypoids, (CASE WHEN t.typtype = '_' THEN (SELECT array_agg(ia.attname::STRING
 ORDER BY ia.attnum)
 FROM pg_attribute AS ia INNER
   JOIN pg_class AS c
    ON (ia.attrelid = c.oid)
   WHERE ((ia.attnum > _)
    AND (NOT ia.attisdropped))
    AND (c.reltype = t.oid)) ELSE _ END) AS attrnames
 FROM pg_catalog.pg_type AS t INNER
   JOIN pg_catalog.pg_namespace AS ns
    ON (ns.oid = t.typnamespace) LEFT
   JOIN pg_type AS elem_t
    ON (((t.typlen = _)
    AND (t.typelem != _))
    AND (t.typelem = elem_t.oid)) LEFT
   JOIN pg_range AS range_t
    ON (t.oid = range_t.rngtypid)) AS ti, typeinfo_tree AS tt
   WHERE ((((tt.elemtype IS NOT NULL)
    AND (ti.oid = tt.elemtype)) OR ((tt.attrtypoids IS NOT NULL)
    AND (ti.oid = ANY (tt.attrtypoids)))) OR ((tt.range_subtype IS NOT NULL)
    AND (ti.oid = tt.range_subtype))) OR ((tt.basetype IS NOT NULL)
    AND (ti.oid = tt.basetype))) SELECT DISTINCT *, basetype::REGTYPE::STRING AS basetype_name, elemtype::REGTYPE::STRING AS elemtype_name,
range_subtype::REGTYPE::STRING AS range_subtype_name
 FROM typeinfo_tree
 ORDER BY depth DESC

Performing a query without piccolo such as:

import psycopg2

con = {}  # connection parameters
connection = psycopg2.connect(**con)
cursor = connection.cursor()
cursor.execute("select * from post")

works fine, at the expected speed.

please send help if I am doing anything wrong

Jacky56 avatar Jun 02 '24 17:06 Jacky56

Another interesting note:

  • swapping Array(UUID()) to Array(Varchar()) would still run the complex query
  • swapping Array(?) to JSONB would still run the complex query

Jacky56 avatar Jun 02 '24 18:06 Jacky56

@Jacky56 You are doing everything right with the table definition and the Piccolo queries. I think the problem is how asyncpg works with CockroachDB; there have been several issues reported online about this. With Cockroach v22 on my local machine I can't even insert data with arrays, because I get the error unable to decorrelate subquery (which was resolved in v23). After upgrading to the latest available version (v24.1.0) I get the same as you: a very slow query, with the complex query showing in the cluster log. I don't think we can do anything about it in Piccolo. Based on this discussion the type introspection is done once, and I can confirm that if we use a connection pool only the first query is slow and every other query is fine. psycopg doesn't have these problems, but Piccolo is based on asyncpg.
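
As a rough sketch of the pool approach (assuming the standard Piccolo engine API, with the config adjusted for your cluster):

from piccolo.engine.cockroach import CockroachEngine

DB = CockroachEngine(config={"database": "piccolo", "port": 26257})

async def main():
    # Open the pool once at startup: only the first query pays the
    # introspection cost, subsequent queries reuse the pooled connections.
    await DB.start_connection_pool()
    # ... run your queries here ...
    await DB.close_connection_pool()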

sinisaos avatar Jun 03 '24 08:06 sinisaos

@sinisaos Thanks for looking into this. Hopefully asyncpg will solve this problem in a future version. It's a good reason for us to consider supporting multiple Postgres client libraries. I'm going to try upgrading our CI to use Cockroach v24.

dantownsend avatar Jun 03 '24 22:06 dantownsend

hello @sinisaos @dantownsend

Is there a way to switch to psycopg or another SQL driver, or to pre-run the introspection queries on boot?

I'm running Piccolo in a container, and cold-starting the service takes criminally long because of what sinisa pointed out.

Jacky56 avatar Dec 05 '24 13:12 Jacky56

@Jacky56 Psycopg has different connection, pooling and cursor handling than asyncpg, so building a Psycopg engine for Piccolo (we can't easily subclass PostgresEngine) would not be an easy task. You can try the PSQLPy engine for Piccolo, which is in early development and doesn't yet cover all column types. I tried a combination of PSQLPy and Cockroach and it works. You can try something like this:

import uuid
from piccolo.table import Table
from piccolo.columns import Array, Text, ForeignKey, UUID, Varchar
from psqlpy_piccolo import PSQLPyEngine


DB = PSQLPyEngine(
    config={
        "host": "localhost",
        "database": "piccolo",
        "user": "root",
        "password": "",
        "port": 26257,
    }
)


def generate_uuid():
    return str(uuid.uuid4())


class BaseUser(Table, db=DB):
    uuid = UUID(primary_key=True, default=generate_uuid)
    username = Varchar()


class Post(Table, db=DB):
    uuid = UUID(primary_key=True, default=generate_uuid)
    post_user = ForeignKey(BaseUser, null=False)
    description = Text()
    users_mentioned = Array(base_column=UUID(default=generate_uuid))


if __name__ == "__main__":
    results = Post.objects().run_sync()
    print(results)
    related_results = Post.select(Post.all_columns()).run_sync()
    print(related_results)

Execution is instant. I hope that makes sense and you can use it in your case.
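
Presumably PSQLPy sidesteps the problem because it's a Rust-based driver that doesn't perform asyncpg-style per-connection type introspection, though I haven't verified its internals.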

sinisaos avatar Dec 05 '24 19:12 sinisaos

Hello @sinisaos

I have checked out PSQLPy and the psqlpy-piccolo wrapper; it looks promising, but it's at a very early stage of development, e.g. there are major issues with datatype comparisons.

I have found a workaround for now using https://github.com/MagicStack/asyncpg/issues/413#issue-418853439 and https://github.com/MagicStack/asyncpg/discussions/1015#discussioncomment-9646970, which effectively 'hard codes' the unknown datatypes so introspection does not occur:

import asyncpg
from piccolo.engine.cockroach import CockroachEngine

config = {}  # some config

DB = CockroachEngine(config=config)

# A method that gathers info about each datatype and registers a
# pass-through codec for it, so the introspection query never runs
async def set_type_codec(conn, typenames):
    schema = 'pg_catalog'
    format = 'text'
    conn._check_open()
    for typename in typenames:
        typeinfo = await conn.fetchrow(
            asyncpg.introspection.TYPE_BY_NAME, typename, schema
        )
        if not typeinfo:
            raise ValueError('unknown type: {}.{}'.format(schema, typename))
        oid = typeinfo['oid']
        conn._protocol.get_settings().add_python_codec(
            oid, typename, schema, [], 'scalar',
            lambda a: a, lambda a: a, format,
        )


async def main():
    conn = await DB.get_new_connection()
    await set_type_codec(conn, ["jsonb", "_tsvector", ..., "float8"])

Now the issue lies in conn._protocol.get_settings().add_python_codec, because I don't know what the parameters are supposed to be for most cases.
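
For simple scalar types there is also asyncpg's public Connection.set_type_codec, which documents its parameters; a minimal sketch for jsonb (an illustration only, untested against CockroachDB):

import json

async def register_jsonb(conn):
    # Public asyncpg API: registers a per-connection codec for jsonb
    await conn.set_type_codec(
        'jsonb',
        schema='pg_catalog',
        encoder=json.dumps,
        decoder=json.loads,
        format='text',
    )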

This does solve the cold start issues for my containers.

Jacky56 avatar Feb 14 '25 21:02 Jacky56

@Jacky56 That's very interesting - thanks for sharing. I was unaware you could do things like that with asyncpg.

dantownsend avatar Feb 14 '25 22:02 dantownsend

@Jacky56 Glad you found a solution for your use case. I tried it, but in my case (on a local machine) introspection still takes a long time even with your workaround. But that doesn't matter, what matters is that everything works fine for you. There are no such problems with psqlpy, but the main problem with that engine (as I already wrote) is that not all columns are covered and there isn't much activity in the repo. Cheers.

sinisaos avatar Feb 15 '25 07:02 sinisaos