
CockroachDB extremely long query due to Array(UUID())

Open Jacky56 opened this issue 1 year ago • 8 comments

hello all,

Assume we have a simple Post table:

from piccolo.table import Table
from piccolo.columns import *

class Post(Table):
    uuid = UUID(primary_key=True)
    user = ForeignKey(User, null=False)  # User table defined elsewhere
    description = Text()
    users_mentioned = Array(UUID())

When I perform Post.objects().run_sync() or Post.select(Post.all_columns()).run_sync(), the query is extremely slow.

But when I leave out the users_mentioned = Array(UUID()) column, e.g. Post.select(Post.uuid).run_sync(), the query is as fast as expected (the table is empty).

Looking at my cluster, a very complex query (10s+) runs each time I request the Array(UUID()) datatype:

WITH RECURSIVE typeinfo_tree (oid, ns, name, kind, basetype, elemtype, elemdelim, range_subtype, attrtypoids,
attrnames, depth) AS (SELECT ti.oid, ti.ns, ti.name, ti.kind, ti.basetype, ti.elemtype, ti.elemdelim,
ti.range_subtype, ti.attrtypoids, ti.attrnames, _
 FROM (SELECT t.oid AS oid, ns.nspname AS ns, t.typname AS name, t.typtype AS kind, (CASE WHEN t.typtype = '_' THEN (WITH RECURSIVE typebases (oid,
depth) AS (SELECT t2.typbasetype AS oid, _ AS depth
 FROM pg_type AS t2
   WHERE t2.oid = t.oid UNION ALL SELECT t2.typbasetype AS oid, tb.depth + _ AS depth
 FROM pg_type AS t2, typebases AS tb
   WHERE (tb.oid = t2.oid)
    AND (t2.typbasetype != _)) SELECT oid
 FROM typebases
 ORDER BY depth DESC
 LIMIT _) ELSE _ END) AS basetype, t.typelem AS elemtype, elem_t.typdelim AS elemdelim, range_t.rngsubtype AS range_subtype,
(CASE WHEN t.typtype = '_' THEN (SELECT array_agg(ia.atttypid
 ORDER BY ia.attnum)
 FROM pg_attribute AS ia INNER
   JOIN pg_class AS c
    ON (ia.attrelid = c.oid)
   WHERE ((ia.attnum > _)
    AND (NOT ia.attisdropped))
    AND (c.reltype = t.oid)) ELSE _ END) AS attrtypoids, (CASE WHEN t.typtype = '_' THEN (SELECT array_agg(ia.attname::STRING
 ORDER BY ia.attnum)
 FROM pg_attribute AS ia INNER
   JOIN pg_class AS c
    ON (ia.attrelid = c.oid)
   WHERE ((ia.attnum > _)
    AND (NOT ia.attisdropped))
    AND (c.reltype = t.oid)) ELSE _ END) AS attrnames
 FROM pg_catalog.pg_type AS t INNER
   JOIN pg_catalog.pg_namespace AS ns
    ON (ns.oid = t.typnamespace) LEFT
   JOIN pg_type AS elem_t
    ON (((t.typlen = _)
    AND (t.typelem != _))
    AND (t.typelem = elem_t.oid)) LEFT
   JOIN pg_range AS range_t
    ON (t.oid = range_t.rngtypid)) AS ti
   WHERE ti.oid = ANY ($1::OID[]) UNION ALL SELECT ti.oid, ti.ns, ti.name, ti.kind, ti.basetype, ti.elemtype,
ti.elemdelim, ti.range_subtype, ti.attrtypoids, ti.attrnames, tt.depth + _
 FROM (SELECT t.oid AS oid, ns.nspname AS ns, t.typname AS name, t.typtype AS kind, (CASE WHEN t.typtype = '_' THEN (WITH RECURSIVE typebases (oid,
depth) AS (SELECT t2.typbasetype AS oid, _ AS depth
 FROM pg_type AS t2
   WHERE t2.oid = t.oid UNION ALL SELECT t2.typbasetype AS oid, tb.depth + _ AS depth
 FROM pg_type AS t2, typebases AS tb
   WHERE (tb.oid = t2.oid)
    AND (t2.typbasetype != _)) SELECT oid
 FROM typebases
 ORDER BY depth DESC
 LIMIT _) ELSE _ END) AS basetype, t.typelem AS elemtype, elem_t.typdelim AS elemdelim, range_t.rngsubtype AS range_subtype,
(CASE WHEN t.typtype = '_' THEN (SELECT array_agg(ia.atttypid
 ORDER BY ia.attnum)
 FROM pg_attribute AS ia INNER
   JOIN pg_class AS c
    ON (ia.attrelid = c.oid)
   WHERE ((ia.attnum > _)
    AND (NOT ia.attisdropped))
    AND (c.reltype = t.oid)) ELSE _ END) AS attrtypoids, (CASE WHEN t.typtype = '_' THEN (SELECT array_agg(ia.attname::STRING
 ORDER BY ia.attnum)
 FROM pg_attribute AS ia INNER
   JOIN pg_class AS c
    ON (ia.attrelid = c.oid)
   WHERE ((ia.attnum > _)
    AND (NOT ia.attisdropped))
    AND (c.reltype = t.oid)) ELSE _ END) AS attrnames
 FROM pg_catalog.pg_type AS t INNER
   JOIN pg_catalog.pg_namespace AS ns
    ON (ns.oid = t.typnamespace) LEFT
   JOIN pg_type AS elem_t
    ON (((t.typlen = _)
    AND (t.typelem != _))
    AND (t.typelem = elem_t.oid)) LEFT
   JOIN pg_range AS range_t
    ON (t.oid = range_t.rngtypid)) AS ti, typeinfo_tree AS tt
   WHERE ((((tt.elemtype IS NOT NULL)
    AND (ti.oid = tt.elemtype)) OR ((tt.attrtypoids IS NOT NULL)
    AND (ti.oid = ANY (tt.attrtypoids)))) OR ((tt.range_subtype IS NOT NULL)
    AND (ti.oid = tt.range_subtype))) OR ((tt.basetype IS NOT NULL)
    AND (ti.oid = tt.basetype))) SELECT DISTINCT *, basetype::REGTYPE::STRING AS basetype_name, elemtype::REGTYPE::STRING AS elemtype_name,
range_subtype::REGTYPE::STRING AS range_subtype_name
 FROM typeinfo_tree
 ORDER BY depth DESC

Performing a query without piccolo such as:

import psycopg2

con = {}  # connection parameters
connection = psycopg2.connect(**con)
cursor = connection.cursor()
cursor.execute("select * from post")

works fine, at the expected speed.

please send help if I am doing anything wrong

Jacky56 avatar Jun 02 '24 17:06 Jacky56

Another interesting note:

  • swapping Array(UUID()) to Array(Varchar()) would still run the complex query
  • swapping Array(?) to JSONB would still run the complex query

Jacky56 avatar Jun 02 '24 18:06 Jacky56

@Jacky56 You are doing everything right with the table definition and the Piccolo queries. I think the problem is how asyncpg works with CockroachDB; there have been several issues reported online about this. With Cockroach v22 on my local machine I can't even insert data with arrays, because I get the error unable to decorrelate subquery (which was resolved in v23). After upgrading to the latest available version (v24.1.0) I get the same as you: a very slow query, with the complex query showing in the cluster log. I don't think we can do anything about it in Piccolo. Based on this discussion the type introspection is done once, and I can confirm that if we use a connection pool only the first query is slow and every other query is fine. psycopg doesn't have these problems, but Piccolo is based on asyncpg.
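
As a rough sketch of the pool approach (assuming the standard Piccolo engine API, with the config adjusted for your cluster):

from piccolo.engine.cockroach import CockroachEngine

DB = CockroachEngine(config={"database": "piccolo", "port": 26257})

async def main():
    # Open the pool once at startup: only the first query pays the
    # introspection cost, subsequent queries reuse the pooled connections.
    await DB.start_connection_pool()
    # ... run your queries here ...
    await DB.close_connection_pool()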

sinisaos avatar Jun 03 '24 08:06 sinisaos

@sinisaos Thanks for looking into this. Hopefully asyncpg will solve this problem in a future version. It's a good reason for us to consider supporting multiple Postgres client libraries. I'm going to try upgrading our CI to use Cockroach v24.

dantownsend avatar Jun 03 '24 22:06 dantownsend

hello @sinisaos @dantownsend

Is there a way to switch to psycopg or another SQL driver, or to pre-run the introspection queries on boot?

I'm running Piccolo in a container, and cold-starting the service takes criminally long because of what sinisa pointed out.

Jacky56 avatar Dec 05 '24 13:12 Jacky56

@Jacky56 Psycopg has different connection, pooling and cursor handling than asyncpg, so building a Psycopg engine for Piccolo (we can't easily subclass PostgresEngine) would not be an easy task. You can try the PSQLPy engine for Piccolo, which is in early development and doesn't yet cover all column types. I tried a combination of PSQLPy and Cockroach and it works. You can try something like this:

import uuid
from piccolo.table import Table
from piccolo.columns import Array, Text, ForeignKey, UUID, Varchar
from psqlpy_piccolo import PSQLPyEngine


DB = PSQLPyEngine(
    config={
        "host": "localhost",
        "database": "piccolo",
        "user": "root",
        "password": "",
        "port": 26257,
    }
)


def generate_uuid():
    return str(uuid.uuid4())


class BaseUser(Table, db=DB):
    uuid = UUID(primary_key=True, default=generate_uuid)
    username = Varchar()


class Post(Table, db=DB):
    uuid = UUID(primary_key=True, default=generate_uuid)
    post_user = ForeignKey(BaseUser, null=False)
    description = Text()
    users_mentioned = Array(base_column=UUID(default=generate_uuid))


if __name__ == "__main__":
    results = Post.objects().run_sync()
    print(results)
    related_results = Post.select(Post.all_columns()).run_sync()
    print(related_results)

Execution is instant. I hope that makes sense and you can use it in your case.
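
Presumably PSQLPy sidesteps the problem because it's a Rust-based driver that doesn't perform asyncpg-style per-connection type introspection, though I haven't verified its internals.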

sinisaos avatar Dec 05 '24 19:12 sinisaos

Hello @sinisaos

I have checked out PSQLPy and the psqlpy-piccolo wrapper; it looks promising, but it's at a very early stage of development, e.g. there are major issues with datatype comparisons.

I have found a workaround for now using https://github.com/MagicStack/asyncpg/issues/413#issue-418853439 and https://github.com/MagicStack/asyncpg/discussions/1015#discussioncomment-9646970, which effectively 'hard codes' the unknown datatypes so introspection does not occur:

import asyncpg
from piccolo.engine.cockroach import CockroachEngine

config = {}  # some config

DB = CockroachEngine(config=config)

# A method that gathers info about each datatype and registers a
# pass-through codec for it, so the introspection query never runs
async def set_type_codec(conn, typenames):
    schema = 'pg_catalog'
    format = 'text'
    conn._check_open()
    for typename in typenames:
        typeinfo = await conn.fetchrow(
            asyncpg.introspection.TYPE_BY_NAME, typename, schema
        )
        if not typeinfo:
            raise ValueError('unknown type: {}.{}'.format(schema, typename))
        oid = typeinfo['oid']
        conn._protocol.get_settings().add_python_codec(
            oid, typename, schema, [], 'scalar',
            lambda a: a, lambda a: a, format,
        )


async def main():
    conn = await DB.get_new_connection()
    await set_type_codec(conn, ["jsonb", "_tsvector", ..., "float8"])

Now the issue lies in conn._protocol.get_settings().add_python_codec, because I don't know what the parameters are supposed to be for most cases.
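
For simple scalar types there is also asyncpg's public Connection.set_type_codec, which documents its parameters; a minimal sketch for jsonb (an illustration only, untested against CockroachDB):

import json

async def register_jsonb(conn):
    # Public asyncpg API: registers a per-connection codec for jsonb
    await conn.set_type_codec(
        'jsonb',
        schema='pg_catalog',
        encoder=json.dumps,
        decoder=json.loads,
        format='text',
    )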

This does solve the cold start issues for my containers.

Jacky56 avatar Feb 14 '25 21:02 Jacky56

@Jacky56 That's very interesting - thanks for sharing. I was unaware you could do things like that with asyncpg.

dantownsend avatar Feb 14 '25 22:02 dantownsend

@Jacky56 Glad you found a solution for your use case. I tried it, but in my case (on a local machine) introspection still takes a long time even with your workaround. But that doesn't matter, what matters is that everything works fine for you. There are no such problems with psqlpy, but the main problem with that engine (as I already wrote) is that not all columns are covered and there isn't much activity in the repo. Cheers.

sinisaos avatar Feb 15 '25 07:02 sinisaos