CockroachDB: extremely long query due to Array(UUID())
Hello all,
Assume we have a simple Post table:
```python
from piccolo.columns import *
from piccolo.table import Table


class Post(Table):
    uuid = UUID(primary_key=True)
    user = ForeignKey(User, null=False)
    description = Text()
    users_mentioned = Array(UUID())
```
When I perform a `Post.objects().run_sync()` or `Post.select(Post.all_columns()).run_sync()`, the query is extremely slow.
But when I leave out the `users_mentioned` column (`Array(UUID())`), e.g. `Post.select(Post.uuid).run_sync()`, query speeds are as expected (the table is empty).
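(Side note: if you want everything except the array column, Piccolo's `all_columns` takes an `exclude` argument, if I read the docs right. A minimal sketch, assuming the `Post` table above:)

```python
# Select every column except the slow Array(UUID()) one.
rows = Post.select(
    Post.all_columns(exclude=[Post.users_mentioned])
).run_sync()
```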
From looking at my cluster, it performs a very complex query (10s+) each time I request the `Array(UUID())` data type:
```sql
WITH RECURSIVE typeinfo_tree (oid, ns, name, kind, basetype, elemtype, elemdelim, range_subtype, attrtypoids,
attrnames, depth) AS (SELECT ti.oid, ti.ns, ti.name, ti.kind, ti.basetype, ti.elemtype, ti.elemdelim,
ti.range_subtype, ti.attrtypoids, ti.attrnames, _
FROM (SELECT t.oid AS oid, ns.nspname AS ns, t.typname AS name, t.typtype AS kind, (CASE WHEN t.typtype = '_' THEN (WITH RECURSIVE typebases (oid,
depth) AS (SELECT t2.typbasetype AS oid, _ AS depth
FROM pg_type AS t2
WHERE t2.oid = t.oid UNION ALL SELECT t2.typbasetype AS oid, tb.depth + _ AS depth
FROM pg_type AS t2, typebases AS tb
WHERE (tb.oid = t2.oid)
AND (t2.typbasetype != _)) SELECT oid
FROM typebases
ORDER BY depth DESC
LIMIT _) ELSE _ END) AS basetype, t.typelem AS elemtype, elem_t.typdelim AS elemdelim, range_t.rngsubtype AS range_subtype,
(CASE WHEN t.typtype = '_' THEN (SELECT array_agg(ia.atttypid
ORDER BY ia.attnum)
FROM pg_attribute AS ia INNER
JOIN pg_class AS c
ON (ia.attrelid = c.oid)
WHERE ((ia.attnum > _)
AND (NOT ia.attisdropped))
AND (c.reltype = t.oid)) ELSE _ END) AS attrtypoids, (CASE WHEN t.typtype = '_' THEN (SELECT array_agg(ia.attname::STRING
ORDER BY ia.attnum)
FROM pg_attribute AS ia INNER
JOIN pg_class AS c
ON (ia.attrelid = c.oid)
WHERE ((ia.attnum > _)
AND (NOT ia.attisdropped))
AND (c.reltype = t.oid)) ELSE _ END) AS attrnames
FROM pg_catalog.pg_type AS t INNER
JOIN pg_catalog.pg_namespace AS ns
ON (ns.oid = t.typnamespace) LEFT
JOIN pg_type AS elem_t
ON (((t.typlen = _)
AND (t.typelem != _))
AND (t.typelem = elem_t.oid)) LEFT
JOIN pg_range AS range_t
ON (t.oid = range_t.rngtypid)) AS ti
WHERE ti.oid = ANY ($1::OID[]) UNION ALL SELECT ti.oid, ti.ns, ti.name, ti.kind, ti.basetype, ti.elemtype,
ti.elemdelim, ti.range_subtype, ti.attrtypoids, ti.attrnames, tt.depth + _
FROM (SELECT t.oid AS oid, ns.nspname AS ns, t.typname AS name, t.typtype AS kind, (CASE WHEN t.typtype = '_' THEN (WITH RECURSIVE typebases (oid,
depth) AS (SELECT t2.typbasetype AS oid, _ AS depth
FROM pg_type AS t2
WHERE t2.oid = t.oid UNION ALL SELECT t2.typbasetype AS oid, tb.depth + _ AS depth
FROM pg_type AS t2, typebases AS tb
WHERE (tb.oid = t2.oid)
AND (t2.typbasetype != _)) SELECT oid
FROM typebases
ORDER BY depth DESC
LIMIT _) ELSE _ END) AS basetype, t.typelem AS elemtype, elem_t.typdelim AS elemdelim, range_t.rngsubtype AS range_subtype,
(CASE WHEN t.typtype = '_' THEN (SELECT array_agg(ia.atttypid
ORDER BY ia.attnum)
FROM pg_attribute AS ia INNER
JOIN pg_class AS c
ON (ia.attrelid = c.oid)
WHERE ((ia.attnum > _)
AND (NOT ia.attisdropped))
AND (c.reltype = t.oid)) ELSE _ END) AS attrtypoids, (CASE WHEN t.typtype = '_' THEN (SELECT array_agg(ia.attname::STRING
ORDER BY ia.attnum)
FROM pg_attribute AS ia INNER
JOIN pg_class AS c
ON (ia.attrelid = c.oid)
WHERE ((ia.attnum > _)
AND (NOT ia.attisdropped))
AND (c.reltype = t.oid)) ELSE _ END) AS attrnames
FROM pg_catalog.pg_type AS t INNER
JOIN pg_catalog.pg_namespace AS ns
ON (ns.oid = t.typnamespace) LEFT
JOIN pg_type AS elem_t
ON (((t.typlen = _)
AND (t.typelem != _))
AND (t.typelem = elem_t.oid)) LEFT
JOIN pg_range AS range_t
ON (t.oid = range_t.rngtypid)) AS ti, typeinfo_tree AS tt
WHERE ((((tt.elemtype IS NOT NULL)
AND (ti.oid = tt.elemtype)) OR ((tt.attrtypoids IS NOT NULL)
AND (ti.oid = ANY (tt.attrtypoids)))) OR ((tt.range_subtype IS NOT NULL)
AND (ti.oid = tt.range_subtype))) OR ((tt.basetype IS NOT NULL)
AND (ti.oid = tt.basetype))) SELECT DISTINCT *, basetype::REGTYPE::STRING AS basetype_name, elemtype::REGTYPE::STRING AS elemtype_name,
range_subtype::REGTYPE::STRING AS range_subtype_name
FROM typeinfo_tree
ORDER BY depth DESC
```
Performing a query without Piccolo, such as:

```python
import psycopg2

# Plain psycopg2 connection for comparison.
connection = psycopg2.connect(**con)
cursor = connection.cursor()
cursor.execute("select * from post")
```

works fine and runs at the expected speed.
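For a closer comparison, the same SELECT through raw asyncpg (the driver Piccolo sits on) should reproduce the slowdown on the first fetch. A minimal sketch, assuming the same connection details used elsewhere in this thread:

```python
import asyncio

import asyncpg


async def main():
    conn = await asyncpg.connect(
        host="localhost", port=26257, user="root", database="piccolo"
    )
    # The first fetch of a row containing an array column triggers
    # asyncpg's type introspection query; later fetches on the same
    # connection reuse the cached type info.
    rows = await conn.fetch("select * from post")
    print(rows)
    await conn.close()


asyncio.run(main())
```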
Please send help if I am doing anything wrong.
Another interesting note:

- swapping `Array(UUID())` to `Array(Varchar())` would still run the complex query
- swapping `Array(?)` to `JSONB` would still run the complex query
@Jacky56 You are doing everything right in the table definition and in querying Piccolo. I think the problem is how asyncpg works with CockroachDB; there have been several issues reported online about this.

I have Cockroach v22 on my local machine and I can't even insert data with arrays, because of an `unable to decorrelate subquery` error (which was resolved in v23). I then upgraded to the latest version available (v24.1.0) and I get the same as you: a very slow query, with a complex query in the cluster log.

I don't think we can do anything in Piccolo. Based on this discussion, the type introspection is done once, and I can confirm that if we use a connection pool only the first query is slow and every other query is fine. psycopg doesn't have those problems, but Piccolo is based on asyncpg.
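Roughly like this (a minimal sketch using Piccolo's documented `start_connection_pool`; it assumes the `Post` table above is bound to this engine, and borrows the connection details from the example further down):

```python
import asyncio

from piccolo.engine.cockroach import CockroachEngine

DB = CockroachEngine(
    config={
        "host": "localhost",
        "database": "piccolo",
        "user": "root",
        "password": "",
        "port": 26257,
    }
)


async def main():
    await DB.start_connection_pool()

    # The first query on the pool is slow: asyncpg runs its type
    # introspection query for the Array(UUID()) column.
    await Post.select(Post.all_columns())

    # Subsequent queries reuse the cached type info and are fast.
    await Post.select(Post.all_columns())

    await DB.close_connection_pool()


asyncio.run(main())
```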
@sinisaos Thanks for looking into this. Hopefully asyncpg will solve this problem in a future version. It's a good reason for us to consider supporting multiple Postgres client libraries. I'm going to try upgrading our CI to use Cockroach v24.
Hello @sinisaos @dantownsend

Is there a way to swap asyncpg for another SQL driver (e.g. psycopg), or to pre-run the introspection queries on boot?

I'm running Piccolo in a container, and cold starting the service takes criminally long because of what sinisa pointed out.
@Jacky56 Psycopg has different connection, pooling and cursor handling than asyncpg, and building a Psycopg engine for Piccolo (we can't easily subclass PostgresEngine) would not be an easy task. You can try the PSQLPy engine for Piccolo, which is in early development and doesn't cover all column types yet.
I tried a combination of PSQLPy and Cockroach and it works. You can try something like this:
```python
import uuid

from piccolo.table import Table
from piccolo.columns import Array, Text, ForeignKey, UUID, Varchar
from psqlpy_piccolo import PSQLPyEngine

DB = PSQLPyEngine(
    config={
        "host": "localhost",
        "database": "piccolo",
        "user": "root",
        "password": "",
        "port": 26257,
    }
)


def generate_uuid():
    return str(uuid.uuid4())


class BaseUser(Table, db=DB):
    uuid = UUID(primary_key=True, default=generate_uuid)
    username = Varchar()


class Post(Table, db=DB):
    uuid = UUID(primary_key=True, default=generate_uuid)
    post_user = ForeignKey(BaseUser, null=False)
    description = Text()
    users_mentioned = Array(base_column=UUID(default=generate_uuid))


if __name__ == "__main__":
    results = Post.objects().run_sync()
    print(results)

    related_results = Post.select(Post.all_columns()).run_sync()
    print(related_results)
```
Execution is instant. I hope that makes sense and you can use it in your case.
Hello @sinisaos

I have checked out PSQLPy and the psqlpy-piccolo wrapper. It looks promising, but it's at a very early stage of development (e.g. major issues with datatype comparisons).

I have found a workaround for now using https://github.com/MagicStack/asyncpg/issues/413#issue-418853439 and https://github.com/MagicStack/asyncpg/discussions/1015#discussioncomment-9646970, which effectively 'hard codes' the unknown datatypes so introspection does not occur:
```python
import asyncpg

from piccolo.engine.cockroach import CockroachEngine

config = {}  # some config

DB = CockroachEngine(**config)


# A method that gathers info about each datatype and adds it to the codec,
# so asyncpg skips its expensive recursive introspection query.
async def set_type_codec(conn, typenames):
    schema = 'pg_catalog'
    format = 'text'
    conn._check_open()
    for typename in typenames:
        typeinfo = await conn.fetchrow(
            asyncpg.introspection.TYPE_BY_NAME, typename, schema
        )
        if not typeinfo:
            raise ValueError('unknown type: {}.{}'.format(schema, typename))
        oid = typeinfo['oid']
        # Register a passthrough codec for this type's OID.
        conn._protocol.get_settings().add_python_codec(
            oid, typename, schema, [], 'scalar',
            lambda a: a, lambda a: a, format,
        )


async def main():
    conn = await DB.get_new_connection()
    await set_type_codec(conn, ["jsonb", "_tsvector", ..., "float8"])
```
Now the issue lies in `conn._protocol.get_settings().add_python_codec`, because I don't know what the parameters are supposed to be in most cases.
This does solve the cold start issues for my containers.
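If you use a connection pool rather than a single connection, asyncpg's `create_pool` accepts an `init` coroutine that runs once per new connection, so each pooled connection can get the codecs before first use. A hedged sketch, assuming Piccolo's `start_connection_pool` forwards extra kwargs to `asyncpg.create_pool` (it did at the time of writing):

```python
# Same idea as the set_type_codec list above (abbreviated here).
typenames = ["jsonb", "_tsvector", "float8"]


async def init_connection(conn):
    # Runs for every new connection the pool opens, so each pooled
    # connection gets the passthrough codecs before first use.
    await set_type_codec(conn, typenames)


async def boot():
    await DB.start_connection_pool(init=init_connection)
```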
@Jacky56 That's very interesting - thanks for sharing. I was unaware you could do things like that with asyncpg.
@Jacky56 Glad you found a solution for your use case. I tried that, but in my case (on a local machine) the introspection still takes a long time even with your workaround. But that doesn't matter; what matters is that everything works fine for you. There are no such problems with PSQLPy, but the main problem with that engine (as I already wrote) is that not all columns are covered and there is not much activity in the repo. Cheers.