Provide improved support for loading types
I am using pgx with a database which defines a very large number of custom types. Virtually every stored procedure returns a custom type, and any given application needs at least 50-100 types, often more.
The current LoadType connection method is helpful, but is limited:
LoadTypegenerates two queries to the database when one would be sufficient.- it can only be given one type at a time, and then requiring at least one round-trip to the database to complete. Coupled with the impracticality of running multiple
LoadTypecalls concurrently, this adds significant overhead to an application's startup, before the dataclass connection/pool can be used at all. - it is painful to have to manually calculate the types which will be required in the application. It is not only necessary to specify the desired type, but to recursively calculate all the other types these depend on, and to load them in, in the correct order.
I would like to create a PR to resolve these issues, either in separate PRs or together, if I get the green light. Are these seen as valid concerns, and would improvements in these areas be accepted?
To briefly describe what I am doing locally, and which would form the basis of my PR(s):
- (the first point above is quite trivial: the first
QueryRowcalls made inLoadTypeis used in the second query) - add a new method named
LoadTypeswhich takes a[]stringof type names. Modify the (now) single SQL query to use= ANY($1)to find all OIDs matching the provided type names, and return[]*pgtype.Type. - add a helper method
GetTypeDependencieswhich, given a connection and a list of type names, recursively identifies the sub-types they implicitly also need registered. This can either be called internally byLoadTypeand the newLoadTypesor if that is considered too "magicial", can be kept separate for the user to invoke. e.g.conn.LoadTypes(ctx, conn.GetTypeDependencies(ctx, "my_type1", "my_type_2", ...))
The SQL I am using to assist in the above:
WITH RECURSIVE typeinfo AS (
SELECT c.relname AS parent, regexp_replace(pg_catalog.format_type(a.atttypid, a.atttypmod), '\[\]$', '') as child, 'f'::bool AS has_array
FROM pg_catalog.pg_class c
LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
JOIN pg_catalog.pg_attribute a ON (a.attrelid = c.oid)
JOIN pg_type ON (pg_type.typname = regexp_replace(pg_catalog.format_type(a.atttypid, a.atttypmod), '\[\]$', ''))
WHERE pg_catalog.pg_table_is_visible(c.oid)
AND a.attnum > 0 AND NOT a.attisdropped
UNION ALL
SELECT typname AS parent, regexp_replace(pg_catalog.format_type(typbasetype, typtypmod) , '\[\]$', '') AS child, (pg_type.typarray > 0)::bool AS has_array
FROM pg_type
),
relationships(parent, child, has_array, depth) AS (
SELECT typeinfo.parent, typeinfo.child, typeinfo.has_array, 0
FROM typeinfo
WHERE parent = ANY($1)
UNION ALL
SELECT typeinfo.parent, typeinfo.child, typeinfo.has_array, relationships.depth + 1
FROM typeinfo
JOIN relationships ON (typeinfo.parent = relationships.child)
)
SELECT child, has_array
FROM relationships
INNER JOIN pg_type ON (pg_type.typname = relationships.child)
WHERE typtype != 'b'
GROUP BY child, has_array
ORDER BY max(depth) DESC, child;
If this proposal succeeds, a follow-up proposal would involve providing a simpler means for users to do this with pgxpool, instead of having to write their own code to manage the collection of these types, storing them, and then applying them each time a new connection was made.
An additional possibile minor extension to this proposal is to also define LoadAllTypes which does exactly that. This would be useful when there are only a relatively small number of types (as it won't bitrot as the types used by the database evolve), or where the overheads of loading in all types is outweighed by the benefits. Indeed, if only a single SQL call is needed regardless of how many types are loaded, the overheads are reduced significantly.
I definitely like optimizing LoadType to only be a single query.
I'm tentatively in favor of a LoadTypes that also loads dependencies.
I'm tentatively in favor of LoadAllTypes. A long time ago, back in pgx v2 all types were automatically loaded. That led to problems for people who had hundreds of thousands of types (https://github.com/jackc/pgx/issues/140). But if it is opt-in that should resolve the issue.
LoadTypes and LoadAllTypes would need to be smart enough to return types in dependency order so they could be registered successfully.
Also, some sort of caching like you suggested in https://github.com/jackc/pgtype/issues/216 could be really beneficial when there is very large numbers of types, but I am nervous about as there are some nasty edge cases, especially using a HA setup with logical replication -- the OIDs may be different from one server to the next. I'd rather make it so mast that caching is unnecessary.
Ok, thanks for the confirmation that I'm headed in the right direction.
I'll put together some PRs in the coming days to address these.