zend-db
PostgresqlMetadata works very slowly
In the class Zend\Db\Metadata\Source\PostgresqlMetadata, the method loadTableNameData works very slowly. It generates this query:
SELECT "t"."table_name", "t"."table_type", "v"."view_definition", "v"."check_option", "v"."is_updatable"
FROM "information_schema"."tables" t
LEFT JOIN "information_schema"."views" v ON "t"."table_schema" = "v"."table_schema" AND "t"."table_name" = "v"."table_name"
WHERE "t"."table_type" IN ('BASE TABLE', 'VIEW') AND "t"."table_schema" = 'public'
Query result: 438 rows in 2160 ms! Why select all the tables from the database when we only need the data for a single table? Perhaps the result should be cached?
Mine took 5.8 seconds on 239 tables :D (I used a JIRA database as a test.)
Indeed, this is a problem when it is used to find details for a single table; for that, the general getTables() is not justified. If you use
$table = new TableGateway('tableName', new MetadataFeature());
then the resulting queries have WHERE "t"."table_name" = 'tableName' injected, which cuts execution time down to a fairly reasonable 20-40 ms (although the execution plan is still large). That is not ideal if you want to make all table gateways dynamic.
Or do you want to use this for a different purpose?
I would be very reluctant to add caching at the framework level for this. Some uses, such as DB migrations or admin tools in general, need to have the most up-to-date database info. It should be up to me, in my application logic, to dictate what is safe to assume will not change. An improved API that narrows down the query size would be better.
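To illustrate what that application-level caching could look like, here is a minimal sketch of a decorator around the metadata object. All class names here are hypothetical and not part of Zend DB; $cache is assumed to be any object with PSR-16-style get($key) / set($key, $value, $ttl) methods:

```php
// Hypothetical application-level cache decorator -- not part of Zend DB.
// The application, not the framework, decides the staleness policy ($ttl).
class CachingMetadata
{
    private $inner; // e.g. a \Zend\Db\Metadata\Metadata instance
    private $cache; // PSR-16-style cache (assumed interface)
    private $ttl;

    public function __construct($inner, $cache, $ttl = 300)
    {
        $this->inner = $inner;
        $this->cache = $cache;
        $this->ttl   = $ttl;
    }

    public function getTable($tableName, $schema = null)
    {
        $key   = 'meta.' . $schema . '.' . $tableName;
        $table = $this->cache->get($key);
        if ($table === null) {
            // Only hit the slow information_schema query on a cache miss.
            $table = $this->inner->getTable($tableName, $schema);
            $this->cache->set($key, $table, $this->ttl);
        }
        return $table;
    }
}
```

This keeps the framework untouched: code that wants fresh data talks to the metadata object directly, and code that tolerates staleness goes through the decorator.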
So I will ask whether the aforementioned usage through the table gateway, which narrows the query down to a specific table and lets you get info about it from the table gateway object, is enough for you. If you need to get metadata specifically through $meta = new Metadata($adapter); $meta->getTable('tablename');, we can think about improving the current API to let the user ask the same questions the table gateway can (which are currently hidden in protected methods).
Because my project does not use DB views, I solved the problem simply with a custom PostgresqlMetadata:
class PostgresqlMetadata extends \Zend\Db\Metadata\Source\PostgresqlMetadata
{
    /**
     * {@inheritdoc}
     */
    public function getTable($tableName, $schema = null)
    {
        $table = new \Zend\Db\Metadata\Object\TableObject($tableName);
        $table->setColumns($this->getColumns($tableName, $schema));
        $table->setConstraints($this->getConstraints($tableName, $schema));

        return $table;
    }
}
That is a clever solution for one-off usage like this (with no TableGateway). Thank you for sharing!
PostgresqlMetadata#loadConstraintData generates this query:
SELECT "t"."table_name", "tc"."constraint_name", "tc"."constraint_type", "kcu"."column_name", "cc"."check_clause", "rc"."match_option", "rc"."update_rule", "rc"."delete_rule", "kcu2"."table_schema" "referenced_table_schema", "kcu2"."table_name" "referenced_table_name", "kcu2"."column_name" "referenced_column_name"
FROM "information_schema"."tables" t
INNER JOIN "information_schema"."table_constraints" tc ON "t"."table_schema" = "tc"."table_schema" AND "t"."table_name" = "tc"."table_name"
LEFT JOIN "information_schema"."key_column_usage" kcu ON "tc"."table_schema" = "kcu"."table_schema" AND "tc"."table_name" = "kcu"."table_name" AND "tc"."constraint_name" = "kcu"."constraint_name"
LEFT JOIN "information_schema"."check_constraints" cc ON "tc"."constraint_schema" = "cc"."constraint_schema" AND "tc"."constraint_name" = "cc"."constraint_name"
LEFT JOIN "information_schema"."referential_constraints" rc ON "tc"."constraint_schema" = "rc"."constraint_schema" AND "tc"."constraint_name" = "rc"."constraint_name"
LEFT JOIN "information_schema"."key_column_usage" kcu2 ON "rc"."unique_constraint_schema" = "kcu2"."constraint_schema" AND "rc"."unique_constraint_name" = "kcu2"."constraint_name" AND "kcu"."position_in_unique_constraint" = "kcu2"."ordinal_position"
WHERE "t"."table_name" = 'users' AND "t"."table_type" IN ('BASE TABLE', 'VIEW') AND "t"."table_schema" = 'public'
ORDER BY CASE "tc"."constraint_type"
    WHEN 'PRIMARY KEY' THEN 1
    WHEN 'UNIQUE' THEN 2
    WHEN 'FOREIGN KEY' THEN 3
    WHEN 'CHECK' THEN 4
    ELSE 5
END, "tc"."constraint_name", "kcu"."ordinal_position"
It works very slowly: 43 rows in 1656 ms. Please fix it.
This problem is starting to affect me too: my migrations project has these queries run on every page request. After some research, it turns out that the information_schema implementation, for the sake of compatibility with ANSI SQL, has too many internal joins, causing a massive execution plan even on the smallest of databases. It is basically a view over the actual catalog tables. Since Zend DB has a platform-specific override for metadata, use of this very slow abstraction may not be justified: it is abstraction on top of abstraction. However, if we query the catalog directly, there is a possibility that PostgreSQL may change those tables between versions, and we would have to chase those changes. I do not know how much, if at all, they change.
Anyway, for now, can I please ask you for a favor: instead of the first query you posted, run
SELECT
    c.relname AS table_name,
    CASE
        WHEN c.relnamespace = pg_my_temp_schema() THEN 'LOCAL TEMPORARY'
        WHEN c.relkind = 'r' THEN 'BASE TABLE'
        WHEN c.relkind = 'v' THEN 'VIEW'
        WHEN c.relkind = 'f' THEN 'FOREIGN TABLE'
    END AS table_kind,
    CASE
        WHEN c.relkind = 'v' THEN pg_get_viewdef(c.oid, true)
    END AS view_definition
FROM pg_class c
JOIN pg_namespace n
    ON n.oid = c.relnamespace
WHERE n.nspname = 'public'
    AND (c.relkind = 'r' OR c.relkind = 'v')
and see (1) that you do not lose any data, and (2) that it executes fast enough to be considered a viable replacement?
(This does not yet include the last 2 columns used by ZF Metadata, check_option and is_updatable; I really want to see if this would help at all before researching what those are.)
That query has fewer joins and scans to go through, since it bypasses the ANSI abstraction (screenshots included for comparison).

If this is worth it, and the maintainers believe writing our own implementation of the queries without the ANSI views is safe for us to do, I could try converting the other monsters too.
P.S. Maybe also consider lazy loading some of the table properties, like constraints and columns. I feel that full metadata detail queried all at once will still cause huge execution plans, even if we avoid information_schema. In my current use case I just need the table name, yet columns and constraints get queried too. Maybe those class properties could be replaced with lazy-loading proxies instead of being populated ahead of time. Then, if anyone does hit those getters, multiple smaller SQL queries will still be more performant than those huge ones.
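As a minimal sketch of that lazy-loading idea (hypothetical code, not the actual Zend DB API): instead of populating columns eagerly, the table object could hold a loader callable and run its query only on first access:

```php
// Hypothetical lazy-loading table object: the columns query runs only
// when getColumns() is first called, not at construction time.
class LazyTableObject
{
    private $name;
    private $columnsLoader; // callable that runs the columns query
    private $columns;       // null until first access

    public function __construct($name, callable $columnsLoader)
    {
        $this->name = $name;
        $this->columnsLoader = $columnsLoader;
    }

    public function getName()
    {
        return $this->name;
    }

    public function getColumns()
    {
        if ($this->columns === null) {
            // One extra query here beats a huge up-front join
            // that most callers never needed.
            $this->columns = call_user_func($this->columnsLoader);
        }
        return $this->columns;
    }
}
```

With this shape, code that only reads getName() never triggers the columns query at all.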
We could also rework the queries to be more specific to the framework's purpose. For example, ZF does not actually need the large CASE statement for the table type, because Metadata just does
switch ($data['table_type']) {
    case 'BASE TABLE':
        $table = new Object\TableObject($tableName);
        break;
    case 'VIEW':
        $table = new Object\ViewObject($tableName);
        break;
    default:
        throw new \Exception('Table "' . $tableName . '" is of an unsupported type "' . $data['table_type'] . '"');
}
which might as well match on 'r' and 'v', respectively.
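If the raw relkind codes were returned as-is, the translation could move into PHP entirely; a sketch under that assumption (the helper function is hypothetical, not part of Zend DB):

```php
// Sketch: translate pg_class.relkind codes in PHP instead of a SQL
// CASE expression, so the query can return c.relkind unmodified.
function relkindToTableType($relkind)
{
    switch ($relkind) {
        case 'r': // ordinary table
            return 'BASE TABLE';
        case 'v': // view
            return 'VIEW';
        default:
            throw new \Exception('Unsupported relkind "' . $relkind . '"');
    }
}
```

That keeps the SQL minimal while preserving the 'BASE TABLE' / 'VIEW' strings the existing switch in Metadata expects.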
Anyway, looking forward to your result, if you can.
OO OO this one is even faster!
SELECT
c.relname as table_name,
c.relkind as table_type
FROM pg_class c
WHERE c.relnamespace = (SELECT oid FROM pg_namespace WHERE nspname = 'public')
AND (c.relkind = 'r' OR c.relkind = 'v')

Without a join, there are only 38 rows to work with. Annoyingly, fields needed from pg_* tables are not indexed.
Wondering why performance on DDL is relevant here: I have never had performance-specific questions about Doctrine's schema tools, and the overhead is similar...
That's why I dismissed it at first too. DDL is either a one-time thing or part of a database admin utility (if someone is writing a custom one for whatever reason), and neither use case needs to be that performant. I've seen metadata used as part of a page load to check whether the database version is "safe" against what the application expects, but that too was mostly a convenience during development and is trivial to do manually.
What got me worried was other features that use metadata without users realizing it, inadvertently running extra queries. MetadataFeature for TableGateway, the example I can think of right now, populates the table gateway instance with the details needed to operate it, such as which column is the primary key and whether it is auto-incremented, so inserts/updates do not cause conflicts. So if, instead of manually describing the table structure to the table gateway in its constructor as one typically should, users just throw in new MetadataFeature() as the framework encourages (given that this feature exists), and do that for every table in the application, the cost can get explosive. I wouldn't do it, but I can see others doing it. I too am curious about the original poster's use case: just how important is this?
Sorry, never mind the auto_increment thing. I mixed it up with the sequence feature, which does that. I do not THINK the table gateway does that on its own, in which case the purpose of automatically populating its column fields seems less important, but it still gives the option to do so for whatever reason.
While I agree that a more thorough analysis of query performance should be done, the query you provided worked perfectly for me.
I opened #321 so others could benefit from the performance improvement.
This repository has been closed and moved to laminas/laminas-db; a new issue has been opened at https://github.com/laminas/laminas-db/issues/86.