PageTable fields with `sortfields` causes N+1 SQL queries
Short description of the issue
Bad performance: if a PageTable field has sortfields, the sorting makes an SQL query (or more) for each subpage.
Expected behavior
Fetching a PageTable field does not cause a big number of unnecessary SQL queries.
Actual behavior
For each subpage, the sort algorithm queries each sortfield from on the fly. If a PageTable has 50 subpages and three sort fields, this means 150 extra SQL queries.
Optional: Suggestion for a possible fix
PR coming up.
Steps to reproduce the issue
- Add a template, e.g.
mytpl - To
mytpl, add text fieldmytext - Install the PageTable modules
- Create a PageTable field
ptfield - On
Detailstab, setSort fieldstomytext - Add a PageTable field
ptfieldtobasic-page - Edit a basic page: add a few subpages to
ptfield. Enter something tomytextfor each one. - Run this code:
$page = $wire->pages->get("id=1015"); // replace with the page id
$foo = $page->ptfield;
$queries = $wire->database->getQueryLog();
echo '<p>Number queries: ' . count($queries) . '</p>';
foreach($queries as $query){
echo '<p>' . htmlspecialchars($query) . '</p>';
}
- Observe that for each subpage, the output has a line like:
SELECT field_mytext.data AS `mytext__data` FROM `field_mytext` WHERE field_mytext.pages_id=1021
Setup/Environment
- ProcessWire version: latest dev branch, vanilla installation
- (Optional) PHP version: 8.2
- (Optional) MySQL version: MariaDB 10.5
I'll sometimes aim for more simple 1-index select queries rather than a single query with multiple joins because the multiple simple queries can often consume less time than the single more complex query. Though it depends on the case, and I don't remember for this case specifically. Though the mentioned query is just the sort of type that might perform more quickly as independent queries than in a larger query, though can't say for certain without benchmarking. When it comes to sorting, performance wise it's preferable to sort on a native pages table property when possible, such as name, sort, created, modified, id, etc.
About performance and prefething:
I've done some benchmarking by fetching pages with 10 or even 50 joined fields. Even with low network latencies (docker on local dev machine), using joins seems to perform about three times faster.
$pageIds = [/* list of 100 pages */];
$fields = [/* list of 50 basic text fields */];
foreach($pageIds as $pageId){
$page = $wire->pages->get("id=$pageId", ["loadOptions" => ["joinFields" => $fields]]);
// vs.
// $page = $wire->pages->get("id=$pageId");
foreach($fields as $field){
$foo = $page->get($field);
}
}
I might be missing something in my perf tests, but, with any number of fields, I would be surprised to find a common scenario where the database was slower to perform one SELECT with joins than N+1 simpler SELECT clauses. After all, EXPLAIN looks very good for the join selects that PW generates.
One concern about my PR, though: am I correct that setting joinFields in getById() ignores any autojoin fields that the template might have, thus actually (potentially) increasing the total number of queries, if those fields are queried later?
Thus, I believe the patch should ideally merge sortfields and autojoin fields, if the template has that set.
@tuomassalo I've added some code for this, but since I don't use PageTable much, I'm not sure my contrived PageTable fields are an adequate test of how well it works. I used something similar to what was in your PR but expanded upon it and added the merging for autojoin fields. Do you find it all works well on your installation? Note there was also a small correction added after the commit above.
I believe the patch works as intended. Here's a snippet I tested with. Autojoin fields are also autojoined, as they should.
<?php
include('/var/www/html/index.php');
header('Content-Type: text/plain');
////// Install fixture templates, field and pages.
////// First, clean up previous run (if any).
$wire->database->exec("DELETE FROM pages WHERE templates_id IN (SELECT id FROM templates WHERE name IN ('p', 'c'))");
// Remove templates
foreach(['p','c'] as $t) {
$template = $templates->get($t);
if ($template) $templates->delete($template);
}
$fieldnames = ['sortfield1', 'sortfield2', 'joinfield1', 'joinfield2'];
// Remove fields
foreach(array_merge(['pt'], $fieldnames) as $f) {
$field = $fields->get($f);
if ($field) $fields->delete($field);
}
// Create templates and field.
$pTemplate = $templates->add('p');
$pTemplate->save();
$cTemplate = $templates->add('c');
foreach($fieldnames as $name) {
$f = $fields->makeItem();
$f->type = 'FieldtypeText';
$f->name = $name;
if (strpos($name, 'joinfield') !== false) {
$f->addFlag(\ProcessWire\Field::flagAutojoin);
}
$f->save();
$cTemplate->fields->add($f);
}
$cTemplate->save();
$ptField = $fields->makeItem();
$ptField->name = 'pt';
$ptField->type = 'PageTable';
$ptField->template_id = $cTemplate->id;
$ptField->parent_id = 1;
$ptField->sortfields = 'sortfield1, sortfield2';
$ptField->save();
$d1 = $pTemplate->fields->add($ptField);
$d2 = $pTemplate->save();
// Create pages.
$parent = $pages->add('p', '/', 'parent-a');
for($i=1; $i <= 5; $i++) {
$c1 = $pages->add('c', '/', "child1-$i");
$c1->sortfield1 = "s1-$i";
$c1->sortfield2 = "s2-$i";
$c1->save();
$parent->pt->add($c1);
}
$parent->save();
//////// Now everything is set up.
$pages->uncacheAll();
$wire->database->queryLog(true); // empty log
$result = $wire->pages->get($parent->id);
foreach($result->pt as $p) {
echo $p->id . "\n";
}
echo "\nQueries:\n--------\n\n";
foreach ($wire->database->getQueryLog() as $query) {
echo $query . "\n\n";
}
With the patch, the last query printed is something like
SELECT false AS isLoaded, pages.templates_id AS templates_id, pages.*, pages_sortfields.sortfield,
(SELECT COUNT(*) FROM pages AS children WHERE children.parent_id=pages.id) AS numChildren,field_sortfield1.data AS `sortfield1__data`,field_sortfield2.data AS `sortfield2__data`,field_joinfield1.data AS `joinfield1__data`,field_joinfield2.data AS `joinfield2__data`
FROM `pages`
LEFT JOIN pages_sortfields ON pages_sortfields.pages_id=pages.id
LEFT JOIN field_sortfield1 ON field_sortfield1.pages_id=pages.id
LEFT JOIN field_sortfield2 ON field_sortfield2.pages_id=pages.id
LEFT JOIN field_joinfield1 ON field_joinfield1.pages_id=pages.id
LEFT JOIN field_joinfield2 ON field_joinfield2.pages_id=pages.id
WHERE pages.id IN(8703,8704,8705,8706,8707)
AND pages.templates_id=332
GROUP BY pages.id -- [0.4ms]
Without the patch, the SELECT is different, and five SELECT clauses are printed:
SELECT field_sortfield1.data AS `sortfield1__data`
FROM `field_sortfield1`
WHERE field_sortfield1.pages_id=8713 -- [0ms]