cms icon indicating copy to clipboard operation
cms copied to clipboard

Improve validate unique uri's in CollectionTreeController

Open juliawarnke opened this issue 1 year ago • 7 comments

Bug description

I have a project with almost two thousand entries in my pages collection and an associated collection tree wich is eleven levels deep. When the CollectionTree is saved the validation process of checking for duplicate uri's slows down the request massively. Even a max execution time of 240 seconds is sometimes not enough to prevent the control panel from crashing.

How to reproduce

  1. Create a pages collection
  2. Activate the collection tree
  3. Create a lot of entries. I used Entry::make() for it.
  4. Enter the CollectionTree and move the entries around with multiple levels.
  5. Save the CollectionTree
  6. Repeat step 4. and 5.

Logs

No response

Environment

Environment
Laravel Version: 10.33.0
PHP Version: 8.2.13
Composer Version: 2.6.2
Environment: production
Debug Mode: OFF
Maintenance Mode: OFF

Cache
Config: NOT CACHED
Events: NOT CACHED
Routes: NOT CACHED
Views: CACHED

Drivers
Broadcasting: null
Cache: redis
Database: mysql
Logs: stack / daily, slack
Mail: smtp
Queue: redis
Session: file

Statamic
Addons: 7
Antlers: runtime
Stache Watcher: Disabled
Static Caching: half
Version: 4.36.0 PRO

Statamic Addons
elvenstar/statamic-meilisearch: 2.0.1
visuellverstehen/statamic-anchor-navigation: 0.3.6-beta
visuellverstehen/statamic-classify: 2.6.1
visuellverstehen/statamic-content-renderer: 0.1.1-beta
visuellverstehen/statamic-glossary-tooltips: dev-main
visuellverstehen/statamic-picturesque: 1.3.0
visuellverstehen/statamic-pixxio-flysystem: dev-main

Installation

Fresh statamic/statamic site via CLI

Antlers Parser

None

Additional details

We did a little digging and discovered that the validateUniqueUri method probably does a lot of unnecessary work in order to compare all uri's for uniqueness. For every entry a pages object is created which holds the entire collection tree and a lot more in order to compare parents and child entries afterwards. It seems like a lot for this purpose.

Besides putting Statamic\Entries\UpdateStructuredEntryUris and Statamic\Entries\UpdateStructuredEntryOrder in the queue in order to increase the performance, we found another quick fix that worked pretty well for us. We resolved the uri for each entry by accessing a new search index that holds all slugs. We then search for duplicate uri's.

We've replaced the CollectionTreeController with our own and overwritten the validateUniqueUri method with the following:

private function validateUniqueUris($tree)
{
    if (! $tree->collection()->route($tree->locale())) {
        return;
    }

    if (! $index = Search::index("pages_slugs")?->raw()) {
        // whatever needs to be done.
    }
        
    $this->pageSlugsIndex = json_decode($index, true);
    $pathsArray = $this->populatePathsArray($tree->tree());
    $uniquePathsArray = array_unique($pathsArray);
    $duplicatePaths = array_diff_key($pathsArray, $uniquePathsArray);
    
    if (! empty($duplicatePaths)) {
        $uri = preg_replace('/^\//', '', array_shift($duplicatePaths));
        throw ValidationException::withMessages(['uri' => trans('statamic::validation.duplicate_uri', ['value' => $uri])]);
    }
}

private function populatePathsArray(array $entries, $path = '')
{
    $paths = [];
    foreach ($entries as $entry) {
        $slug = $this->pageSlugsIndex["entry::{$entry['entry']}"]['slug'] ?? null;
        // Unpublished entries are not part of the index, and as they are nothing that is linked to, we can ignore them.
        if (! $slug) {
            continue;
        }

        if ($path === '') {
            $paths[] = $slug;
        } else {
            $paths[] = $path . '/' . $slug;
        }

        if (isset($entry['children'])) {
            $paths = array_merge($paths, $this->populatePathsArray($entry['children'], $path . '/' . $slug));
        }
    }

    return $paths;
}

Of course this code only works for our needs. But maybe this could be a starting point to improve this part of the core.

juliawarnke avatar Dec 21 '23 10:12 juliawarnke

Can you try updating to 4.40.0 or higher? There were some improvements to saving collection trees in that release.

ryanmitchell avatar Jan 02 '24 09:01 ryanmitchell

This should have been fixed by #8800 (which was released in v4.40.0).

I remember reproducing the performance issues by following similar steps. Then, when I brought in that PR, the performance was vastly improved.

I'm going to close this issue. If you're still experiencing issues after updating, please leave a comment and we can re-open.

duncanmcclean avatar Jan 03 '24 11:01 duncanmcclean

@duncanmcclean thanks for looking into it. I just tested this scenario with v4.40.0 and and yes sometimes saving the tree takes only a few seconds in my project. But if I move a branch with lots of childpages around or place it somewhere deep into another branch I run into timeouts again. It takes longer than a minute to save the tree.

image

Would appreciate it if you could look into the validation of the uri's.

juliawarnke avatar Jan 03 '24 12:01 juliawarnke

Okay, I've reopened this issue so we can continue to look into it.

duncanmcclean avatar Jan 03 '24 13:01 duncanmcclean

@duncanmcclean

I've also found that the updateEntryUri and updateEntryOrder methods in the CollectionsStore class always rebuild the complete index instead of only updating the affected entries. Both methods accept ids as second param, but they are never used.

Statamic\Stache\StoresCollectionStore.php

public function updateEntryUris($collection, $ids = null)
    {
        Stache::store('entries')
            ->store($collection->handle())
            ->index('uri')
            ->update();
    }

Is there a reason why the existing method updateItem in the Index class is not used? With the following code I've managed to increase the performance significantly.

public function updateEntryUris($collection, $ids = null)
    {
        $index = Stache::store('entries')
            ->store($this->collection)
            ->index('uri');

        if (empty($ids)) {
            $index->update();

            return;
        }

        foreach ($ids as $id) {
            if (!$entry = Entry::find($id)) {
                continue;
            }

            $index->updateItem($entry);
        }
    }

juliawarnke avatar Jan 18 '24 12:01 juliawarnke

Thanks for your PR last week! Did it resolve this issue or is this still an issue for you?

duncanmcclean avatar Jan 24 '24 11:01 duncanmcclean

@duncanmcclean validation of the uri's is still very slow. So it would be great if you would keep this issue on your agenda.

juliawarnke avatar Jan 25 '24 06:01 juliawarnke