Slow page loading and asset metadata

Open mikemartin opened this issue 2 years ago • 3 comments

Bug description

I've tried just about everything to improve the slow page loading on this site (disabling the watcher, selecting nav fields, etc.) and I seem to have narrowed the problem down to the high number of assets. The assets folder on this project has exactly 14464 asset files and as a result, some of my pages take as long as 8 seconds to load. When I turn off cache_meta the page takes 40 seconds to load.
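
For reference, the metadata caching option mentioned above is, as far as I know, the `cache_meta` key in the assets config. A fragment, assuming the default Statamic 3 layout at config/statamic/assets.php (other keys omitted):

```php
<?php

// config/statamic/assets.php (fragment, assumed default location)
return [
    // When true, asset metadata (file size, dimensions, custom data) is
    // cached instead of being read from disk each time it is needed.
    'cache_meta' => true,
];
```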

Typically this isn't a problem with static caching but I've recently introduced Livewire on this project and every Livewire query is taking 2 to 3 seconds.

I understand that I should reduce the number of assets but this still seems like a major performance issue. Would it make any difference if I moved the assets to external storage like DO spaces or S3?

How to reproduce

You can test the Livewire page filters here as a demonstration of the slow requests: https://staging.burlingameproperties.com/properties

Logs

No response

Environment

aerni/livewire-forms: 4.0.0
jonassiewertsen/statamic-livewire: 2.9.0
mikemartin/helpscout-beacon: 1.0.2
spatie/statamic-responsive-images: 2.13.0
swiftmade/statamic-clear-assets: 1.1.0
webographen/statamic-widget-continue-editing: 1.0.1
withcandour/aardvark-seo: 2.0.28

Installation

Fresh statamic/statamic site via CLI

Antlers Parser

runtime (new)

Additional details

No response

mikemartin avatar Aug 15 '22 05:08 mikemartin

Would it make any difference if I moved the assets to external storage like DO spaces or S3?

No, it'd likely be worse.

jasonvarga avatar Aug 15 '22 13:08 jasonvarga

I've been looking into some slowness on a site I've been working on and feel like it may be a similar issue. I'm only dealing with a few thousand assets, not tens of thousands, but the number of assets is the only thing I can attribute the issue to.

This could be nothing, but I've been doing some (very unscientific) testing using the Starter's Creek kit and have spotted something that may be of interest:

  1. I created a new site
  2. I tweaked the blog blueprint to have five additional asset fields
  3. I updated one of the entries with those new fields set
  4. I then replaced the blog/show template with one that outputs each of the five assets and records the times
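
The timing in step 4 could be recorded along these lines. This is a minimal sketch, not the actual template from the test repo: `timed()` is a hypothetical helper, and the closure simply returns the path where the real test would output the asset field.

```php
<?php

// Hypothetical helper: run a callable and return [result, elapsed milliseconds].
function timed(callable $fn): array
{
    $start = hrtime(true);                        // monotonic clock, in nanoseconds
    $result = $fn();
    $elapsedMs = (hrtime(true) - $start) / 1e6;   // nanoseconds -> milliseconds

    return [$result, $elapsedMs];
}

$paths = ['/assets/blinking-carot.gif', '/assets/donut.jpg', '/assets/idea.jpg'];
$secondary = [];

foreach ($paths as $i => $path) {
    // Stand-in for the real work: the test template would output the asset here.
    [$url, $ms] = timed(fn () => $path);

    $label = $i === 0 ? 'Initial' : 'Secondary';
    echo "{$label}: {$url}\n{$ms}\n\n";

    // Only the assets after the first one count towards the average.
    if ($i > 0) {
        $secondary[] = $ms;
    }
}

echo "Average Secondary:\n" . (array_sum($secondary) / count($secondary)) . "\n";
```

Timing the first asset separately matters because it includes whatever one-off loading happens on first use.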

With this setup the output from the template is this (in milliseconds):

Initial: /assets/blinking-carot.gif 2048
4.7008991241455

Secondary: /assets/donut.jpg 2048
1.9650459289551

Secondary: /assets/idea.jpg 2048
1.7940998077393

Secondary: /assets/octopus.jpg 2048
2.2079944610596

Secondary: /assets/pizza-wifi.jpg 2048
2.9721260070801

Average Secondary:
2.2348165512085

So the initial image evaluation takes ~5ms, then subsequent ones take ~2ms. That makes perfect sense. On the first load I guess the index hasn't been loaded yet so that takes a bit longer.

I then filled the assets folder with 10,000 additional files, cleared the cache and warmed the stache. Try again:

Initial: /assets/blinking-carot.gif 2048
40.089845657349

Secondary: /assets/donut.jpg 2048
21.219968795776

Secondary: /assets/idea.jpg 2048
13.118982315063

Secondary: /assets/octopus.jpg 2048
13.496875762939

Secondary: /assets/pizza-wifi.jpg 2048
14.589071273804

Average Secondary:
15.606224536896

Initial image is slower, which still makes sense, larger index is going to take longer to load. But the curious thing is the subsequent images. They're all much slower as well, which I wouldn't really expect.

The problem seems to be something inside the OrderedQueryBuilder that's called in Fieldtypes\Assets::augment(). I've not got into the guts of that to figure out what's going on, but as a quick hack to test the theory I have replaced that method with one that just goes straight to the container rather than the query builder:

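    // Quick hack to test the theory: bypasses the query builder entirely.
    public function augment($values)
    {
        // Normalize a single value (or null) into an array.
        $values = Arr::wrap($values);

        // Build each asset straight from the container instead of querying for it.
        $assets = collect($values)->map(fn ($value) => $this->container()->makeAsset($value));

        // Single-file fields return one asset; otherwise return the collection.
        return $this->config('max_files') === 1 ? $assets->first() : $assets;
    }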

Fetching the assets that way results in much faster times on the subsequent images:

Initial: /assets/blinking-carot.gif 2048
29.941082000732

Secondary: /assets/donut.jpg 2048
4.364013671875

Secondary: /assets/idea.jpg 2048
3.8208961486816

Secondary: /assets/octopus.jpg 2048
4.1608810424805

Secondary: /assets/pizza-wifi.jpg 2048
3.9041042327881

Average Secondary:
4.0624737739563

I know we're only talking milliseconds here, but I can see how this could add up with a container with tens of thousands of images and a page that outputs quite a few.
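
As a rough illustration of how that adds up, here is the arithmetic using the two secondary averages above. The figure of 50 assets per page is purely an assumption for the example, and it ignores the slower first asset.

```php
<?php

// Average per-asset times reported above, in milliseconds (10,000 extra files).
$queryBuilderMs = 15.606224536896;  // stock augment(), via the query builder
$directMs       = 4.0624737739563;  // hacked augment(), straight to the container

// Assumption for illustration only: a page that outputs 50 assets.
$assetsPerPage = 50;

$stock  = $queryBuilderMs * $assetsPerPage;
$hacked = $directMs * $assetsPerPage;

printf("Query builder: %.0f ms\n", $stock);
printf("Direct:        %.0f ms\n", $hacked);
printf("Difference:    %.0f ms per page\n", $stock - $hacked);
```

On those assumptions the asset lookups alone come to roughly 780 ms versus roughly 200 ms per page.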

Demo Repo

Here's my testing repo: https://github.com/jacksleight/statamic-sandbox/tree/asset-testing

It's set up with the small assets collection, and you can view the timings at the /pocket URL. I tested with the stache watcher off.

To test it with 1000s of files run:

php artisan fill
php artisan cache:clear
php please stache:warm

jacksleight avatar Aug 16 '22 13:08 jacksleight

Thanks Jack. It's making more sense now.

Even with all the caching, the stache watcher disabled, etc., the asset query is still going to be filtering arrays with tons of data. Even if they're just simple key/value arrays, 14,000 values in an array must be taking a toll.

jasonvarga avatar Aug 16 '22 14:08 jasonvarga

I still have the same issue. One platform has about 5k assets and another has over 150k. It took about 45 seconds to load the entry edit form on the platform with 150k assets, and roughly 10 seconds on the other.

crackAT avatar Oct 13 '22 13:10 crackAT

Could you try again after upgrading to 3.3.44?

jasonvarga avatar Oct 14 '22 14:10 jasonvarga

Still 40 seconds to load an entry edit page with one asset field, on Statamic version 3.3.45.

crackAT avatar Oct 18 '22 08:10 crackAT

I decided to revisit the previous testing I was doing, and may have found a few tweaks that could improve things when you have a large number of assets (or entries).

These are the stats from my latest tests:

Assets    Metric                 Current       With Tweaks
10,000    First Asset            38.343 ms     4.188 ms
10,000    Avg Secondary Asset    8.931 ms      1.953 ms
50,000    First Asset            165.285 ms    14.042 ms
50,000    Avg Secondary Asset    43.837 ms     4.803 ms

And these don't just help with assets; the stache changes speed up entries too. On a site with 50,000 entries, querying for a single entry by URL (as happens when a page is loaded) is ~35% faster.

These tweaks need more work and testing, and I'm sure there are some important details I've missed, but here's what I've changed so far:

  • Asset::exists() and Asset::metaExists() methods: When an asset is augmented both of these methods get called, and they both fetch a full list of all files from the container before checking if the asset’s in that list. I noticed at least one of these calls was added as an S3 performance improvement in https://github.com/statamic/cms/pull/6822, but unfortunately it seems to make things slower for local files. Only for the first asset, but it can be significant.

  • Stache\Query\Builder::getWhereColumnKeysFromStore() and related methods: Every time an index is used in a query this method takes a full copy of the items and then loops over them to prepend the store name. This adds overhead. To avoid this I’ve removed the mapWithKeys calls and have instead updated Stache\Indexes\Index to save the indexed items with the store name already prepended to the key, so no additional processing is needed during queries.

  • Stache\Query\Builder::filterWhere*() methods: All of these methods also loop over the full index to find matching items, but in some instances there are faster ways to do it. For example, with equals and in you can do a key intersect to get the matching items. This only works in certain situations as it requires flipping the arrays and that’s not always possible, but for things like ID and path lookups it works well. There might be ways to do similar things in the other filterWhere*() methods but I’ve not looked into those yet.

  • Stache\Query\EntryQueryBuilder::getKeysFromCollectionsWithWhere() method: When checking multiple AND where conditions, the list of potential matching items can be pre-filtered by the previous condition's result, which saves looping through the full list again for each additional column. This works well for page lookups where the url is matched first and then the site is matched. The site column items are reduced from a full list of all entries to one. I’ve not done the same in the other query builders yet.
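
The key intersect and pre-filtering ideas from the last two points can be sketched in plain PHP. This is a simplified illustration and not the code from the branch: the index layout, the `assets::` key prefix and all function names here are hypothetical.

```php
<?php

// Hypothetical index: item key => indexed value (for example an asset path).
// Keys already include the store name, so nothing is remapped at query time.
function buildIndex(int $count): array
{
    $index = [];
    for ($i = 0; $i < $count; $i++) {
        $index["assets::file-{$i}.jpg"] = "file-{$i}.jpg";
    }

    return $index;
}

// Baseline: run a PHP callback against every item in the index.
function whereInByScan(array $index, array $values): array
{
    return array_keys(array_filter($index, fn ($v) => in_array($v, $values, true)));
}

// Tweak: with a flipped index (value => key), matching becomes a key intersect
// and no per-item callback is needed. Only valid when the indexed values are
// unique strings or integers, because array_flip() cannot represent anything else.
function whereInByIntersect(array $flipped, array $values): array
{
    return array_values(array_intersect_key($flipped, array_flip($values)));
}

// Pre-filter: narrow another column's index to the keys already matched,
// so the next condition only has to look at the survivors.
function narrowTo(array $index, array $keys): array
{
    return array_intersect_key($index, array_flip($keys));
}

$index   = buildIndex(10000);
$flipped = array_flip($index);   // done once, ideally when the index is built

$wanted = ['file-42.jpg', 'file-9001.jpg'];
$scan   = whereInByScan($index, $wanted);
$fast   = whereInByIntersect($flipped, $wanted);

// Second condition on a made-up "site" column, checked against 2 items, not 10,000.
$siteIndex = array_fill_keys(array_keys($index), 'default');
$narrowed  = narrowTo($siteIndex, $fast);
$matches   = array_keys(array_filter($narrowed, fn ($site) => $site === 'default'));

echo count($scan) . ' matches by scan, ' . count($fast) . " by intersect\n";
echo count($narrowed) . " items left to check for the second condition\n";
```

Both lookups return the same keys. The flipped version depends on the uniqueness caveat noted in the comments, which is why it suits ID and path lookups but not arbitrary columns.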

Here’s the branch I’m using. I'd be interested to hear if anyone else sees an improvement with these changes: https://github.com/jacksleight/statamic-cms/tree/stache-tweaks (comparison, composer patch)

jacksleight avatar Apr 04 '23 11:04 jacksleight

@jacksleight Thanks so much. Will report back soon with our results.

mikemartin avatar Apr 04 '23 12:04 mikemartin

@jacksleight We're seeing at least 20% shorter loading times on each page in our tests.

mikemartin avatar May 04 '23 02:05 mikemartin

Seeing a good improvement with the stache-tweaks branch.

AntonCooper avatar Jun 21 '23 08:06 AntonCooper

I also have this problem. Over 100,000 files in the storage bucket. Tried everything. Production is completely dependent on a response cache, otherwise it won't load pages. Edit: After analysis we realized that 99% of the files were not associated with Statamic content, so we split the single bucket into separate buckets, which has worked out. It was a massive number of files too, and took days to move.

yoyoyeahyoyo avatar Nov 01 '23 19:11 yoyoyeahyoyo

It may not suit everyone, but there's a PR now over on the eloquent driver for an asset query builder, which should resolve a lot of these problems and defer to the database for performance gains: https://github.com/statamic/eloquent-driver/pull/218

ryanmitchell avatar Nov 27 '23 13:11 ryanmitchell