Exponential cache growth consuming disk space

Open nevinsm opened this issue 4 months ago • 7 comments

Bug Report

What is happening

  • Cache files are continually created and never cleaned up in REFRESH_MODE_EXPIRE and REFRESH_MODE_EXPIRE_AND_GENERATE
  • Cache tables are continually populated and never cleaned up in REFRESH_MODE_CLEAR and REFRESH_MODE_CLEAR_AND_GENERATE

What was expected

  • Refreshing the cache would purge outdated items that are no longer needed from the database and filesystem, regardless of refresh mode

Steps to reproduce

The easiest way to see it happen is to set the refresh mode to REFRESH_MODE_EXPIRE and then refresh the cache. You will end up with more cache files, and the old ones won't be cleaned up when the new ones are generated.

This only became an issue on a site with a lot of complex content and a lot of content editing, but without many new features or regular deploys. Our deploys would normally purge the cache and clear most things out.

The issue with clear mode is a bit more insidious: the cache tables can grow to very large sizes, which can then cause crashes when foreach loops iterate over massive element query results to create arrays.

Hope that the bug report makes sense.

Diagnostics Report

  • Blitz: 5.11.5
  • CraftCMS: 5.8.11
  • PHP: 8.4

nevinsm • Sep 10 '25 22:09

The REFRESH_MODE_EXPIRE refresh mode translates to “Expire the cache, regenerate manually or organically”. As per the docs, cached pages are “regenerated manually (via a cron job) or organically (when pages are visited, see the note below)”. Do you have a cron job in place that executes the blitz/cache/refresh-expired console command at regular intervals? If not, adding one that runs daily will ensure the cache is regenerated and cleaned up at least once per day.

Out of curiosity, how many cached pages are we talking about, and how much disk space is it using?

bencroker • Sep 11 '25 06:09

On this site it is 58 cached pages, and we ended up with 34 GB of cache files within just a month of changing from REFRESH_MODE_CLEAR to REFRESH_MODE_EXPIRE. One page combines query string filters to render a dynamic list of resources depending on the filters selected, which generates a lot of cache items. We did try running that command manually, but it didn't end up clearing anything. We do have a cron job running, but the queue jobs it kicks off usually crash if we leave it in REFRESH_MODE_CLEAR.

As part of diagnosing the issue I chased the code paths through the plugin code, and that is where I noticed that expire and clear end up being mutually exclusive actions. Most calls to the ClearCache service that I could track along the path that command follows are wrapped in a Blitz::$plugin->settings->shouldClearOnRefresh($forceClear) check, and the forceRefresh method in the refresh service only updates the forceGenerate property.

We originally switched to expire because, when we were in clear mode, we ended up with millions of rows in the Blitz cache tables due to frequent updates of the content that makes up that combinatorial page. There the opposite problem happens: the ExpireCache calls are wrapped in a Blitz::$plugin->settings->shouldExpireOnRefresh($forceClear, $forceGenerate) call, and the settings model returns false when $forceGenerate is set to true. As a result, the cron job never cleared out the cache tables when the queue jobs kicked off by that command (which calls the refresh method of the RefreshCache service) tried to run.

Ultimately the cron jobs start crashing once the table sizes grow large enough, because of the foreach ($refreshData->getElementTypes() as $elementType) { loop.

Let me know if I missed an escape hatch somewhere in all of that. The plugin is a complicated beast and does an impressively large number of things, so there is a solid chance I could have missed something. To my eye, though, the should{x} methods in the settings model need their signatures updated to take a single $force param that causes the method to return true when set. Then, whenever a class has a method like forceRefresh, the class property should be passed to all of the should checks instead of just some of them, and you would have a consistent escape hatch to ensure that the cache data is purged when you intend it to be.
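
The single-`$force` idea could be sketched roughly like this. It is a minimal, standalone illustration and not the plugin's actual code: `RefreshMode` and `RefreshSettingsSketch` are hypothetical names, and the two modes are simplified stand-ins for the real refresh mode constants.

```php
<?php
declare(strict_types=1);

// Hypothetical simplification of the refresh modes (not the Blitz constants).
enum RefreshMode
{
    case Expire;
    case Clear;
}

// Hypothetical stand-in for the settings model, showing only the proposed shape.
final class RefreshSettingsSketch
{
    public function __construct(private RefreshMode $mode)
    {
    }

    // A single $force flag always wins, regardless of the configured mode.
    public function shouldClearOnRefresh(bool $force = false): bool
    {
        return $force || $this->mode === RefreshMode::Clear;
    }

    // Same signature and same rule, so callers can pass one flag everywhere.
    public function shouldExpireOnRefresh(bool $force = false): bool
    {
        return $force || $this->mode === RefreshMode::Expire;
    }
}

$settings = new RefreshSettingsSketch(RefreshMode::Expire);
var_dump($settings->shouldClearOnRefresh());     // bool(false)
var_dump($settings->shouldClearOnRefresh(true)); // bool(true)
```

With this shape, a caller that has a force flag set would pass the same value to every should check, so a forced refresh can always reach the clear path even in expire mode.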

nevinsm • Sep 11 '25 15:09

I noticed you're multiple versions behind the latest. Can you please update and clear and refresh the cache to ensure that whatever issue you're coming up against hasn't already been addressed?

If that doesn't help, can you explain the “query string combinatorial thing”? If it is dynamic and Blitz is not aware of it then that might help explain the issue.

bencroker • Sep 13 '25 13:09

This page combines two types of filters into the URL parameters:

  • https://loveyourmindtoday.org/es/cosas-que-puedes-probar

The blitz config is set up with:

        'queryStringCaching' => SettingsModel::QUERY_STRINGS_CACHE_URLS_AS_UNIQUE_PAGES,

        // Things To Try and Search
        'includedQueryStringParams' => [
            [
                'enabled' => true,
                'siteId' => '', // All sites
                'queryStringParam' => 'topic',
            ],
            [
                'enabled' => true,
                'siteId' => '', // All sites
                'queryStringParam' => 'format',
            ],
            [
                'enabled' => true,
                'siteId' => '', // All sites
                'queryStringParam' => 'group',
            ],
            [
                'enabled' => true,
                'siteId' => '', // All sites
                'queryStringParam' => 'activity',
            ],
            [
                'enabled' => true,
                'siteId' => '', // All sites
                'queryStringParam' => 'search',
            ],
        ],
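
To give a rough sense of how quickly unique pages can multiply under this config, here is a small counting sketch. The value counts per filter are hypothetical (the real numbers are not stated here), it assumes each parameter is either absent or carries exactly one value, and it leaves out `search`, since free-text input has no fixed upper bound. `countUniqueUrls` is an illustrative helper, not part of Blitz.

```php
<?php
declare(strict_types=1);

/**
 * Counts distinct query string variants of one page when every parameter
 * is either absent or set to exactly one of its possible values.
 *
 * @param array<string, int> $valueCounts number of possible values per parameter
 */
function countUniqueUrls(array $valueCounts): int
{
    $total = 1;
    foreach ($valueCounts as $count) {
        $total *= $count + 1; // +1 covers "parameter not present"
    }

    return $total;
}

// Hypothetical value counts for the four filter parameters.
$counts = ['topic' => 8, 'format' => 5, 'group' => 4, 'activity' => 6];

echo countUniqueUrls($counts), PHP_EOL; // 1890
```

Each of those variants is cached as its own page when query strings are treated as unique pages, and any distinct `search` term adds further variants on top.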

With REFRESH_MODE_EXPIRE it appears that no static text cache files are ever actually deleted. Nevin tried following the logic in the plugin source code for what blitz/cache/refresh-expired does, and we couldn't find anywhere that purges those files.

maxfenton • Sep 30 '25 13:09

My guess is that unique query strings are causing large amounts of (valid) cached pages to be generated. Can you confirm whether this is indeed the case using the Blitz Diagnostics utility?

Refreshing the cache results in cached pages being regenerated. It sounds like perhaps you want a full cache clear and flush, to clear out old (but still valid) cached pages? I’m probably still unclear as to what your specific question is.

bencroker • Sep 30 '25 13:09

Questions:

  • do any of these modes ever delete static page cache files that are no longer tracked / valid or does that only happen by clearing all cache?
  • do any of these modes ever remove records from the database that are no longer tracked / valid or does that only happen by flushing all?

maxfenton • Sep 30 '25 14:09

do any of these modes ever delete static page cache files that are no longer tracked / valid or does that only happen by clearing all cache?

Yes, when Blitz becomes aware of a cached page that should no longer be cached, say for example when an entry is disabled, the cached page is cleared. However, changing Blitz settings does not result in cached pages being automatically cleared.

do any of these modes ever remove records from the database that are no longer tracked / valid or does that only happen by flushing all?

Yes, when Blitz generates pages, it first removes any tracking related to that page’s URI from the database. As above, changing Blitz settings does not result in tracked pages/elements/queries being automatically removed.

bencroker • Sep 30 '25 17:09