
Non-existent urls are statically cached

Open stuartcusackie opened this issue 1 year ago • 6 comments

Bug description

I'm noticing that a lot of bad URLs, such as legacy URLs from WordPress and non-existent image paths, are being statically cached.

For example:

https://mysite.ie/app/uploads/2019/04/competitions-at-club-dublin-690x460.jpg
https://mysite.ie/media/good_foods.jpg
https://mysite.ie/wp-content/uploads/2018/06/Leopardstown_10-2.jpg
https://mysite.ie/sitemap.xml.gz
https://mysite.ie/swimming/wp-content/dir/erin1.PhP7
https://mysite.ie/index.php/index.php

It became a small problem recently, when I started listening to the UrlInvalidated event to automatically trigger caching as described here: https://github.com/statamic/cms/pull/8902

My site only has about 250 entries but nearly 3500 UrlInvalidated events are caught by my listeners when the static cache is cleared by my static caching rules. It puts a lot of unnecessary strain on the server through queued jobs.

Can non-existent urls somehow be ignored by the static cache? All of the above urls return a 404 error. I assume they are old links from the original site on Google or other indexes.

Thanks.

How to reproduce

Add a listener to handle the UrlInvalidated event, as described here: https://github.com/statamic/cms/pull/8902

Non-existent urls will gather in the static cache over time on a live website.

Logs

No response

Environment

Environment
Laravel Version: 11.25.0
PHP Version: 8.2.18
Composer Version: 2.7.4
Environment: local
Debug Mode: ENABLED
Maintenance Mode: OFF
Timezone: Europe/Dublin
Locale: en

Cache
Config: NOT CACHED
Events: NOT CACHED
Routes: NOT CACHED
Views: CACHED

Drivers
Broadcasting: log
Cache: file
Database: mysql
Logs: single
Mail: smtp
Queue: sync
Session: file

Livewire
Livewire: v3.5.8

Statamic
Addons: 7
Sites: 1
Stache Watcher: Enabled
Static Caching: Disabled
Version: 5.27.0 PRO

Statamic Addons
jonassiewertsen/statamic-live-search: 2.1.1
jonassiewertsen/statamic-livewire: 3.8.0
rias/statamic-redirect: 3.8.1
spatie/statamic-responsive-images: 5.0.1
statamic/seo-pro: 6.1.2
stuartcusackie/statamic-cache-requester: 1.2.1
thoughtco/statamic-cache-tracker: 0.9.2

Installation

Fresh statamic/statamic site via CLI

Additional details

No response

stuartcusackie avatar Sep 30 '24 11:09 stuartcusackie

Are you able to provide the full output of php please support:details?

duncanmcclean avatar Oct 01 '24 10:10 duncanmcclean

Sorry, updated above.

stuartcusackie avatar Oct 03 '24 23:10 stuartcusackie

We should be able to pass along to the UrlInvalidated event whether it was a 404 or not. Then you can avoid refetching those URLs.
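
Until the event carries that information, one possible workaround is to filter invalidated URLs against a list of paths known to exist. The following is only a minimal sketch: `shouldRewarm()` and `$knownPaths` are hypothetical names, not part of Statamic, and it assumes you can build the list of valid entry paths yourself.

```php
<?php

// Hypothetical helper (not part of Statamic): decides whether an
// invalidated URL is worth re-requesting.
// $knownPaths is assumed to be a list of valid entry paths that you
// build yourself, e.g. ['/', '/about'].
function shouldRewarm(string $url, array $knownPaths): bool
{
    // Keep only the path so scheme, host and query string are ignored.
    $path = parse_url($url, PHP_URL_PATH);

    // parse_url() returns null when there is no path and false on
    // seriously malformed URLs; treat both as the site root.
    if (! is_string($path) || $path === '') {
        $path = '/';
    }

    // Normalise slashes so "/about/" and "/about" compare equal.
    $path = '/' . trim($path, '/');

    return in_array($path, $knownPaths, true);
}
```

A listener could call something like `shouldRewarm($url, $knownPaths)` with the URL taken from the event and return early when it is false. The event would still fire for every cached 404, so this only avoids the refetch, not the dispatch itself.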

jasonvarga avatar Oct 04 '24 13:10 jasonvarga

@jasonvarga That would be perfect. Thanks!

stuartcusackie avatar Oct 04 '24 13:10 stuartcusackie

Actually... I'm just wondering if this would still cause unnecessary processing. The UrlInvalidated event would still be fired thousands of times, and so would my listener, even though it would perform no actions. It seems to me that these urls shouldn't be cached in the first place.

Maybe it's fine. Just a thought.

stuartcusackie avatar Oct 04 '24 13:10 stuartcusackie

They intentionally get cached since #10294.

If your 404 page is heavy - it might be because of a nav or who knows what else - you could easily make a site struggle by hitting different 404 pages.

jasonvarga avatar Oct 04 '24 13:10 jasonvarga

Here's a screenshot to highlight the problem. This is what happens when I change my main navigation for a site with only 250 entries: it slows down my site for over an hour.

image

stuartcusackie avatar Oct 29 '24 11:10 stuartcusackie