incubator-pagespeed-mod
incubator-pagespeed-mod copied to clipboard
Occasional asset 404s
Hi there,
I'm having an issue where websites running Apache mod_pagespeed occasionally break, so requested pagespeed assets return a 404. In the last 2 years, this happened once in a while - a couple of times per year - but this week it seems to happen every day.
CONFIGURATION:
- LAMP server with cPanel, been in service a number of years. No recent significant changes or updates
- Running mod_pagespeed 1.13.35.2-0, has been in place a number of years
- "Vanilla" configuration, pretty much as it came out of the box
- Running approx 80 websites, 20-40 with pagespeed
SYMPTOMS:
- User reports website is "broken"
- I clear pagespeed cache, problem goes away
- I can reload the pagespeed optimised page numerous times, but the problem is gone
- The problem randomly comes back later
DEBUG INFO:
When looking at a broken page:
-
Using Chrome dev tools, I can see some URLs 404, e.g.
https://www.somewebsite.com/default/dist/assets/js/vendor/foundation.min.js,q1573461474+modernizr.js,q1573461475+bowser.min.js,q1573461474+browser-detect.js,q1573461474+jquery.matchHeight-min.js,q1573461474+priority-nav.min.js,q1573461475+slick.min.js,q1573461475+jquery.sticky-kit.min.js,q1573461474.pagespeed.jc.XFH1glDaQL.js https://www.somewebsite.com/default/dist/assets/js/vendor/jquery.magnific-popup.min.js,q1573461474+jquery.form.min.js,q1573461474.pagespeed.jc.LvL0sHBy9W.js https://www.somewebsite.com/default/dist/assets/js/vendor/foundation.min.js,q1573461474+modernizr.js,q1573461475+bowser.min.js,q1573461474+browser-detect.js,q1573461474+jquery.matchHeight-min.js,q1573461474+priority-nav.min.js,q1573461475+slick.min.js,q1573461475+jquery.sticky-kit.min.js,q1573461474.pagespeed.jc.XFH1glDaQL.js https://www.somewebsite.com/default/dist/assets/js/vendor/jquery.magnific-popup.min.js,q1573461474+jquery.form.min.js,q1573461474.pagespeed.jc.LvL0sHBy9W.js
-
In the pagespeed admin I can see corresponding errors:
[Fri, 05 Jun 2020 08:56:09 GMT] [Warning] [21143] Fetch failed for https://www.somewebsite.com/default/dist/assets/js/vendor/jquery.magnific-popup.min.js,q1573461474+jquery.form.min.js,q1573461474.pagespeed.jc.LvL0sHBy9W.js, status=404 [Fri, 05 Jun 2020 08:56:09 GMT] [Warning] [21143] https://www.somewebsite.com/default/dist/assets/js/vendor/jquery.magnific-popup.min.js,q1573461474+jquery.form.min.js,q1573461474.pagespeed.jc.LvL0sHBy9W.js resource_404_count: not found (404)
- If I turn on debug via the URL, there are these kind of debug comments: https://www.somewebsite.com/?PageSpeed=on&PageSpeedFilters=+debug
<script src="https://www.somewebsite.com/default/dist/assets/js/vendor/jquery.magnific-popup.min.js?1573461474"></script>
<!--deadline_exceeded for filter CacheExtender-->
<!--4xx status code, preventing rewriting of https://www.somewebsite.com/default/dist/assets/js/vendor/jquery.magnific-popup.min.js?1573461474-->
<!--4xx status code, preventing rewriting of https://www.somewebsite.com/default/dist/assets/js/vendor/jquery.magnific-popup.min.js?1573461474-->
- But if I go directly to the original assets, there's no 404:
https://www.somewebsite.com/default/dist/assets/js/vendor/jquery.magnific-popup.min.js https://www.somewebsite.com/default/dist/assets/js/vendor/jquery.magnific-popup.min.js?1573461474
- If it helps, this is the debug output at the bottom:
<!-- #Flush after 1690us #Parse duration 1673us #Render duration 22519us #Idle duration 17us --></html><!-- #Flush after 37079us #Parse duration 4us #Render duration 10222us #Idle duration 12866us --><!-- mod_pagespeed on Filters: ah Add Head cw Collapse Whitespace cc Combine Css jc Combine Javascript gp Convert Gif to Png jp Convert Jpeg to Progressive jw Convert Jpeg To Webp mc Convert Meta Tags pj Convert Png to Jpeg ws When converting images to WebP, prefer lossless conversions db Debug ec Cache Extend Css ei Cache Extend Images es Cache Extend Scripts fc Fallback Rewrite Css if Flatten CSS Imports hw Flushes HTML ci Inline Css ii Inline Images il Inline @import to Link ji Inline Javascript js Jpeg Subsampling pr Prioritize Critical Css rj Recompress Jpeg rp Recompress Png rw Recompress Webp ri Resize Images cf Rewrite Css jm Rewrite External Javascript jj Rewrite Inline Javascript cu Rewrite Style Attributes With Url cp Strip Image Color Profiles md Strip Image Meta Data
Options: EnableRewriting (e) 1 FileCacheInodeLimit (afcl) 500000 HttpCacheCompressionLevel (hccl) 0 MaxCombinedJsBytes (xcj) 368640 RewriteLevel (l) Core Filters RunExperiment (fur) False StatisticsLogging (asle) True
#NumFlushes 1 #EndDocument after 37079us #Total Parse duration 1677us #Total Render duration 32741us #Total Idle duration 12883us No critical images detected. The following filters were disabled for this request: CriticalSelectorFilter: No critical selector info in cache SupportNoscript -->
And pagespeed.conf is attached too. pagespeed.conf.txt
I'm stumped on this, as clearing the cache fixes the problem.
Has anyone come across this before?
On a hunch, it looks like the module is not able to fetch its inputs over https when it needs to reconstruct an optimised resource on the fly when it's not in cache when trying to serve a response to a .pagespeed.
url.
See https://www.modpagespeed.com/doc/https_support#https_fetch for options on resolving that
Additionally, can you check in the server logs files any reference to request with User Agent string having "Serf" in it? These request come from pagespeed module.
Thanks chaps, I think @oschaaf your hunch is right, but I haven't found the fix yet.
I can see references to Serf in the access log:
111.222.333.444 - - [05/Jun/2020:12:12:21 +0100] "GET /css/style.css HTTP/1.1" 200 4615 "http://somewebsite/" "Serf/1.3.9 (mod_pagespeed/1.13.35.2-0)"
111.222.333.444 - - [05/Jun/2020:12:12:21 +0100] "GET /favicon.ico HTTP/1.1" 200 113459 "http://somewebsite/" "Serf/1.3.9 (mod_pagespeed/1.13.35.2-0)"
111.222.333.444 - - [05/Jun/2020:12:12:21 +0100] "GET /img/logo.png HTTP/1.1" 200 18187 "http://somewebsite/" "Serf/1.3.9 (mod_pagespeed/1.13.35.2-0)"
111.222.333.444 - - [05/Jun/2020:12:12:21 +0100] "GET /img/facebook.png HTTP/1.1" 200 1933 "http://somewebsite/" "Serf/1.3.9 (mod_pagespeed/1.13.35.2-0)"
111.222.333.444 - - [05/Jun/2020:12:12:21 +0100] "GET /img/blog.png HTTP/1.1" 200 2328 "http://somewebsite/" "Serf/1.3.9 (mod_pagespeed/1.13.35.2-0)"
111.222.333.444 - - [05/Jun/2020:12:12:21 +0100] "GET /img/forum.png HTTP/1.1" 200 3425 "http://somewebsite/" "Serf/1.3.9 (mod_pagespeed/1.13.35.2-0)"
And references to Serf & fetch in server error log:
[Fri Jun 05 11:50:32.915037 2020] [pagespeed:error] [pid 2581] [mod_pagespeed 1.13.35.2-0 @2581] PageSpeed Serf fetch failure rate extremely high; only 4570 of 11001 recent fetches fully successful; is fetching working?
And references to serf & pagespeed errors in server error log:
[Fri Jun 05 12:18:50.254813 2020] [pagespeed:warn] [pid 20601] [mod_pagespeed 1.13.35.2-0 @20601] xtimthumb.php,qsrc=https,P3A,P2F,P2Fwww.somewebsite.co.uk,P2Fassets,P2Fuploads,P2Ffiles,P2Fimages,P2Fbrands,P2Fnew+unitron+logo.jpg,ah=50,aw=,azc=1:0: Resource based on https://www.somewebsite.co.uk/assets/timthumb/timthumb.php?src=https%3A%2F%2Fwww.somewebsite.co.uk%2Fassets%2Fuploads%2Ffiles%2Fimages%2Fbrands%2Fnew+unitron+logo.jpg&h=50&w=&zc=1 but cannot access the original
[Fri Jun 05 12:18:50.254863 2020] [pagespeed:warn] [pid 20601] [mod_pagespeed 1.13.35.2-0 @20601] [0605/121850:WARNING:resource_fetch.cc(195)] Fetch failed for resource url https://www.somewebsite.co.uk/assets/timthumb/xtimthumb.php,qsrc=https,P3A,P2F,P2Fwww.somewebsite.co.uk,P2Fassets,P2Fuploads,P2Ffiles,P2Fimages,P2Fbrands,P2Fnew+unitron+logo.jpg,ah=50,aw=,azc=1.pagespeed.ic.xS-eljNxcx.jpg
[Fri Jun 05 12:18:50.254896 2020] [pagespeed:warn] [pid 20601] [mod_pagespeed 1.13.35.2-0 @20601] Fetch failed for https://www.somewebsite.co.uk/assets/timthumb/xtimthumb.php,qsrc=https,P3A,P2F,P2Fwww.somewebsite.co.uk,P2Fassets,P2Fuploads,P2Ffiles,P2Fimages,P2Fbrands,P2Fnew+unitron+logo.jpg,ah=50,aw=,azc=1.pagespeed.ic.xS-eljNxcx.jpg, status=404
[Fri Jun 05 12:18:50.255021 2020] [pagespeed:warn] [pid 20601] [mod_pagespeed 1.13.35.2-0 @20601] https://www.somewebsite.co.uk/assets/timthumb/xtimthumb.php,qsrc=https,P3A,P2F,P2Fwww.somewebsite.co.uk,P2Fassets,P2Fuploads,P2Ffiles,P2Fimages,P2Fbrands,P2Fnew+unitron+logo.jpg,ah=50,aw=,azc=1.pagespeed.ic.xS-eljNxcx.jpg resource_404_count: not found (404)
[Fri Jun 05 12:18:50.405063 2020] [pagespeed:warn] [pid 21954] [mod_pagespeed 1.13.35.2-0 @21954] xtimthumb.php,qsrc=https,P3A,P2F,P2Fwww.somewebsite.co.uk,P2Fassets,P2Fuploads,P2Ffiles,P2Fimages,P2Fbrands,P2Fsignia-logo.png,ah=50,aw=,azc=1:0: Resource based on https://www.somewebsite.co.uk/assets/timthumb/timthumb.php?src=https%3A%2F%2Fwww.somewebsite.co.uk%2Fassets%2Fuploads%2Ffiles%2Fimages%2Fbrands%2Fsignia-logo.png&h=50&w=&zc=1 but cannot access the original
[Fri Jun 05 12:18:50.405144 2020] [pagespeed:warn] [pid 21954] [mod_pagespeed 1.13.35.2-0 @21954] [0605/121850:WARNING:resource_fetch.cc(195)] Fetch failed for resource url https://www.somewebsite.co.uk/assets/timthumb/xtimthumb.php,qsrc=https,P3A,P2F,P2Fwww.somewebsite.co.uk,P2Fassets,P2Fuploads,P2Ffiles,P2Fimages,P2Fbrands,P2Fsignia-logo.png,ah=50,aw=,azc=1.pagespeed.ic.nrW7iOc8Nk.png
[Fri Jun 05 12:18:50.405187 2020] [pagespeed:warn] [pid 21954] [mod_pagespeed 1.13.35.2-0 @21954] Fetch failed for https://www.somewebsite.co.uk/assets/timthumb/xtimthumb.php,qsrc=https,P3A,P2F,P2Fwww.somewebsite.co.uk,P2Fassets,P2Fuploads,P2Ffiles,P2Fimages,P2Fbrands,P2Fsignia-logo.png,ah=50,aw=,azc=1.pagespeed.ic.nrW7iOc8Nk.png, status=404
[Fri Jun 05 12:18:50.405324 2020] [pagespeed:warn] [pid 21954] [mod_pagespeed 1.13.35.2-0 @21954] https://www.somewebsite.co.uk/assets/timthumb/xtimthumb.php,qsrc=https,P3A,P2F,P2Fwww.somewebsite.co.uk,P2Fassets,P2Fuploads,P2Ffiles,P2Fimages,P2Fbrands,P2Fsignia-logo.png,ah=50,aw=,azc=1.pagespeed.ic.nrW7iOc8Nk.png resource_404_count: not found (404)
[Fri Jun 05 12:18:50.823466 2020] [pagespeed:warn] [pid 21434] [mod_pagespeed 1.13.35.2-0 @21434] xtimthumb.php,qsrc=https,P3A,P2F,P2Fwww.somewebsite.co.uk,P2Fassets,P2Fuploads,P2Ffiles,P2Fimages,P2Fbrands,P2Foticon-new.jpg,ah=50,aw=,azc=1:0: Resource based on https://www.somewebsite.co.uk/assets/timthumb/timthumb.php?src=https%3A%2F%2Fwww.somewebsite.co.uk%2Fassets%2Fuploads%2Ffiles%2Fimages%2Fbrands%2Foticon-new.jpg&h=50&w=&zc=1 but cannot access the original
If we add ModPagespeedFetchHttps enable,allow_self_signed
it doesn't seem to make a difference. We've added RespectXForwardedProto
too, which was suggested elsewhere when searching for a solution.
Here are the options in the debug output: https://i.imgur.com/r3Bxnm6.png
With ModPagespeedFetchHttps enable,allow_self_signed
and RespectXForwardedProto
added, pagespeed is active but failing to process and JS / image assets - not optimised, not combined - so the original error isn't being thrown, but then pagespeed isn't actually working either.
Is there any more information I could share to make this easier to debug?
I've also noticed the following, which might be related?
<!--The preceding resource was not rewritten because its domain (www.somewebsite.com) is not authorized--><!--The preceding resource was not rewritten because its domain (www.somewebsite.com) is not authorized-->
Could they be related?
Internally, on the command line, I can curl requests to the source assets and they connect correctly.
It's worth noting that if I request non-secure source assets, the request is 301'd to https. I'm not sure if that's relevant.
Fecht from pagespeed can´t reach the origin resource. No matter if you can do it via curl or a browser.
To fecht an https resource, pagespeed need to have ModPagespeedFetchHttps enabled, but this need to be done with the apropiate certificates, same as a browser need the CA certificates to evaluate the web server certificates.
This is acomplised by enabling ModPagespeedSslCertDirectory directory and ModPagespeedSslCertFile file as stated in the doc
Read the doc and take account the differences between a RedHat/Centos and Ubuntu/Debian distros.
The request that have the message
<!--The preceding resource was not rewritten because its domain (www.limelight.consulting) is not authorized--><!--The preceding resource was not rewritten because its domain (www.somewebsite.com) is not authorized-->
can´t throw a 404 error but it can´t be optimized by pagespeed.
In these message you have posted 2 domains: www.limelight.consulting and www.somewebsite.com. Are both domans running pagespeed? I assuem that ww.somewebsite.com
is your own domain where pagespeed is running and the other mentioned domain not.
So you need a ModPagespeedDomain http*://*.somewebsite.com
Fecht from pagespeed can´t reach the origin resource. No matter if you can do it via curl or a browser. To fecht an https resource, pagespeed need to have ModPagespeedFetchHttps enabled, but this need to be done with the apropiate certificates, same as a browser need the CA certificates to evaluate the web server certificates.
That was the problem! Checking with a tool like SSLlabs.com, I could see that although the main website certificate was valid, the CA certificate had expired!
It's an issue with Sectigo / Comodo certificates: https://support.sectigo.com/articles/Knowledge/Sectigo-AddTrust-External-CA-Root-Expiring-May-30-2020
Renewing the certificate fixed all the 404s :)
This is acomplised by enabling ModPagespeedSslCertDirectory directory and ModPagespeedSslCertFile file as stated in the doc Read the doc and take account the differences between a RedHat/Centos and Ubuntu/Debian distros.
Oddly, the file referenced in ModPagespeedSslCertFile
doesn't exist (We're on CentOS) but it doesn't seem to cause a problem since I renewed the CA certificate. It doesn't change behaviour if I point it to the files that do exist (ca-bundle.crt, ca-bundle.trust.crt, localhost.crt), so I guess it doesn't matter.
In these message you have posted 2 domains
Ah yes, sorry, bad anonymisation of log messages ;) There was only one domain.
The request that have the message
<!--The preceding resource was not rewritten because its domain (www.somewebsite.com) is not authorized--><!--The preceding resource was not rewritten because its domain (www.somewebsite.com) is not authorized-->
So you need a ModPagespeedDomain http*://*.somewebsite.com
Even now the CA issue has been fixed, I still see those "is not authorized" errors in the debug output whether I set ModPagespeedDomain
correctly or not (it makes no difference) but only for the requests in the <head>
, not the <body>
?
About ModPagespeedSslCertFile or ModPagespeedSslCertDirectory: These are NOT the file or directory where the web certificates are (The certificaates for your own domain). In a centos distro the directory or file, normally is /etc/pki/tls/certs and /etc/pki/tls/cert.pem, and is installed by the ca-certificates rpm package. And yes, for me sound strange that the "not auth" messages show in head and not in body, but not only in head, these images all are icons. Can you try ModPagespeedDomain http*://*.limelight.consulting ?
Thanks for helping with this Lofesa, really, thanks so much :)
In a centos distro the directory or file, normally is /etc/pki/tls/certs and /etc/pki/tls/cert.pem, and is installed by the ca-certificates rpm package.
Found it. Weirdly, the cert.pem is not shown if I do an "LS" on the directory, but if I do a "find" on *.pem files, there it is! So the pagespeed config was correct all along for the cert location.
Can you try ModPagespeedDomain http*://*.limelight.consulting ?
I can try ModPagespeedDomain http*://*.limelight.consulting, but it makes no difference to those assets in the head, even if I'm very explicit with ModPagespeedDomain https://www.limelight.consulting
How odd!
At the end of this page (/pagespeed_global_admin/config) under Domain Lawyer, what do you see? Have you some type of proxy-cache, load balancer.... ?
At the end of this page (/pagespeed_global_admin/config) under Domain Lawyer, what do you see? Have you some type of proxy-cache, load balancer.... ?
Nothing like that, just Invalidation Timestamp: (none)
?
Then or you don´t have set the ModPagespeedDomain or you have set it on the wrong place. Between Domain Lawyer and Invalidation Timestamp you must see any domain that you have authorized with ModPagespeedDomain, ModPagespeedMapOriginDomain, ModPagespeedMapRewriteDomain, ModPagespeedShardDomain or ModPagespeedMapProxyDomain These directives make diff beween http and https, when no protocol is present it relates to http. If you have set ModPagespeedDomain http*://.limelight.consulting you must see http://*.limelight.consulting here