incubator-pagespeed-ngx
LoadFromFileCacheTtlMs does not control how often the file-system is re-checked.
There are actually several issues I want to mention here.
First: one might assume that LoadFromFileCacheTtlMs makes ngx_pagespeed check files for changes only once per configured interval. However, this isn't the case, as it only controls the Cache-Control and Expires headers sent out.
ngx_pagespeed seems to check files for changes on every request, which leaves room for improvement. We want to stop those per-request mtime checks, as this would reduce I/O quite a lot on busy sites. Can we have a LoadFromFileCacheStatMs, please?
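To make the request concrete, here is a minimal sketch of the behaviour being asked for: remember a file's mtime for a fixed interval and only call stat() again once that interval has passed. This is not how ngx_pagespeed works today, and `StatCache`, `ttl_ms`, and `stat_calls` are hypothetical names used only for illustration.

```python
import os
import time
from typing import Callable, Dict, Tuple


class StatCache:
    """Remember each file's mtime for `ttl_ms` before calling stat() again.

    Hypothetical illustration of the requested LoadFromFileCacheStatMs
    behaviour; it does not mirror any PageSpeed internals.
    """

    def __init__(self, ttl_ms: int,
                 stat_mtime: Callable[[str], float] = os.path.getmtime,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_s = ttl_ms / 1000.0
        self._stat_mtime = stat_mtime
        self._clock = clock
        # path -> (time of last check, mtime seen at that check)
        self._entries: Dict[str, Tuple[float, float]] = {}
        self.stat_calls = 0

    def mtime(self, path: str) -> float:
        now = self._clock()
        entry = self._entries.get(path)
        if entry is not None and now - entry[0] < self.ttl_s:
            return entry[1]  # still fresh: no file-system access
        self.stat_calls += 1
        mtime = self._stat_mtime(path)
        self._entries[path] = (now, mtime)
        return mtime
```

The trade-off is the one discussed later in the thread: with such a cache, a change to a file on disk would go unnoticed for up to `ttl_ms` milliseconds.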
Second, I found LoadFromFileCacheTtlMs to behave quite strangely as it is (1.13.35.2-0). Mainly in my testing:
- With pagespeed LoadFromFileCacheTtlMs < 1000, the first request will result in cache-control: public, max-age=315360000, s-maxage=10, while the second, and any further request to the optimized cache extended resource with the .pagespeed prefix will yield cache-control: max-age=0,no-cache!
- With pagespeed LoadFromFileCacheTtlMs >= 1000, the first request will result in cache-control: max-age=1 (or 3, or few seconds otherwise), while the second, and any further request to the optimized cache extended resource with the .pagespeed prefix will yield cache-control: max-age=31536000
So multiple issues there:
- Values lower than 1000 result in no cacheability at all (I have no idea why that number is a breaking point)
- Values over 1000 result in cacheability, but there is no actual control of the value, since 31536000 seems to be hardcoded
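For comparing the header values reported above, a small helper can split a Cache-Control value into directives and say whether the response is reusable without revalidation. This is a simplified sketch: `parse_cache_control` and `is_cacheable` are hypothetical helpers, the comma split ignores quoted strings, and the cacheability test is a rough approximation of the HTTP caching rules.

```python
from typing import Dict, Optional


def parse_cache_control(value: str) -> Dict[str, Optional[str]]:
    """Split a Cache-Control header value into lower-cased directives."""
    directives: Dict[str, Optional[str]] = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives


def is_cacheable(value: str) -> bool:
    """Rough check: reusable without revalidation for a positive max-age."""
    directives = parse_cache_control(value)
    if "no-cache" in directives or "no-store" in directives:
        return False
    return int(directives.get("max-age") or 0) > 0


# Header values as reported in the observations above.
observed = [
    "public, max-age=315360000, s-maxage=10",
    "max-age=0,no-cache",
    "max-age=1",
    "max-age=31536000",
]
for header in observed:
    print(f"{header!r}: cacheable={is_cacheable(header)}")
```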
I think the gotcha here is that LoadFromFileCacheTtlMs controls the cache TTL for the module's internal caching system, and does not control cache expiry directives at the HTTP level when optimized resources are sent over HTTP.
What I think you are observing at the http level is that the output is either a cache-extended optimized resource with a 1 year TTL, or an intermediary output from the module where it didn't have an optimized resource ready yet. I think that retrying a couple of times will get you the optimized output eventually when that happens.
Well, that's what I thought:
"I think the gotcha here is that LoadFromFileCacheTtlMs controls cache TTL for the module's internal caching system"
But my test environment is an idle system with a single HTML file and a single CSS file. I can definitely see when resources are already optimized.
However, I can see that the mtime check happens any time I modify the contents of the CSS file, as this is reflected in a different hash in the .pagespeed CSS URL on the next reload. So it doesn't seem like LoadFromFileCacheTtlMs plays a role in, e.g., leaving the file completely alone for some time.
With LoadFromFileCacheTtlMs 999 (or anything below 1000, for that matter), not only do all subsequent reloads result in cache-control: max-age=0,no-cache on the .pagespeed resource (unless you assume I should wait 999 seconds to see a difference), but the resource is also not optimized (no minification), only cache-extended.
So maybe, just maybe :) the extend cache filter somehow always kicks in, whereas the others respect LoadFromFileCacheTtlMs before they do their stuff.
TL;DR: only use LoadFromFile on a local physical disk where stat() is cheap. Never use LoadFromFile on a mounted file system.
RE stat() overhead per-request: your observation is spot-on and reflects the intended design, which does result in a stat() call on each resource every time it is referenced in an HTML file.
This is intended as an alternative to using HTTP-fetching and a file-cache. It avoids the HTTP fetch (and also side-steps any issues you might have with HTTPS fetching). A tradeoff is that it doesn't have access to the HTTP origin headers for your assets, so it doesn't know how often to re-check to see if the origin asset has changed. This also means that changes to assets take place immediately; they don't need to expire out of cache.
If stat() takes a long time (e.g. it's a mounted file system) then definitely don't use LoadFromFile; use HTTP fetching so we can get cache TTLs and periodic checking of how up-to-date the contents are, based on the origin TTL that you control per normal HTTP caching headers.
If you say LoadFromFileCacheTtlMs 999 you are saying the origin assets are valid for less than a second. This is not a scenario PageSpeed was designed for, but I admit the handling could be better -- e.g. just do nothing with the asset.
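One possible reading of the 1000 threshold, which nothing in this thread confirms: HTTP max-age is expressed in whole seconds, so a TTL below 1000 ms cannot be represented as a positive max-age. The sketch below assumes simple truncation; `ttl_ms_to_max_age` is a hypothetical helper, not a PageSpeed function, and the real conversion inside PageSpeed may differ.

```python
def ttl_ms_to_max_age(ttl_ms: int) -> int:
    """Truncate a millisecond TTL to the whole seconds that max-age can carry.

    Assumption for illustration only: integer truncation toward zero.
    """
    if ttl_ms < 0:
        raise ValueError("TTL must be non-negative")
    return ttl_ms // 1000


for ttl_ms in (999, 1000, 3000):
    print(f"{ttl_ms} ms -> max-age={ttl_ms_to_max_age(ttl_ms)}")
```

Under that assumption, any value from 0 to 999 ms collapses to max-age=0, which would match the "valid for less than a second" description above.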
Hi. Sorry to jump into this specific query, but I am trying to understand how pagespeed updates its sources when using the extend_cache filter, in relation to LoadFromFileCacheTtlMs.
I posted a question on the Google group (https://goo.gl/SgCWBj), but I'll try to ask it briefly here as I am about to give up on it.
We are trying to improve an nginx cache proxy in front of various Apache servers running in containers. Our setup was intercepting static content (css, js, images, etc.), extending cache headers, proxying and caching:
client --> nginx proxy cache for css, js, images.. --> backend apache
client --> nginx proxy (no cache) for the rest --> backend apache
Our intention now is to add pagespeed to improve things in general, and CSS and JS in particular, by concatenating (css + js) and properly versioning them to extend caches, which I understand is exactly what the extend_cache filter does. (A list of our active filters is below.)
Our doubts/issues come from some CSS and JS files that, after being processed by pagespeed, do not get updated after a change and remain stale. This happens even if we've:
1. forced the Expires headers of the Apache backend to 30 seconds. Would pagespeed use this value at all as a TTL to recheck the backend, or will it use the one returned by the nginx cache?
2. forced the nginx proxy cache for static files to bypass the cache and set it to no-store for testing. Again, is pagespeed using this or the backend Expires?
3. cleared the cache for the specific file (i.e. /ass/skins/def/css/bootstrap.min.css+main.gis.css.pagespeed.cc.7FZQ-V4k-L.css) via /pagespeed_admin. Only when clearing the whole pagespeed cache does the file get updated.
So a file such as /ass/skins/def/css/bootstrap.min.css+main.gis.css.pagespeed.cc.7FZQ-V4k-L.css stays the same after any change on the backend.
This is what that root reports in the cache:
Metadata cache key:rname/cc_A5CWJ1Ij7nG9n_XUZo0i/t/vozrWfLE89IPR4eF0bW@
cache_ok:true
can_revalidate:false
partitions:partition {
optimizable: true
url: "https://xxxxxx/ass/skins/def/css/bootstrap.min.css+main.gis.css.pagespeed.cc.7FZQ-V4k-L.css"
input {
index: 0
type: CACHED
last_modified_time_ms: 1547041050000
expiration_time_ms: 1554586471000
date_ms: 1551994471000
input_content_hash: "7Du1KgDhdqcYHUVN_66iG"
url: "https://xxxxxxx/ass/skins/def/css/bootstrap.min.css"
}
input {
index: 1
type: CACHED
last_modified_time_ms: 1551993894000
expiration_time_ms: 1554586471000
date_ms: 1551994471000
input_content_hash: "UGK37FuBTQjWD1AQi_GCT"
disable_further_processing: true
url: "https://xxxxxxxxxxxxxxxxxxxxxxxxxxx/ass/skins/def/css/main.gis.css"
}
}
Even after clicking "delete" on that, no change. Only once I clear the whole pagespeed cache through the backend I get an updated version.
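The timestamps in the metadata dump above are epoch milliseconds, and decoding them shows how long each input is considered valid. The sketch below assumes that `date_ms` is the time the input was fetched and `expiration_time_ms` is when it expires; those meanings are inferred from the field names, not confirmed in this thread.

```python
from datetime import datetime, timedelta, timezone

# Timestamps copied from the two `input` entries in the metadata dump above.
INPUTS = {
    "bootstrap.min.css": {
        "last_modified_time_ms": 1547041050000,
        "expiration_time_ms": 1554586471000,
        "date_ms": 1551994471000,
    },
    "main.gis.css": {
        "last_modified_time_ms": 1551993894000,
        "expiration_time_ms": 1554586471000,
        "date_ms": 1551994471000,
    },
}


def from_ms(ms: int) -> datetime:
    """Convert epoch milliseconds to an aware UTC datetime."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


def cached_lifetime(entry: dict) -> timedelta:
    """Time between the input's date and its expiration."""
    return timedelta(milliseconds=entry["expiration_time_ms"] - entry["date_ms"])


for name, entry in INPUTS.items():
    print(name,
          "fetched:", from_ms(entry["date_ms"]).isoformat(),
          "expires:", from_ms(entry["expiration_time_ms"]).isoformat(),
          "lifetime:", cached_lifetime(entry))
```

For both inputs the difference is 2592000000 ms, i.e. 30 days, not the 30 seconds configured on the backend. That would be consistent with pagespeed having seen a 30-day TTL somewhere along the fetch path, although the dump alone does not say which layer supplied it.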
I'm probably not understanding something:
- Do I need additional setup when using the nginx proxy cache, as in the doc (https://www.modpagespeed.com/doc/downstream-caching.html), in this scenario? I am happy with HTML not being cached.
- Does pagespeed "read" files from the nginx cache proxy, or is it aware of the backend upstream too?
- Would setting a lower value for LoadFromFileCacheTtlMs change anything? I understand the default is 5 minutes if no Expires headers are set. What about if they are? Does the TTL become the Expires of the source fetched over HTTP from the backend, or, as I guess, is this value only good for "local" disk reads?
- Should I just make nginx act as a non-caching proxy and let pagespeed deal with caching completely?
Thanks.
Active Filters:
ah   Add Head
ai   Add Instrumentation
cc   Combine Css
jc   Combine Javascript
gp   Convert Gif to Png
jp   Convert Jpeg to Progressive
jw   Convert Jpeg To Webp
mc   Convert Meta Tags
pj   Convert Png to Jpeg
ws   When converting images to WebP, prefer lossless conversions
ec   Cache Extend Css
ei   Cache Extend Images
es   Cache Extend Scripts
fc   Fallback Rewrite Css
if   Flatten CSS Imports
hw   Flushes html
ci   Inline Css
ii   Inline Images
il   Inline @import to Link
ji   Inline Javascript
idp  Insert DNS Prefetch
js   Jpeg Subsampling
cj   Move Css Above Scripts
pr   Prioritize Critical Css
rj   Recompress Jpeg
rp   Recompress Png
rw   Recompress Webp
ri   Resize Images
cf   Rewrite Css
jm   Rewrite External Javascript
jj   Rewrite Inline Javascript
cu   Rewrite Style Attributes With Url
cp   Strip Image Color Profiles
md   Strip Image Meta Data
Hi @luison, I don't think I can answer all these questions, because this is a scenario I have not used, but I'll try. Where is pagespeed installed: on the Apache backend or on the nginx proxy? I think pagespeed reads the TTL header (Cache-Control) set by the web server it is running on. When pagespeed fetches a resource via HTTP, the origin resource's TTL is used to set the TTL of the optimized resource in the pagespeed cache. A resource loaded from file has no TTL headers (in fact, load from file has no headers at all), so you must configure a default TTL for resources loaded from file.
Deleting the metadata cache entry doesn't delete the optimized resource from the cache.
Thanks @Lofesa, pagespeed is installed on the nginx proxy_cache. Thanks for the info regarding metadata, but the issue remains. Actually, even after removing all cache services and just using nginx as a proxy to our Apache backend, the issue remains, so we are trying to figure out what else might be wrong in our setup.
Hi. If you are running the module on the nginx cache, I think you can't use LoadFromFile; you need to fetch resources via HTTP, and according to the doc you must exclude optimized resources (those that have .pagespeed.xx.hash.ext in the URL) from the proxy cache.
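To check which URLs would fall under that exclusion, a path can be tested for the .pagespeed.xx.hash.ext shape. This is a heuristic sketch based only on the URL shown earlier in this thread: the pattern assumes a two-letter filter id and a ten-character hash, and `is_pagespeed_resource` is a hypothetical helper. Other URL shapes may exist that it does not cover.

```python
import re

# Heuristic: ".pagespeed." + optional one-letter segment + two-letter filter id
# + ten-character hash + extension. Inferred from the URL in this thread.
PAGESPEED_RESOURCE = re.compile(
    r"\.pagespeed\.(?:[a-z]\.)?[a-z]{2}\.[^./]{10}\.[^./]+$"
)


def is_pagespeed_resource(path: str) -> bool:
    """Return True when the URL path looks like a PageSpeed-rewritten resource."""
    return PAGESPEED_RESOURCE.search(path) is not None


paths = [
    "/ass/skins/def/css/bootstrap.min.css+main.gis.css.pagespeed.cc.7FZQ-V4k-L.css",
    "/ass/skins/def/css/main.gis.css",
]
for path in paths:
    print(path, "->", is_pagespeed_resource(path))
```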