incubator-pagespeed-mod
Summary of mod_rewrite and mod_pagespeed interactions
This is not a defect per se but an explanation of how we work with
mod_rewrite and what can (and does) go wrong. It's here so it's searchable by
devs and users.
* mod_rewrite is a translate_name hook in slot APR_HOOK_FIRST-1
* mod_rewrite also sets the handler to 'rewrite-handler' (I think that's it;
the point is that it sets the handler)
* mod_pagespeed adds its own translate_name hook in slot APR_HOOK_FIRST-2
* this hook saves the original URL before mod_rewrite rewrites it
* mod_pagespeed adds its own map_to_storage handler (instaweb_handler)
that does the heavy lifting of rewriting/delivering rewritten content.
* mod_pagespeed uses various 'virtual' URLs, incl: mod_pagespeed_beacon,
mod_pagespeed_statistics, mod_pagespeed_console, etc.
* These are enabled by <Location> directives in pagespeed.conf.
* Each such directive changes the request's handler to the specified string;
e.g. for mod_pagespeed_console it's 'mod_pagespeed_console'.
* mod_pagespeed's instaweb_handler routine first looks at the request's
handler and processes its set of special cases; after that it looks at
the URL and processes it accordingly; there are some special cases:
mod_pagespeed_static and now/soon mod_pagespeed_beacon [I know I said
above it uses a <Location> directive; at the time of writing this it
does but I have a change in flight to also handle it here].
* I don't know whether <Location> directives are processed before mod_rewrite
runs or whether mod_rewrite's processing effectively disables <Location>
processing, but one thing is for sure: if a URL is rewritten by
mod_rewrite then the <Location> directive does NOT set the request's
handler - it stays as 'rewrite-handler'.
So, given all that, what happens when a URL is handled by mod_rewrite and then
by mod_pagespeed?
1. mod_pagespeed saves the URL via its translate_name hook.
2. mod_rewrite looks at the URL; if it matches a rule it rewrites it AND
sets the request's handler to 'rewrite-handler'.
3. mod_pagespeed handles the URL: since step 2 set the request's handler
it doesn't match any of ours so we fall back to normal processing.
** This means that all URLs that are handled by <Location> directives do
not work!
4. If we decline to handle the URL, in particular a beacon POST, Apache
tries again and the above loop happens all over again, and again,
until eventually Apache gives up and returns a 500 HTTP status.
The work-around is to put this into each and every <Location> directive in
pagespeed.conf:
<IfModule mod_rewrite.c>
RewriteEngine Off
</IfModule>
The 'fix' is to not rely on <Location> directives but to handle the URLs as
special cases in the fallback path; this is what my change to
mod_pagespeed_beacon does.
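
To make the failure and the fix concrete, here is a toy Python model of the dispatch described above. It is not Apache or mod_pagespeed code: the function names, the leading-slash URL paths, and the 'declined' return value are assumptions made for illustration, and it only encodes the observed behavior (a rewritten URL keeps 'rewrite-handler'), not Apache's real hook ordering.

```python
from typing import Optional, Sequence

# Handler strings assigned by the <Location> blocks in pagespeed.conf.
# The names come from the text above; the leading-slash paths are assumed.
LOCATION_HANDLERS = {
    "/mod_pagespeed_beacon": "mod_pagespeed_beacon",
    "/mod_pagespeed_statistics": "mod_pagespeed_statistics",
    "/mod_pagespeed_console": "mod_pagespeed_console",
}


def request_handler(url: str, rewritten: bool) -> Optional[str]:
    """Handler string the request ends up with, per the observations above."""
    if rewritten:
        # mod_rewrite matched a rule, so <Location> does not set the handler.
        return "rewrite-handler"
    return LOCATION_HANDLERS.get(url)


def instaweb_handler(handler: Optional[str], saved_url: str,
                     url_special_cases: Sequence[str] = ()) -> str:
    """Toy dispatch: returns the name of the special case served, or 'declined'."""
    # First pass: dispatch on the request's handler string.
    if handler in LOCATION_HANDLERS.values():
        return handler
    # Fallback pass: dispatch on the URL saved by the translate_name hook.
    for prefix in url_special_cases:
        if saved_url.startswith(prefix):
            return prefix.lstrip("/")
    return "declined"
```

With `rewritten=False` the beacon URL is dispatched through its <Location> handler; with `rewritten=True` the handler is 'rewrite-handler' and the request is declined; passing the beacon path in `url_special_cases` models the fix, where the saved URL is matched on the fallback path regardless of the handler.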
The most common setup we've seen where this arises is WordPress sites, since
WordPress adds some mod_rewrite rules to map URLs that aren't files or
directories to the /index.html file.
How to set up mod_rewrite to replicate the 500 HTTP status problem:
1. In httpd.conf, in the <Directory "/usr/local/apache2/htdocs"> area
make sure you have AllowOverride All.
2. Create a .htaccess file in /usr/local/apache2/htdocs with:
RewriteEngine On
RewriteBase /
RewriteRule ^index\.html$ - [L]
RewriteRule ^favicon.ico$ - [L]
RewriteRule ^.*\.pagespeed\..*$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.html [L]
Explanation: Leave index.html untouched; leave favicon.ico untouched;
leave .*.pagespeed..* untouched; if the file corresponding to the URL
does NOT exist as a file or directory, rewrite the URL to /index.html.
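
As a rough check of the explanation above, the rule set can be approximated in Python. This is a sketch, not mod_rewrite: `apply_rules` and the `existing` set (standing in for the -f / -d filesystem tests) are hypothetical, and the only per-directory behavior modelled is stripping the leading slash before matching.

```python
import re

# Patterns of the three pass-through rules ("-" substitution with [L]).
PASS_THROUGH = [
    r"^index\.html$",
    r"^favicon.ico$",
    r"^.*\.pagespeed\..*$",
]


def apply_rules(url_path: str, existing: set) -> str:
    """Return the URL path after the rules run.

    `existing` holds the URL paths that exist as files or directories.
    """
    # In per-directory context with RewriteBase /, patterns see the path
    # without its leading slash.
    rel = url_path[1:] if url_path.startswith("/") else url_path
    for pattern in PASS_THROUGH:
        if re.search(pattern, rel):
            return url_path  # left untouched; [L] stops processing
    # RewriteCond !-f and !-d, then "RewriteRule . /index.html [L]".
    if url_path not in existing and re.search(r".", rel):
        return "/index.html"
    return url_path
```

Under this model a request for /mod_pagespeed_beacon, which is not a file or directory, is rewritten to /index.html, which is the case that triggers the problem described in this issue, while rewritten resources containing ".pagespeed." pass through unchanged.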
Original issue reported on code.google.com by [email protected]
on 18 Apr 2013 at 2:19
Original comment by [email protected]
on 18 Apr 2013 at 2:19
- Removed labels: Priority-Medium, Type-Defect
Nice write up. Note that we can't really avoid <Location> in general since it's
also tied up with access control, which we do want for most stuff (but not
beacons).
Original comment by [email protected]
on 18 Apr 2013 at 2:27
Clearing from my open issues list, should still be searchable though.
Original comment by [email protected]
on 27 Sep 2013 at 1:17
- Changed state: Done
OK, reopening but removing me as the assignee, since otherwise it's not easy to
search for 'starred' bugs.
Original comment by [email protected]
on 27 Sep 2013 at 1:23
- Changed state: Accepted
Original comment by [email protected]
on 21 Oct 2013 at 11:41
- Changed title: Summary of mod_rewrite and mod_pagespeed interactions
@GoogleCodeExporter
Thanks for the workaround. However, in your posted configuration the line "RewriteEngine Off" is a misconfiguration: adding it to your system will not change any system behavior. Apache allows "RewriteEngine Off" so that, if your configuration contains multiple "RewriteRule" directives, you can explicitly disable them all at once instead of commenting each one out.
More importantly, the default value of "RewriteEngine" is already "off", so adding "RewriteEngine Off" is unnecessary and may confuse users.
Since there is no "RewriteRule" here, deleting "RewriteEngine Off" would be ideal.
Related Apache source code snippet:
static apr_status_t run_rewritemap_programs(server_rec *s, apr_pool_t *p) {
    if (conf->state == ENGINE_DISABLED) { // usage of "RewriteEngine"
        return APR_SUCCESS; // early return
    }
    rewritemap_program(...); // usage of "RewriteRule"
}