incubator-pagespeed-mod

modpagespeed.com lost its MapProxyDomain config

Open jmarantz opened this issue 9 years ago • 17 comments

The modpagespeed.com front page has a demo for MapProxyDomain, but it does not work. We need to add: ModPagespeedMapProxyDomain http://modpagespeed.com/static http://www.gstatic.com/psa/static for 1.gif to be moved there.
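
For illustration, the mapping this directive sets up can be thought of as a prefix rewrite: a request under the proxy prefix on our own host is served from the matching path on the origin. A minimal sketch follows; `ProxyMapping` and `MapToOrigin` are hypothetical names made up for this example and are not mod_pagespeed's implementation.

```cpp
#include <string>

// Hypothetical stand-in for one MapProxyDomain entry (not a mod_pagespeed type).
struct ProxyMapping {
  std::string proxy_prefix;   // prefix served from our own host
  std::string origin_prefix;  // prefix the bytes are actually fetched from
};

// If `url` lies under the proxy prefix, writes the corresponding origin URL
// to `*origin` and returns true. Otherwise returns false and leaves
// `*origin` untouched.
bool MapToOrigin(const ProxyMapping& m, const std::string& url,
                 std::string* origin) {
  // Require a '/' after the prefix so ".../static" does not match
  // ".../staticfoo".
  const std::string prefix = m.proxy_prefix + "/";
  if (url.compare(0, prefix.size(), prefix) != 0) {
    return false;
  }
  *origin = m.origin_prefix + "/" + url.substr(prefix.size());
  return true;
}
```

With the two prefixes from the directive above, `http://modpagespeed.com/static/1.gif` would map to `http://www.gstatic.com/psa/static/1.gif`.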

jmarantz, Oct 19 '15 19:10

I've added that MapProxyDomain command to the modpagespeed.com configuration, and http://modpagespeed.com/proxy_external_resource.html?PageSpeed=on&PageSpeedFilters=rewrite_images now has <img src=""/>. So MapProxyDomain is taking effect, since it's adding that host to the domain lawyer, but the image is being inlined, so we're not really demonstrating it. We also need to update the demo page URL to be: http://modpagespeed.com/proxy_external_resource.html?PageSpeed=on&PageSpeedFilters=+rewrite_images,-inline_image . I'll do that.

jeffkaufman, Oct 23 '15 14:10

Actually, http://modpagespeed.com/static/1.gif.pagespeed.ce.JiBnMqyl6S.gif 404s. Not sure why yet.

jeffkaufman, Oct 23 '15 14:10

Error log has:

Rejected absolute url reference http://modpagespeed.com/static/1.gif
Fetch failed for resource url http://modpagespeed.com/static/1.gif.pagespeed.ce.JiBnMqyl6S.gif
Fetch failed for http://modpagespeed.com/static/1.gif.pagespeed.ce.JiBnMqyl6S.gif, status=404
http://modpagespeed.com/static/1.gif.pagespeed.ce.JiBnMqyl6S.gif resource_404_count: not found (404)

jeffkaufman, Oct 23 '15 14:10

Poking around at this I see that

http://modpagespeed.com/static/1.gif works
http://modpagespeed.com/static/1.gif.pagespeed.ce.JiBnMqyl6S.gif FAILS

It must be that our handler code for pagespeed resources with MapProxyDomain has a problem but I would think that would've been reported already. Maybe it's particular to extend_cache.
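
For reference, the failing URL has the shape `<name>.pagespeed.<filter id>.<hash>.<ext>`, with `ce` as the filter id here. Below is a rough sketch of splitting such a leaf, assuming exactly that four-part shape. `PagespeedLeaf` and `ParsePagespeedLeaf` are made-up names for this example, and the real decoder handles more variants than this.

```cpp
#include <string>

// Hypothetical holder for the pieces of a rewritten resource name.
struct PagespeedLeaf {
  std::string name;       // original leaf, e.g. "1.gif"
  std::string filter_id;  // e.g. "ce"
  std::string hash;       // content hash
  std::string ext;        // extension of the rewritten resource
};

// Splits "<name>.pagespeed.<id>.<hash>.<ext>". Returns false if the leaf
// does not have that shape. Simplified: assumes exactly three dot-separated
// fields after the ".pagespeed." marker.
bool ParsePagespeedLeaf(const std::string& leaf, PagespeedLeaf* out) {
  static const std::string kMarker = ".pagespeed.";
  const std::string::size_type marker = leaf.rfind(kMarker);
  if (marker == std::string::npos || marker == 0) return false;

  const std::string::size_type id_start = marker + kMarker.size();
  const std::string::size_type dot1 = leaf.find('.', id_start);
  if (dot1 == std::string::npos) return false;
  const std::string::size_type dot2 = leaf.find('.', dot1 + 1);
  if (dot2 == std::string::npos) return false;
  // Reject anything with extra fields after the extension.
  if (leaf.find('.', dot2 + 1) != std::string::npos) return false;

  out->name = leaf.substr(0, marker);
  out->filter_id = leaf.substr(id_start, dot1 - id_start);
  out->hash = leaf.substr(dot1 + 1, dot2 - dot1 - 1);
  out->ext = leaf.substr(dot2 + 1);
  return !out->filter_id.empty() && !out->hash.empty() && !out->ext.empty();
}
```

Applied to `1.gif.pagespeed.ce.JiBnMqyl6S.gif`, this yields name `1.gif`, filter id `ce`, hash `JiBnMqyl6S` and extension `gif`, while the plain `1.gif` is rejected as not being a rewritten resource.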

jmarantz, Nov 16 '15 18:11

Tried a local repro & it worked fine. Maybe broken in 1.9 and working in trunk?

jmarantz, Nov 16 '15 18:11

Just ran into this, and took a quick look at where the error message originates.

rewrite_context.cc has:

bool RewriteContext::PrepareFetch(
....
      if (FindServerContext()->url_namer()->ProxyMode()
            == UrlNamer::ProxyExtent::kNone &&
          !driver->MatchesBaseUrl(*url)) {
        // Reject absolute url references unless we're proxying.
        is_valid = false;
        message_handler->Message(kError, "Rejected absolute url reference %s",
                                 url->spec_c_str());
        break;
      }

It seems FindServerContext()->url_namer()->ProxyMode() == UrlNamer::ProxyExtent::kNone will always be true, and the base url is also never going to match when we hit this case. What would be the argument against querying the domain lawyer first, to see if the target url is authorized for fetching?
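
To make the question concrete, here is a toy model of the two checks. Everything in it is a simplified assumption: `FetchPolicy`, `OriginOf`, `CurrentCheckAccepts` and `ProposedCheckAccepts` are hypothetical names, and a plain set of origins stands in for whatever a domain lawyer lookup would actually approve.

```cpp
#include <set>
#include <string>

// Hypothetical, simplified stand-ins. The real RewriteContext, UrlNamer and
// DomainLawyer interfaces are considerably richer than this.
struct FetchPolicy {
  bool proxying;                              // models ProxyMode() != kNone
  std::string base_origin;                    // origin of the request itself
  std::set<std::string> authorized_origins;   // models domain lawyer approval
};

// Returns "scheme://host[:port]" for an absolute URL, or "" if there is no
// "://" in it.
std::string OriginOf(const std::string& url) {
  const std::string::size_type scheme_end = url.find("://");
  if (scheme_end == std::string::npos) return "";
  const std::string::size_type path_start = url.find('/', scheme_end + 3);
  return url.substr(0, path_start);  // npos keeps the whole string
}

// Models the quoted condition: accept only when proxying, or when the URL
// is on the same origin as the request.
bool CurrentCheckAccepts(const FetchPolicy& p, const std::string& url) {
  return p.proxying || OriginOf(url) == p.base_origin;
}

// Models the idea floated here: additionally accept URLs whose origin has
// been authorized.
bool ProposedCheckAccepts(const FetchPolicy& p, const std::string& url) {
  return CurrentCheckAccepts(p, url) ||
         p.authorized_origins.count(OriginOf(url)) > 0;
}
```

In this model the proposed check accepts strictly more URLs than the current one, so what matters is which additional fetches it opens up.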

oschaaf, Jul 06 '17 20:07

(Test 'Issue 609 -- proxying non-.pagespeed content, and caching it locally' flakes, and I suspect we can attribute that to this issue)

  • edit - correction, I think this is something else, but there are the 'MapProxyDomain' test and the tests that inline from external servers

oschaaf, Jul 06 '17 20:07

Created an experimental branch to try this, the change itself is trivial (but assessing the impact is not): https://github.com/pagespeed/mod_pagespeed/compare/oschaaf-experiment-rejected-abs-url

oschaaf, Jul 07 '17 07:07

Proxy mode was largely a PSS thing, where .pagespeed. resources were moved to a separate domain automatically --- e.g. foo.com/image.png would end up something like 1-ps.googleusercontent.com/h/foo.com/image.png. (Though there is an undocumented "measurement proxy" mode in MPS/NPS that uses it too which I apparently did stuff on ;-) ).

Looking at your change, I think the new stuff isn't hit because it likely hits resource == null first, as that happens every time resource isn't authorized unless you set some very special flags.

I don't think it's the right solution, though, looking at jefftk's original commit message:

"Reject requests for absolute urls unless the requested origin is us. We should never generate these urls, but if abused they could force us to load arbitrary content from domains approved with AddDomain/ModPagespeedDomain. People AddDomain() because they want resources on that domain that appear in html to be rewritten, and this doesn't change that behavior.

Once we can't be suckered into loading arbitrary content we can remove the content-type whitelist in CacheExtender::RewriteLoadedResource and implement extension-based cache extension of non-resources (ex: linked pdfs)."

The worry is that while you may be OK with, say, inlining an image from example.com that you used yourself, you might not want people to be able to inject an arbitrary thing, since who knows what a clever enough hacker would be able to do with it? Also, say, doing AddDomain img.example.com would normally cause us to rewrite stuff on img.example.com to img.example.com rather than to www.example.com --- which is pretty important as img. normally can't access cookies explicitly scoped to www., so it could potentially have a lower level of trust.

So it needs something more subtle, I think, that actually understands that MapProxyDomain is involved, and this method might be too late, even... though RewriteContext::DecodeFetchUrls might just be the trickiest piece of code in the entire codebase... and might not be the right place, either. The key thing to note, I think, is that e.g. on www.modpagespeed.com/proxy_external_resource.html?PageSpeedFilters=extend_cache you get http://modpagespeed.com/static/1.gif.pagespeed.ce.JiBnMqyl6S.gif, which is perfectly expected, while something like $HOST_NAME/,hexample.com.pagespeed.jm.0.js in the test isn't the sort of thing that should happen in the first place.

morlovich, Jul 07 '17 14:07

@morlovich thanks for the extended explanation! Let me think about that for a bit

oschaaf, Jul 07 '17 16:07

Cross linking a similar ngx_pagespeed issue with MapOriginDomain: https://github.com/pagespeed/ngx_pagespeed/issues/1279

oschaaf, Jul 08 '17 20:07

@morlovich what do you think of https://github.com/pagespeed/mod_pagespeed/commit/11e6a6570e33e2bf615e98eddcad87f9d9bf779a ? (may need some more test coverage to prove it, but I think this change makes behaviour of Disallow more consistent with the html rewriting flow)

oschaaf, Jul 12 '17 22:07

Some more context for what happens at modpagespeed.com:

  1. Purge by requesting http://www.modpagespeed.com/pagespeed_admin/cache?purge=*
  2. Request http://modpagespeed.com/static/x1.gif.pagespeed.ic.zaZh-vXmDi.webp

Observe the webp fails to load.

  1. Purge by requesting http://www.modpagespeed.com/pagespeed_admin/cache?purge=*
  2. Open http://www.modpagespeed.com/proxy_external_resource.html?PageSpeed=on&PageSpeedFilters=+rewrite_images,-inline_images in a browser
  3. Request http://modpagespeed.com/static/x1.gif.pagespeed.ic.zaZh-vXmDi.webp

Observe the webp now loads fine in 2. and 3.

modpagespeed.com has an .htaccess which starts with ModPagespeedDisallow *, and then goes on to whitelist some stuff (but not everything we need for the MapProxyDomain example to work correctly).
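
As a rough model of how such a "disallow everything, then whitelist" configuration plays out, here is a small sketch. It assumes that rules are wildcard patterns evaluated in order with the last matching rule winning, and that everything is allowed when no rule matches. `Rule`, `WildcardMatch` and `IsAllowed` are hypothetical names, and the Allow pattern in the usage note is invented for illustration, not taken from the site's actual .htaccess.

```cpp
#include <string>
#include <vector>

// Hypothetical model of one Allow/Disallow line.
struct Rule {
  bool allow;
  std::string pattern;
};

// Matches `text` against `pattern`, where '*' matches any run of characters
// (including none) and '?' matches exactly one character.
bool WildcardMatch(const std::string& pattern, const std::string& text) {
  std::string::size_type p = 0, t = 0;
  std::string::size_type star = std::string::npos, resume = 0;
  while (t < text.size()) {
    if (p < pattern.size() && pattern[p] == '*') {
      star = p++;   // remember the star and try matching zero characters
      resume = t;
    } else if (p < pattern.size() &&
               (pattern[p] == '?' || pattern[p] == text[t])) {
      ++p;
      ++t;
    } else if (star != std::string::npos) {
      p = star + 1;  // backtrack: let the last star absorb one more char
      t = ++resume;
    } else {
      return false;
    }
  }
  while (p < pattern.size() && pattern[p] == '*') ++p;
  return p == pattern.size();
}

// Assumed semantics: allowed by default, and the last matching rule wins.
bool IsAllowed(const std::vector<Rule>& rules, const std::string& url) {
  bool allowed = true;
  for (const Rule& rule : rules) {
    if (WildcardMatch(rule.pattern, url)) allowed = rule.allow;
  }
  return allowed;
}
```

Under these assumptions, a rule list of Disallow `*` followed by Allow `http://*modpagespeed.com/*.html` lets the demo HTML page through but still blocks `http://modpagespeed.com/static/1.gif`, which is the kind of gap described above.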

oschaaf, Jul 12 '17 22:07

I am getting the same problem. Two days after uploading an image, it disappears, and the error log shows a 404:

X’’filename’’.jpg:0: Resource based on https://www.’mysite’.gr/wp-content/uploads/2018/09/’filename’.jpg but cannot access the original
[Sun Sep 30 15:07:40.907169 2018] [pagespeed:warn] [pid 9315:tid 140625047205632] [mod_pagespeed 1.13.35.2-0 @9315] [0930/150740:WARNING:resource_fetch.cc(195)] Fetch failed for resource url https://www.mysite.gr/wp-content/uploads/2018/09/’’filename’’.jpg.pagespeed.ic.Y281WFqmAM.webp
[Sun Sep 30 15:07:40.908504 2018] [pagespeed:warn] [pid 9315:tid 140625517197056] [mod_pagespeed 1.13.35.2-0 @9315] Fetch failed for https://www.’’mysite’.gr/wp-content/uploads/2018/09/x’filename’.jpg.pagespeed.ic.Y281WFqmAM.webp, status=404

Have you figured out a solution?

biomedicus, Sep 30 '18 18:09

Will it ever be resolved?

vhoy, Apr 06 '19 22:04

mod_pagespeed seems to 404 resources on a website after running just fine for some time. How can this be fixed?

LiamKarlMitchell, May 20 '19 22:05

So are the random 404s on cached (webp) resources easy to fix? Or am I doing something fundamentally wrong with my installation of PageSpeed Module?

jak-kal, Jun 18 '19 11:06