Disable auto-adding PageSpeed query parameter to location header

Open AlexeyKosov opened this issue 8 years ago • 16 comments

Google Search ends up indexing URLs that contain the ?PageSpeed=noscript query parameter, even when a rel=canonical tag is present, so it would be great to have a way to get rid of all the query parameters used by PageSpeed. But even if I set up a 301 redirect from example.com/?PageSpeed=noscript to example.com/, mod_pagespeed automatically adds that parameter back to the query by rewriting the Location header. So for SEO purposes, there should be a way to disable the query parameters and to allow them to be stripped on redirect.

AlexeyKosov avatar Aug 25 '17 12:08 AlexeyKosov

You should be able to turn that off; see https://www.modpagespeed.com/doc/faq#noscript-redirect. Feel free to re-open this if that somehow didn't work for you!

oschaaf avatar Aug 25 '17 12:08 oschaaf

It does remove the meta tag, but I still can't do a 301 redirect from example.com/?PageSpeed=noscript to example.com/ because pagespeed adds the PageSpeed query parameter back to the Location header. The result is a redirect loop.

AlexeyKosov avatar Aug 25 '17 12:08 AlexeyKosov

I see, re-opening

oschaaf avatar Aug 25 '17 12:08 oschaaf

Had this issue on nginx as well. I guess this is still an issue?

ozdemirburak avatar Jul 09 '18 01:07 ozdemirburak

Same issue here using Nginx.

martinnabhan avatar Feb 21 '19 10:02 martinnabhan

Any news on this one? I have stumbled upon the same issue here with nginx.

Is there any workaround? Is there an nginx directive I can use to strip this parameter before the request gets to pagespeed?

zeldi-dev avatar Oct 07 '19 08:10 zeldi-dev

Hi @zevnikrok, have you set pagespeed SupportNoScriptEnabled false; in your config?
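For reference, a minimal sketch of where that directive could sit in an nginx config. The server name and cache path are placeholders, not values from this thread. On Apache the equivalent directive is ModPagespeedSupportNoScriptEnabled false.

```nginx
server {
    listen 80;
    server_name www.example.com;                        # placeholder host

    pagespeed on;
    pagespeed FileCachePath /var/ngx_pagespeed_cache;   # placeholder path

    # Stop PageSpeed from inserting the <noscript> meta refresh that sends
    # clients without JavaScript to ?PageSpeed=noscript.
    pagespeed SupportNoScriptEnabled false;
}
```

This only stops new noscript redirects from being emitted. URLs with the parameter that a crawler has already collected are not affected by it.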

Lofesa avatar Oct 07 '19 11:10 Lofesa

Yes, but I have the same problem as @AlexeyKosov: Google got hold of the ?PageSpeed URLs, they are now in its cache, and I keep getting these errors.

zeldi-dev avatar Oct 07 '19 12:10 zeldi-dev

But do your canonical URLs have these params now? If the canonical doesn't have them, Google takes some time to recrawl the URL and update it. PageSpeed may also still have this stored in its own cache, so it may be a good idea to clear the pagespeed cache. In Google Search Console you can run a "live URL test" and request reindexing of the URL, but it is not a bulk tool, so you need to do it one URL at a time.

Lofesa avatar Oct 07 '19 12:10 Lofesa

I have never had these parameters in my links. AFAIK Google tries to guess potential URL parameters that it can use. I did clear the cache, but this doesn't help when someone manually requests a page with PageSpeed=noscript in the URL.

I can retest the URL, but since the malfunctioning URL still contains ?PageSpeed, the retest puts pagespeed in an infinite loop.

zeldi-dev avatar Oct 07 '19 14:10 zeldi-dev

Hi @zevnikrok, I don't think Google tries to guess URL parameters; it discovers URLs from links that already exist, on your site or on other sites. So, if your site has pagespeed SupportNoScriptEnabled false; then your site must have a link rel="canonical" pointing to your site URL, and if you make a redirect, the Location must be the URL without this parameter. Does your page have a link rel="canonical" without the noscript parameter? If it does, Google should index that canonical URL. In the HTML code, do you have a noscript meta redirect? In the log files, where do the requests with the noscript parameter come from? Where do you see the URL with the noscript parameter indexed? If you see it in Webmaster Tools (Google Search Console), then Googlebot has seen these URLs with the parameter on some site. Can you share a URL to test?

Lofesa avatar Oct 07 '19 16:10 Lofesa

@lofesa I am not sure if I understand the part:

"then your site must have a link rel="canonical" to you site url and if you make a redirect you must have a location=url w/o this parameter. Have your page a link rel="canonical" w/o the noscript parameter?"

The webpage in question is: https://www.liupka.com The only requests with noscript parameter come from googlebot and I see the errors in webmaster tools. For example: https://www.liupka.com/koncani-izdelki/uporabni-izdelki-za-dom/tekstil-c1450/preproge/Kvackana-bela-okrogla-preproga?PageSpeed=noscript

From the documentation: The defer_javascript, lazyload_images, dedup_inlined_images, and local_storage_cache filters require JavaScript to render pages correctly. To support clients that have JavaScript disabled, if any of these filters are enabled, PageSpeed will insert a meta refresh inside a noscript tag at the top of the page. This meta refresh will redirect clients with JavaScript disabled to the current URL with a '?PageSpeed=noscript' query parameter appended which disables filters that require JavaScript.

I understand now where the noscript parameter comes from, and why Googlebot has these links in its cache.

But is there any way for pagespeed to cope with these URLs?

zeldi-dev avatar Oct 10 '19 14:10 zeldi-dev

Hi @zevnikrok, about the link rel="canonical": this link tells Google which page is the "original". Think about a page, say a list of products and their prices. The page has the URL https://mydomain.com/products, but it also takes a parameter to show the product list ordered by price, https://mydomain.com/products/?order=price. Now Googlebot sees 2 URLs with the same content, so it penalizes the page for duplicate content. If you put a link rel canonical ( <link rel="canonical" href="https://mydomain.com/products" /> ) in the HTML of the page, then Google only indexes the "canonical" page ( https://mydomain.com/products ) and there is no duplicate content issue. I can see you don't have this in your page.

Well, now you have figured out where the link with ?PageSpeed=noscript comes from and have set pagespeed SupportNoScriptEnabled false; to disable it, but Google already has these URLs and tries again and again to fetch them. The pagespeed module can't solve this; it has no influence on how Google caches and indexes URLs. But you have options:

1.- In the new Search Console, the left panel has an option "Legacy tools and reports", and under it "Removals". With this tool, you can hide the URL in search results for a period of 6 months.

2.- You can't set a robots noindex meta tag on the page, because that would remove the page itself from the index and you don't want that. Instead, set nginx to return a 404 or 410 error when a URL has the ?PageSpeed=noscript parameter. These errors tell Google that the page is gone, so it deletes the URL from its index and cache. 410 is better than 404, because 410 tells Google that the page was removed deliberately, so it takes less time to delete it. It is not a fast process: it takes some time for Googlebot to fetch the URL, get the 404 or 410 error, and delete the URL from its index/cache. In the server block of the nginx conf you can set something like this: if ($args ~* "noscript") { return 410; } and then wait.
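The 410 approach from option 2 could look roughly like this inside a full server block. This is a sketch, not a tested configuration: the host name is a placeholder, and the anchored pattern is an assumption that is stricter than the one above, so that only a complete PageSpeed=noscript parameter matches rather than any query string containing "noscript".

```nginx
server {
    listen 80;
    server_name www.example.com;   # placeholder host

    # Answer 410 Gone when the query string carries PageSpeed=noscript as a
    # whole parameter: at the start, or after "&", and ending at "&" or at
    # the end of the string. ~* makes the match case-insensitive.
    if ($args ~* "(^|&)PageSpeed=noscript(&|$)") {
        return 410;
    }

    # ... rest of the site configuration ...
}
```

A request such as curl -I 'http://www.example.com/?PageSpeed=noscript' should then show a 410 status line, while the same URL without the parameter is served normally.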

Lofesa avatar Oct 10 '19 16:10 Lofesa

I do have rel canonical links, but we send them in an HTTP header (so we can also use them on html files).
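One way to send a canonical link as an HTTP header from nginx is sketched below. This is an assumption about how such a setup might look, not necessarily how the site in question does it.

```nginx
location / {
    # $uri is the normalized request path without the query string, so a
    # request for /products?PageSpeed=noscript advertises /products as the
    # canonical URL.
    add_header Link '<$scheme://$host$uri>; rel="canonical"';
}
```

Note that $uri reflects internal rewrites (for example index or try_files handling), so the value may differ from the public URL on some pages. By default add_header is also only applied to a fixed set of success and redirect status codes.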

But I have now put return 410 in the server block of the configuration:

if ($args ~* "PageSpeed=noscript") { return 410; }

And now I hope for the best. Thank you for helping to resolve the issue.

zeldi-dev avatar Oct 11 '19 14:10 zeldi-dev

Hi @zevnikrok, I hadn't realized that you are sending the canonical as a header, though it is not there on the home page. And be patient: Google takes its time to delete a URL from its index/cache.

Lofesa avatar Oct 12 '19 13:10 Lofesa

@oschaaf More on this issue. It seems that the loop happens when the URL doesn't have a slash before the ?: https://www.mydomain.com/?PageSpeed=noscript vs. https://www.mydomain.com?PageSpeed=noscript

Lofesa avatar Dec 04 '19 20:12 Lofesa