Google search results are showing French translations for English language links

Open rhelms opened this issue 6 years ago • 29 comments

Recently in my google searches for django things, French translations have been showing for English links.

For example, this search (https://www.google.com/search?client=ubuntu&channel=fs&q=django+refresh_from_db&ie=utf-8&oe=utf-8) resulted in https://docs.djangoproject.com/en/2.1/ref/models/instances/ being the displayed URL, but French is displayed.

Luckily, Django methods are all in English, but this could be an issue if I were searching for Django concepts that did not have a method associated.

2019-02-20-094421_1916x1053_scrot

rhelms avatar Feb 19 '19 23:02 rhelms

@tobiasmcnulty - could this be related to your recent changes?

timgraham avatar Feb 20 '19 13:02 timgraham

That's...odd. I don't think anything I've changed could be causing that, but I suppose anything is possible.

Could this be the inverse of #805?

I did notice a potential, related issue while working on language activation in the unmerged PR #862. Specifically, the site does not activate English if that's the language the user is requesting, so it would inherit whatever language was set previously for that uWSGI thread or by LocaleMiddleware. But I didn't think this applied to the documentation itself, just the strings on the site.
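
To illustrate the kind of leak described above, here is a minimal sketch. It does not use Django: `activate`, `get_language`, `handle_request`, and `simulate` are hypothetical stand-ins built on `threading.local`, only meant to model a per-thread active language that is never reset when English is requested.

```python
import threading

# Stand-in for a per-thread "active language"; this is NOT Django's real code.
_state = threading.local()


def activate(lang):
    """Remember the active language for the current thread."""
    _state.lang = lang


def get_language(default="en"):
    """Return the language active on this thread, or the default."""
    return getattr(_state, "lang", default)


def handle_request(requested_lang, activate_english=False):
    """Return the language a page would be rendered in on this thread."""
    # Modelled bug: English is treated as "nothing to do", so whatever the
    # previous request activated on this thread stays in effect.
    if requested_lang != "en" or activate_english:
        activate(requested_lang)
    return get_language()


def simulate(activate_english):
    """Serve a French request, then an English one, on one worker thread."""
    results = []

    def worker():
        results.append(handle_request("fr", activate_english))
        results.append(handle_request("en", activate_english))

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()
    return results


print(simulate(activate_english=False))  # English request inherits French
print(simulate(activate_english=True))   # English is activated explicitly
```

In this model, the second request only comes back in English when the requested language is always activated explicitly.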

tobiasmcnulty avatar Feb 20 '19 16:02 tobiasmcnulty

That could be a bug in Google too!

claudep avatar Feb 20 '19 19:02 claudep

I think this has a chance of being solved by PR #862. Google might have been fooled by the Vary header into treating pages in different languages as the same page, so it took the French title for the English title and used it as the fresher one because it was recently changed and reindexed.

m-aciek avatar Feb 22 '19 21:02 m-aciek

The LocaleMiddleware removal for docs.djangoproject.com has been deployed (#862), so please keep an eye out for any changes. I've also noticed a few search results coming back in French (but linking to the English URLs), so hopefully this helps (though I'm still not seeing how it will, unless Google is indeed doing something odd...).

tobiasmcnulty avatar Feb 23 '19 15:02 tobiasmcnulty

Google search for "Django Tutorial Admin" shows the Search Snippets in Indonesian.

See attached screenshot. The text "Menulis aplikasi Django kedua anda, bagian 2" is in Indonesian. Notice that the URL is for "en" (English), but the Google Search snippet shown is for the "id" (Indonesian) webpage. Here are the two pages. EN: https://docs.djangoproject.com/en/2.1/intro/tutorial02/ ID: https://docs.djangoproject.com/id/2.1/intro/tutorial02/

  • I have never set up Indonesian in my browser.
  • I checked on two different computers (one in India and another in the US; both showed the same error, even though they are entirely different machines and both are configured for English).

Debugging Hint:

Open the page source for https://docs.djangoproject.com/en/2.1/intro/tutorial02/ in your browser, and look for rel="alternate" or hreflang=

These attributes tell search engines which language versions of the page exist. I didn’t see anything obviously wrong on the HTML page itself.
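
A rough way to do this check without reading the source by eye is sketched below. It uses only Python's standard `html.parser`; `LinkCollector` and `inspect_page` are hypothetical helper names, and `SAMPLE` is a hand-written snippet for the example, not the real page source.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect <html lang>, rel=canonical and rel=alternate hreflang links."""

    def __init__(self):
        super().__init__()
        self.html_lang = None
        self.canonical = None
        self.alternates = {}  # hreflang code -> href

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "html":
            self.html_lang = attrs.get("lang")
        elif tag == "link":
            rels = (attrs.get("rel") or "").split()
            if "canonical" in rels:
                self.canonical = attrs.get("href")
            if "alternate" in rels and attrs.get("hreflang"):
                self.alternates[attrs["hreflang"]] = attrs.get("href")


def inspect_page(html):
    """Parse an HTML string and return the populated collector."""
    collector = LinkCollector()
    collector.feed(html)
    return collector


# Hand-written sample, only shaped like a docs page head.
SAMPLE = """<html lang="id"><head>
<link rel="canonical" href="https://docs.djangoproject.com/id/2.1/intro/tutorial02/">
<link rel="alternate" hreflang="en" href="https://docs.djangoproject.com/en/2.1/intro/tutorial02/">
<link rel="alternate" hreflang="id" href="https://docs.djangoproject.com/id/2.1/intro/tutorial02/">
</head></html>"""

page = inspect_page(SAMPLE)
print(page.html_lang, page.canonical, sorted(page.alternates))
```

Feeding it the saved source of a real page (or Google's cached copy) would show whether the declared language and the link tags agree with the URL.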

The title "Menulis aplikasi Django kedua anda, bagian 2" is in Indonesian. Notice "id" in the URL here: https://docs.djangoproject.com/id/2.1/intro/tutorial02/

django google search snippet error [Above image without edits, in full size](https://user-images.githubusercontent.com/2798690/53386196-44512100-39a7-11e9-997f-f43a1714803e.png)

nitinnain avatar Feb 26 '19 03:02 nitinnain

This is really weird: I get a /fr/ URL in Indonesian for the same search with an Accept-Language header favouring fr-CA.

charettes avatar Feb 26 '19 03:02 charettes

Does anyone know someone at Google we could report this issue to?

claudep avatar Feb 26 '19 07:02 claudep

Do we have Google Search Console [1] set up for docs.djangoproject.com? Is it showing any hints or errors? We should also check whether our sitemap is correct.

Regards, Maciej

[1] https://search.google.com/search-console/about

m-aciek avatar Feb 26 '19 07:02 m-aciek

Chiming in to point out, in case it hasn't been noted already: the cached copy for "Django Tutorial Admin" in Indonesian (with the English link) has <html lang="id"> (which is set in the base_docs.html by {% block html_language_code %}{{ lang|default:"en" }}{% endblock %} and ultimately by a call to activate(lang))

kezabelle avatar Feb 26 '19 08:02 kezabelle

I found an article about i18n and Google. We do not follow its guidelines for sitemaps (link rel alternate & hreflang).

m-aciek avatar Feb 26 '19 08:02 m-aciek

Interestingly the Google search link that originally prompted this issue now shows English again. But yes, I see Indonesian text when Googling "Django Admin Tutorial" now, too:

https://www.google.com/search?q=django+tutorial+admin

It feels very much like a caching issue, but I don't see where it could be occurring 😦

It looks like @aaugustin may have verified the domain with Google Webmaster Tools about 6 years ago (https://github.com/django/djangoproject.com/commit/a0907ff742c81b676f602d1e17d820152f95d22e); would you be able to grant me access to this if you still have it @aaugustin ?

tobiasmcnulty avatar Feb 26 '19 14:02 tobiasmcnulty

I have access to the Google Search Console but I don't see how I can give access to others. Here's the report for https://docs.djangoproject.com/en/2.1/intro/tutorial02/

screenshot from 2019-02-26 09-35-48

This is the information about "Google-selected canonical" https://support.google.com/webmasters/answer/9012289#google-selected-canonical

timgraham avatar Feb 26 '19 14:02 timgraham

In the docs' meta tags we set a canonical per page (for example, the canonical for .../id/2.1/intro/tutorial02 is .../id/2.1/intro/tutorial02). IMO this can interfere badly with the rel='alternate' links: if an article has alternate links, all of those alternatives should share a single canonical.

A possible solution would then be to point the canonicals for all languages at the en versions of the documentation pages.

m-aciek avatar Feb 26 '19 14:02 m-aciek

Just saw French on another search (The language in the Google search snippet changes depending on search query): Google "Django Admin Actions"

The problem doesn't show up on DuckDuckGo!

nitinnain avatar Feb 26 '19 16:02 nitinnain

Another article seems to support my thesis: https://developers.google.com/search/mobile-sites/mobile-seo/separate-urls. rel:canonical and rel:alternate are treated equally. AFAIC we should make canonicals point to English versions.

m-aciek avatar Feb 26 '19 16:02 m-aciek

I've just opened draft pull request #871.

m-aciek avatar Feb 26 '19 17:02 m-aciek

@m-aciek @timgraham It is indeed odd that google chose the 'id' version of that page as canonical, but I'm not sure #871 is the appropriate fix:

screenshot 2019-02-26 21 12 28

From: https://support.google.com/webmasters/answer/139066?hl=en

tobiasmcnulty avatar Feb 27 '19 02:02 tobiasmcnulty

To throw another theory out there, I think we are misusing x-default:

https://webmasters.googleblog.com/2013/04/x-default-hreflang-for-international-pages.html

This page seems to suggest that use of x-default should be limited to pages that have no specific language (it's not a "default language"). We appear to render it on all docs pages with a link to the English version of the page, so perhaps that's confusing Google?

https://github.com/django/djangoproject.com/blob/master/djangoproject/templates/docs/doc.html#L26-L30

We also only render <link rel="alternate" ..> for the canonical version of the docs:

https://github.com/django/djangoproject.com/blob/master/docs/views.py#L52

It would seem appropriate to render that for all versions?
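
As a rough sketch of what "render it for all versions" could look like: the helper below builds the same set of alternate tags for any version, with no x-default entry. It assumes the `/<lang>/<version>/<path>/` URL layout seen in the links in this thread; `doc_url`, `alternate_links`, and the language list are hypothetical and are not taken from the site's templates or views.

```python
from html import escape

BASE = "https://docs.djangoproject.com"


def doc_url(lang, version, path):
    """Build a docs URL, assuming the /<lang>/<version>/<path>/ layout."""
    return f"{BASE}/{lang}/{version}/{path.strip('/')}/"


def alternate_links(version, path, languages):
    """Return one <link rel="alternate"> tag per language for this page.

    The same tags are produced for every version, not only the canonical
    one, and no x-default entry is emitted.
    """
    return [
        '<link rel="alternate" hreflang="{}" href="{}">'.format(
            escape(code), escape(doc_url(code, version, path))
        )
        for code in languages
    ]


# Hypothetical language list, for illustration only.
for tag in alternate_links("2.1", "intro/tutorial02", ["en", "fr", "id"]):
    print(tag)
```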

I put up a PR with these and some related changes here: #872

tobiasmcnulty avatar Feb 27 '19 02:02 tobiasmcnulty

FTR: issue #621 began with a similar topic and kicked off the SEO work for djangoproject.com.

m-aciek avatar Feb 27 '19 07:02 m-aciek

Good find @m-aciek .

@apollo13 It looks like my PR #872 partly reversed what you did here: https://github.com/django/djangoproject.com/commit/d6a966f6506a6c15b5f823edfb31491616989dca#diff-edc52c8f3a604a128e8f302806fb9262

Any memory of what the reason was for that and/or do you have any objections to showing the hreflang tags on all docs versions (not just the canonical one)?

tobiasmcnulty avatar Feb 27 '19 19:02 tobiasmcnulty

If this doesn't work, another thing we might try is refactoring the sitemap to use hreflang-style link declarations as described here: https://support.google.com/webmasters/answer/189077?hl=en

Right now it looks like each language gets its own sitemap.
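
For reference, a single sitemap with hreflang-style declarations could be generated roughly as below. This is a sketch using only `xml.etree.ElementTree`, not the site's actual sitemap code; `build_sitemap` and its input shape are hypothetical. Each `<url>` entry lists every language version of the page, including itself, as an `xhtml:link` element.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML_NS = "http://www.w3.org/1999/xhtml"


def build_sitemap(pages):
    """Return sitemap XML for `pages`.

    `pages` is a list of dicts, one per document, mapping a language code
    to the URL of that translation.
    """
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("xhtml", XHTML_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for translations in pages:
        for loc in translations.values():
            url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
            ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
            # Every entry repeats the full set of alternates, itself included.
            for alt_lang, alt_loc in translations.items():
                ET.SubElement(
                    url,
                    f"{{{XHTML_NS}}}link",
                    rel="alternate",
                    hreflang=alt_lang,
                    href=alt_loc,
                )
    return ET.tostring(urlset, encoding="unicode")


print(build_sitemap([{
    "en": "https://docs.djangoproject.com/en/2.1/intro/tutorial02/",
    "id": "https://docs.djangoproject.com/id/2.1/intro/tutorial02/",
}]))
```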

tobiasmcnulty avatar Feb 28 '19 02:02 tobiasmcnulty

Phew, I will give it a look in the afternoon. The main thing is that documentation on how Google interprets these tags is rather sparse. So I tried (over a few weeks) what worked best at that time and then committed that. That doesn't mean it is the best approach nowadays though.

apollo13 avatar Feb 28 '19 10:02 apollo13

@tobiasmcnulty I was mainly following https://github.com/django/djangoproject.com/issues/621#issuecomment-216253813 when coming up with which rels should point where… I'll see if I can give you access to the Google search tools.

apollo13 avatar Feb 28 '19 12:02 apollo13

Thanks @apollo13 I see your commit implemented exactly what that page recommends. I guess we can see how it behaves with the hreflang tags on all pages for a bit and then revert if there's a regression.

@timgraham @m-aciek I found this link which seems to suggest that making English (or any one language) the canonical version of the page is not correct (see the "Most common mistakes implementing hreflang and canonical tags" heading): https://www.portent.com/blog/seo/implement-hreflang-canonical-tags-correctly.htm

But again, who knows if these 3rd parties have it correct or not. 😕

tobiasmcnulty avatar Feb 28 '19 14:02 tobiasmcnulty

Here's a list of the current (last updated by Google on 2/25/19) URLs that Google chose as canonical instead of the ones we suggested. None of them looks particularly worrisome: https://docs.google.com/spreadsheets/d/16oYtNJVhqAVH7wyIza10z1Pv4NhpSbx8g0QG9EpDGQE/edit#gid=0

Also, I've taken a snapshot of the full current index coverage report here, with some commentary and links to other, related issues: https://docs.google.com/spreadsheets/d/1l86YAEcw5CbvivuY-ZN81oy75Nh7T9g0eJX0sZ8Jww0/edit#gid=0

In particular #878 may be relevant to this issue.

tobiasmcnulty avatar Mar 02 '19 14:03 tobiasmcnulty

Still not fixed 🙁

Screenshot 2019-03-18 20 55 05

tobiasmcnulty avatar Mar 19 '19 00:03 tobiasmcnulty

Hrmpf :( If google would document how that stuff is supposed to work :( I mean it worked for years :/

apollo13 avatar Mar 19 '19 07:03 apollo13

Indeed, this was scraped 2 days ago and in French: https://webcache.googleusercontent.com/search?q=cache:uxFlZ6Hw5RMJ:https://docs.djangoproject.com/en/2.1/topics/db/examples/many_to_many/+&cd=1&hl=en&ct=clnk&gl=nl

Not sure if it's somehow possible to purge this from the Google results using the webmaster tools, but the tags look mostly OK right now. The only thing that's wrong is that there is, for example, a pt-BR tag but no pt.
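
A quick check for that kind of gap could look like the sketch below. `missing_generic_fallbacks` is a hypothetical helper and the code list passed to it is made up; it flags region-specific hreflang codes whose bare language code is absent.

```python
def missing_generic_fallbacks(hreflang_codes):
    """Return bare language codes that only appear with a region suffix."""
    codes = {code.lower() for code in hreflang_codes}
    missing = set()
    for code in codes:
        base, sep, _region = code.partition("-")
        # Skip plain codes and the special "x-default" value.
        if sep and base != "x" and base not in codes:
            missing.add(base)
    return sorted(missing)


print(missing_generic_fallbacks(["en", "fr", "pt-BR", "x-default"]))
```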

wolph avatar Mar 24 '19 00:03 wolph

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Oct 06 '22 00:10 stale[bot]