Locale handling complete?

Open groe opened this issue 9 years ago • 14 comments

Hi,

great plugin, has worked perfectly for me so far!

One question about multi-locale handling: I noticed that the generated sitemap for a multi-locale site refers to other (secondary?) locales via rel=alternate. But according to this article from the Google Search Console (https://support.google.com/webmasters/answer/2620865?hl=en), each locale version of a specific page must additionally have its own <loc>-tag (check out the example there).

So unless I am missing something, the sitemap generated by the plugin does contain all pages from one locale, and their references to other locales but not the other locales themselves.

Can someone confirm this?

groe avatar Dec 18 '15 14:12 groe

Looking through the models, Sitemap_AlternateUrlModel::getDomElement looks like it should produce the expected output (i.e. separate nodes within the same <url>).

Are you seeing something different? Could you share the output you’re getting?

joshuabaker avatar Dec 22 '15 16:12 joshuabaker

Yes, it outputs multiple xhtml:link-tags within the same url-tag. For instance, this is my output for one content page in two languages:

<?xml version="1.0" encoding="utf-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <url>
    <loc>http://localhost/de/b/foobar-german</loc>
    <lastmod>2015-12-22T17:08:50+00:00</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.5</priority>
    <xhtml:link rel="alternate" hreflang="de" href="http://localhost/de/b/foobar-german"/>
    <xhtml:link rel="alternate" hreflang="en" href="http://localhost/en/b/foobar-english"/>
  </url>
</urlset>

But according to this Google Search Console Help Article, this is not enough.

In addition to the xhtml:link-tags every localized version of a page must have its own url/loc-tag as well.

In the example above, the correct output should therefore be:

<?xml version="1.0" encoding="utf-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <url>
    <loc>http://localhost/de/b/foobar-german</loc>
    <lastmod>2015-12-22T17:08:50+00:00</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.5</priority>
    <xhtml:link rel="alternate" hreflang="de" href="http://localhost/de/b/foobar-german"/>
    <xhtml:link rel="alternate" hreflang="en" href="http://localhost/en/b/foobar-english"/>
  </url>
  <!-- the same block again, but now with the english URL in the <loc>-tag -->
  <url>
    <loc>http://localhost/en/b/foobar-english</loc>
    <lastmod>2015-12-22T17:08:50+00:00</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.5</priority>
    <xhtml:link rel="alternate" hreflang="de" href="http://localhost/de/b/foobar-german"/>
    <xhtml:link rel="alternate" hreflang="en" href="http://localhost/en/b/foobar-english"/>
  </url>
</urlset>

Otherwise these pages might not get indexed by search engines.
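To make the requested structure concrete, here is a minimal Python sketch (not the plugin's PHP code) that emits one <url> entry per localized URL, each carrying the full set of xhtml:link alternates, including a link to itself. The function name `build_urlset` and the input shape (one dict of locale to URL per content page) are assumptions for illustration; lastmod, changefreq and priority are left out for brevity.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML_NS = "http://www.w3.org/1999/xhtml"


def build_urlset(pages):
    """Build a <urlset> element.

    pages: list of dicts, one per content page, mapping locale -> URL
    (hypothetical input shape, chosen for this sketch).
    """
    # Serialize sitemap elements without a prefix and alternates as xhtml:link.
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("xhtml", XHTML_NS)

    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for locale_urls in pages:
        # Every localized version gets its own <url>/<loc> entry ...
        for loc_url in locale_urls.values():
            url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
            ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc_url
            # ... and each entry lists all locale versions, itself included.
            for hreflang, href in locale_urls.items():
                ET.SubElement(
                    url,
                    f"{{{XHTML_NS}}}link",
                    {"rel": "alternate", "hreflang": hreflang, "href": href},
                )
    return urlset


if __name__ == "__main__":
    pages = [
        {
            "de": "http://localhost/de/b/foobar-german",
            "en": "http://localhost/en/b/foobar-english",
        }
    ]
    print(ET.tostring(build_urlset(pages), encoding="unicode"))
```

For the two-language page above, this yields two <url> entries (one per <loc>), each with both alternates.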

groe avatar Dec 22 '15 17:12 groe

You're right. That output is incorrect. I'll need to review when I get some time.

Thanks for reporting and following up.

joshuabaker avatar Dec 22 '15 17:12 joshuabaker

Thanks a lot, @joshuabaker, that would be awesome!

Let me know if I can help.

groe avatar Dec 22 '15 17:12 groe

@joshuabaker This fixes it (also shouldn't break any sites that use multiple domains): #18.

Would love to see that merged in :)

groe avatar Jan 29 '16 16:01 groe

@groe I see that you have an updated fork. However, I don't understand the pull request from @dommmel.

If CRAFT_LOCALE is set, it still has the old behaviour, which leads to an invalid sitemap? Shouldn't we get rid of that check and include them regardless?

sjelfull avatar Sep 06 '17 15:09 sjelfull

@sjelfull Why would it lead to an invalid sitemap?

groe avatar Sep 06 '17 17:09 groe

@groe As you mentioned at the start:

So unless I am missing something, the sitemap generated by the plugin does contain all pages from one locale, and their references to other locales but not the other locales themselves.

This happens if CRAFT_LOCALE is set, and you always(?) set CRAFT_LOCALE on a multi-locale site.

That is why the conditional that checks for CRAFT_LOCALE doesn't make sense to me.

sjelfull avatar Sep 06 '17 20:09 sjelfull

@sjelfull Depending on how your site is structured, the /sitemap.xml endpoint can be called without CRAFT_LOCALE being set. For instance, if you have a single domain for all languages you could have a URL structure like:

  • example.org/en/... – CRAFT_LOCALE = en
  • example.org/de/... – CRAFT_LOCALE = de
  • example.org/sitemap.xml – CRAFT_LOCALE is not set

Whereas if you have multiple domains you could set it up like:

  • example.us/... – CRAFT_LOCALE = en
  • example.us/sitemap.xml – CRAFT_LOCALE = en
  • example.de/... – CRAFT_LOCALE = de
  • example.de/sitemap.xml – CRAFT_LOCALE = de

... so that each domain's sitemap includes only the links for its own locale.
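The two setups above boil down to one conditional. A minimal Python sketch of that decision, under the assumption that the behaviour is exactly as described in this comment (it is not taken from the plugin's source; `locales_for_sitemap` and its parameters are hypothetical names):

```python
def locales_for_sitemap(site_locales, request_locale=None):
    """Return the locales whose URLs get their own <url> entry.

    site_locales: all locales configured for the site.
    request_locale: stand-in for CRAFT_LOCALE; None means it is not set
    for the current request (hypothetical parameter for this sketch).
    """
    if request_locale is not None:
        # Multi-domain setup: example.de/sitemap.xml lists only "de" URLs.
        return [request_locale]
    # Single-domain setup: example.org/sitemap.xml lists every locale.
    return list(site_locales)


if __name__ == "__main__":
    print(locales_for_sitemap(["en", "de"]))        # CRAFT_LOCALE not set
    print(locales_for_sitemap(["en", "de"], "de"))  # CRAFT_LOCALE = de
```

In either case the rel=alternate links on each entry can still point at all locale versions; only the set of <url>/<loc> entries changes.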

groe avatar Sep 06 '17 22:09 groe

Formatting is incorrect when there is more than one locale. Any update on that?

nlussier-globalia avatar Sep 22 '17 14:09 nlussier-globalia

@nlussier-globalia What exactly do you mean?

groe avatar Sep 22 '17 16:09 groe

The browser is unable to interpret it, so all you see is the plain output (you don't see the tree).

nlussier-globalia avatar Sep 22 '17 17:09 nlussier-globalia

View source?

joshuabaker avatar Sep 22 '17 17:09 joshuabaker

The view source seems correct though (identical to another site that has only one language).

nlussier-globalia avatar Sep 22 '17 17:09 nlussier-globalia