Feature request: either work with an existing jekyll sitemap plugin or generate sitemap

Open scgupta opened this issue 8 years ago • 8 comments

Currently I am using jekyll-sitemap to generate the sitemap, see example. Polyglot produces a sitemap.xml for the default language and one for each other language in its respective lang dir, but the links in every generated sitemap.xml point to the default-language pages. So either there is some setting that I don't understand and am getting wrong, or polyglot currently doesn't target generating sitemap.xml.

Ideally, I wish there was a single sitemap as explained in this google webmaster tip instead of multiple sitemap.xml files (as it is not for visitors, but for crawl bots). But even if separate sitemap.xml files are generated, I wish the links in them were correct.

I see two possibilities to achieve it:

  1. polyglot works with a common jekyll sitemap plugin
  2. polyglot generates the sitemap itself (I am not sure if this is feasible, but considering it probably knows about all pages being generated and cares about I18n_Headers etc., it might be something to think about).

Thanks @untra for such wonderful support and help that I got from you for the two issues I faced. +satish

scgupta avatar Jul 23 '16 03:07 scgupta

I would like to see a correct sitemap.xml too.

aensidhe avatar Apr 09 '17 17:04 aensidhe

Me too

lukaszolek avatar Oct 02 '18 13:10 lukaszolek

I'm pretty sure polyglot runs after jekyll-sitemap and copies the sitemap.xml file to the root of each other language folder without any processing, as if it were any other file.

As this has been a floating issue since 2016, I'm going to resolve it in my project by removing the sitemap plugin and building a sitemap file using polyglot vars. I'll post my example once finished (maybe we could put it in a guide or in the example documents).

If you've already created anything that could give me a head start, feel free to share.

jerturowetz avatar Nov 06 '18 13:11 jerturowetz

Anything get resolved with this? @jerturowetz interested to see what you have created.

MPJHorner avatar Nov 11 '18 23:11 MPJHorner

EDIT: Wrapped HTML comments in Liquid comment syntax {% comment %} to avoid messy output

-- @MPJHorner @scgupta check it out!

Just wrapped it up now! I've ditched using a sitemap plugin and just built the sitemap manually.

There are a few items to note:

  • For cleanliness, sitemap.xml is listed in the exclude_from_localization array in _config.yml
  • I did not specify any hreflang attributes in the sitemap as my posts/pages have hreflang specified in their <head>. I do it manually, but polyglot includes the {% I18n_Headers %} tag, which builds the appropriate tags for you.
  • You have to include some empty YAML front matter at the top of sitemap.xml in order to get jekyll to process the file

Here are the contents of my sitemap.xml, which is located in the root of my source folder:

---
layout:
---
<?xml version="1.0" encoding="UTF-8"?>
{% comment %}<!-- I am using hreflang attributes on a page-by-page basis so no need to include them per url here -->{% endcomment %}
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xhtml="http://www.w3.org/1999/xhtml">
{% for lang in site.languages %}

    {% comment %}<!-- It would be better to use the where_exp filter in the first loop but I dont think the unless expression is supported -->{% endcomment %}
    {% for node in site.pages %}
        {% comment %}<!-- very lazy check to see if page is in the exclude list - this means excluded pages are not gonna be in the sitemap at all, write exceptions as necessary -->{% endcomment %}
        {% unless site.exclude_from_localization contains node.path %}
            {% comment %}<!-- I am assuming if there's not layout assigned, then not include the page in the sitemap, you may want to change this -->{% endcomment %}
            {% if node.layout %}
                <url>
                    <loc>{% if lang == site.default_lang %}{{ node.url | absolute_url }}{% else %}{{ node.url | prepend: lang | prepend: '/' | absolute_url }}{% endif %}</loc>
                </url>
            {% endif %}
        {% endunless %}
    {% endfor %}

    {% comment %}<!-- This loops through all site collections including posts -->{% endcomment %}
    {% for collection in site.collections %}
        {% for node in site[collection.label] %}
            <url>
                <loc>{% if lang == site.default_lang %}{{ node.url | absolute_url }}{% else %}{{ node.url | prepend: lang | prepend: '/' | absolute_url }}{% endif %}</loc>
            </url>
        {% endfor %}
    {% endfor %}

{% endfor %}
</urlset>
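
For anyone who wants the single-sitemap layout from the Google tip mentioned at the top of the thread, here is a minimal sketch of how the page entries could be extended with hreflang alternates. It is untested against polyglot and assumes every page exists at the same path under each language prefix, which is the same assumption the sitemap above already makes. The loop variable `alt` is just an illustrative name, and collections are left out for brevity.

```liquid
---
layout:
---
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xhtml="http://www.w3.org/1999/xhtml">
{% comment %} One <url> entry per language version of each page that has a layout {% endcomment %}
{% for lang in site.languages %}
  {% for node in site.pages %}
    {% unless site.exclude_from_localization contains node.path %}
      {% if node.layout %}
  <url>
    <loc>{% if lang == site.default_lang %}{{ node.url | absolute_url }}{% else %}{{ node.url | prepend: lang | prepend: '/' | absolute_url }}{% endif %}</loc>
    {% comment %} Assumption: the same path exists under every language prefix {% endcomment %}
    {% for alt in site.languages %}
    <xhtml:link rel="alternate" hreflang="{{ alt }}" href="{% if alt == site.default_lang %}{{ node.url | absolute_url }}{% else %}{{ node.url | prepend: alt | prepend: '/' | absolute_url }}{% endif %}"/>
    {% endfor %}
  </url>
      {% endif %}
    {% endunless %}
  {% endfor %}
{% endfor %}
</urlset>
```

Each `<url>` entry is expected to list every language version, including its own, which is why the inner loop does not skip `lang`.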

jerturowetz avatar Nov 12 '18 15:11 jerturowetz

@jerturowetz looks ideal. You should put this on the Readme.md

MPJHorner avatar Jan 31 '19 16:01 MPJHorner

@jerturowetz, the proposed method works great. However, since the jekyll-sitemap plugin is removed, no robots.txt is generated anymore.
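
A hand-written robots.txt in the source root is probably the simplest replacement. A minimal sketch, assuming `url` is set in _config.yml (so that `absolute_url` yields a full URL) and that robots.txt is added to exclude_from_localization the same way sitemap.xml is. The allow-everything rule is only a placeholder.

```liquid
---
layout:
---
# Empty front matter above makes jekyll run Liquid on this file
# Placeholder rule: allow all crawlers everywhere
User-agent: *
Disallow:

# Point crawlers at the hand-built sitemap
Sitemap: {{ "/sitemap.xml" | absolute_url }}
```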

hacketiwack avatar Feb 26 '23 16:02 hacketiwack

In my custom sitemap, similar to the one above, I am having a hard time excluding document nodes without a translation, that is, pages that are rendered as untranslated fallback pages. It might not be technically wrong to list them, but since a sitemap is used to index websites, it is not really useful to index fallback pages without an actual translation: they would just be duplicates of the originals, and the metadata stating the language would be incorrect. It's better not to index them at all and to exclude them from the sitemap, even though they show up on the site as a result of the fallback mechanism. This would also be the most coherent way to do it if sitemap indexes referring to language-specific sitemaps are used.

Currently the only decent option to solve this is to create placeholder pages for documents without a translation, with the warning "this document is not translated yet, please refer to the original document". This is a viable option, and with a special template one could include the original or default-language page with a separate language tag in the HTML, but it seems to overcomplicate the website presentation, and not listing those dummy fallback pages would still make sense.

I think having a variable that lists the available translations, as suggested here, might be useful for solving these kinds of issues. Or is there another way to solve this? I think similar problems also appear when rendering menus and site archives, and in the language switcher, where one might want to indicate whether an actual translation is available.
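
Until something like that exists, one workaround might be to record the available translations by hand. A minimal sketch, assuming a hypothetical front matter key `translated_langs` on each page (not something polyglot provides) that lists the languages with a real translation. Pages without the key are treated as translated everywhere.

```liquid
---
layout:
---
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
{% for lang in site.languages %}
  {% for node in site.pages %}
    {% if node.layout %}
      {% comment %} translated_langs is hypothetical and hand-maintained, e.g. translated_langs: [en, de] {% endcomment %}
      {% assign translated = node.translated_langs | default: site.languages %}
      {% comment %} Skip language versions that would only be untranslated fallback pages {% endcomment %}
      {% if translated contains lang %}
  <url>
    <loc>{% if lang == site.default_lang %}{{ node.url | absolute_url }}{% else %}{{ node.url | prepend: lang | prepend: '/' | absolute_url }}{% endif %}</loc>
  </url>
      {% endif %}
    {% endif %}
  {% endfor %}
{% endfor %}
</urlset>
```

The same check could in principle be reused in menus or a language switcher, at the cost of keeping the list in sync with the actual translations.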

boamaod avatar Feb 25 '24 07:02 boamaod

https://github.com/untra/polyglot?tab=readme-ov-file#sitemap-generation Thank you @jerturowetz for your solution! Added to the readme.

untra avatar Mar 18 '24 22:03 untra