zola

Inconsistent date formats in sitemap are not parsable by Google

Open 0xcrypto opened this issue 1 year ago • 4 comments

Bug Report

Zola relies on the user to provide a correctly formatted date for <lastmod> in the sitemap, which can result in inconsistent date formats. If the lastmod values in sitemap.xml use different date formats, Google Search Console won't accept the sitemap. A user can fix this by entering every date in the correct format manually, but that gets cumbersome when there are many writers and posts.

Environment

Zola version: 0.17.1 (GitHub Pages) and 0.17.2 (local dev)

Expected Behavior

The sitemap should use the same date format regardless of the format the user provides.

Current Behavior

Zola uses the front matter variable date exactly as the user provides it. I usually give a date with a time, i.e. 2023-08-20 14:36:26, but sometimes I use 2023-08-20 only. This results in two different date formats in the sitemap. An excerpt from my current sitemap:

<url>
<loc>
https://eval.blog/research/laravel-deserialization-gadget-chain/
</loc>
<lastmod>2021-06-12</lastmod>
</url>
<url>
<loc>
https://eval.blog/research/microsoft-account-token-leaks-in-harvest/
</loc>
<lastmod>2023-10-21T22:33:40</lastmod>
</url>

Steps to reproduce

  • Publish two posts with a date in the front matter: one in date-and-time format (YYYY-MM-DD HH:mm:ss), the other date only (YYYY-MM-DD).
  • Check the date format in the <lastmod> tags of the sitemap (http://127.0.0.1:1111/sitemap.xml).

0xcrypto, Oct 22 '23 12:10
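The second reproduction step can be automated. Below is a minimal Python sketch (standard library only; the function names are hypothetical) that collects every <lastmod> value from a sitemap and groups the values by shape. The patterns approximate the W3C Datetime profile that the sitemap protocol references: a date alone (YYYY-MM-DD) is allowed, but as I read that profile, a value that includes a time also needs a timezone designator (Z or +hh:mm / -hh:mm), which the second value in the excerpt lacks.

```python
import re
import xml.etree.ElementTree as ET

# Namespace used by the sitemap protocol (same as the urlset xmlns).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Approximate W3C Datetime shapes: date only, or date + time + timezone designator.
DATE_ONLY = re.compile(r"^\d{4}-\d{2}-\d{2}$")
DATETIME_TZD = re.compile(
    r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:\d{2})$"
)


def classify(value: str) -> str:
    """Label a <lastmod> value by its shape."""
    if DATE_ONLY.match(value):
        return "date"
    if DATETIME_TZD.match(value):
        return "datetime+tzd"
    # Anything else, e.g. a time without a timezone designator.
    return "other"


def lastmod_formats(sitemap_xml: str) -> dict:
    """Map each shape found in the sitemap to the <lastmod> values that have it."""
    # Parse as UTF-8 bytes so the XML declaration's encoding is honoured.
    root = ET.fromstring(sitemap_xml.encode("utf-8"))
    found = {}
    for node in root.iterfind("sm:url/sm:lastmod", SITEMAP_NS):
        value = (node.text or "").strip()
        found.setdefault(classify(value), []).append(value)
    return found
```

Run against the excerpt above, this reports one "date" value (2021-06-12) and one "other" value (2023-10-21T22:33:40); more than one key in the result means the sitemap mixes formats.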

Weird. I'm not sure whether to ignore the time or add an arbitrary one when it isn't set; both are valid options.

Keats, Oct 22 '23 14:10
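Both options Keats mentions fit in a few lines. Here is a minimal Python sketch of the two policies, not Zola's implementation: the function names are hypothetical, the padded time is assumed to be midnight, and naive values are assumed to be UTC unless another timezone is passed.

```python
from datetime import datetime, timezone


def lastmod_date_only(value: str) -> str:
    """Policy 1: ignore any time component and emit YYYY-MM-DD."""
    # fromisoformat accepts "2023-08-20", "2023-08-20 14:36:26" and "2023-08-20T14:36:26".
    return datetime.fromisoformat(value).date().isoformat()


def lastmod_full(value: str, tz=timezone.utc) -> str:
    """Policy 2: pad a missing time with midnight and always emit a full timestamp."""
    # A date-only string parses as midnight.
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        # Assumption: naive values are in `tz` (UTC by default).
        parsed = parsed.replace(tzinfo=tz)
    return parsed.isoformat(timespec="seconds")
```

The first policy keeps the sitemap date-only; the second always emits a full timestamp with an offset. Either one gives every entry the same shape.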

@Keats

Found a way. By using a custom sitemap template and Tera's date filter, I was able to output a date format of my choice.

The template I am using is:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    {%- for sitemap_entry in entries %}
    <url>
        <loc>{{ sitemap_entry.permalink | escape_xml | safe }}</loc>
        {%- if sitemap_entry.updated %}
        <lastmod>{{ sitemap_entry.updated | date(format="%+", timezone="Asia/Kolkata")  }}</lastmod>
        {%- endif %}
    </url>
    {%- endfor %}
</urlset>

I think a simple solution would be to pass sitemap_entry.updated through the date filter in the built-in sitemap.xml.

There is one issue with the date filter, though: the timezone argument does not work, and the output always has +00:00 as the timezone offset.

date(format="%+", timezone="Asia/Kolkata")

0xcrypto, Oct 22 '23 15:10
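The timezone behavior reported above can be modeled as follows. This is a Python model of what the comment describes, not Tera's source: input without an offset is pinned to UTC and the requested timezone is ignored, while input that already carries an offset is converted. IST is a hypothetical fixed-offset stand-in for Asia/Kolkata, so the sketch needs no timezone database.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical fixed-offset stand-in for Asia/Kolkata (UTC+05:30, no DST).
IST = timezone(timedelta(hours=5, minutes=30))


def model_date_filter(value: str, target_tz) -> str:
    """Model of the reported behavior: naive input ignores target_tz."""
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        # No offset in the input: treated as UTC, the timezone argument has no effect.
        return parsed.replace(tzinfo=timezone.utc).isoformat()
    # Offset present: the value is converted into the requested zone.
    return parsed.astimezone(target_tz).isoformat()
```

With this model, "2023-10-21T22:33:40" comes out as 2023-10-21T22:33:40+00:00 even when IST is requested, while "2023-10-21T22:33:40+00:00" is converted to 2023-10-22T04:03:40+05:30.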

We should use the date filter in the default template to avoid that edge case. As for the timezone, I'm not sure.

Keats, Oct 23 '23 18:10

Found the problem with the timezone: the date being converted must itself specify a timezone offset; otherwise it defaults to +00:00. That somewhat defeats the purpose of the timezone argument, but anyway, here is my updated sitemap template with the correct timezone:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    {%- for sitemap_entry in entries %}
    <url>
        <loc>{{ sitemap_entry.permalink | escape_xml | safe }}</loc>
        {%- if sitemap_entry.updated %}
        <lastmod>{{ sitemap_entry.updated | date(format="%Y-%m-%dT%T+05:30", timezone="Asia/Kolkata") }}</lastmod>
        {%- endif %}
    </url>
    {%- endfor %}
</urlset>

0xcrypto, Oct 26 '23 16:10
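The updated template amounts to one rule: treat naive dates as site-local time, convert dates that carry an offset into that zone, and always print the offset. Here is a minimal Python sketch of that rule, assuming (as the template's literal +05:30 does) that every naive date on the site was written in IST; SITE_TZ and lastmod are hypothetical names.

```python
from datetime import datetime, timedelta, timezone

# Assumption: every naive front matter date on the site was written in IST (UTC+05:30).
SITE_TZ = timezone(timedelta(hours=5, minutes=30))


def lastmod(value: str) -> str:
    """Emit one timestamp shape for every entry, whatever the author typed."""
    # A date-only string parses as midnight.
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        # Same effect as appending the literal "+05:30" to a naive time.
        parsed = parsed.replace(tzinfo=SITE_TZ)
    else:
        # Values that already carry an offset are shifted into site time.
        parsed = parsed.astimezone(SITE_TZ)
    return parsed.isoformat(timespec="seconds")
```

A literal offset like +05:30 only works for zones without daylight saving time. For a zone with DST, the offset would have to be computed per date (for example with Python's zoneinfo) rather than hardcoded.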