zola
Inconsistent date formats in the sitemap are not parsable by Google
Bug Report
Zola relies on the user to provide a correct date format for the <lastmod> element in the sitemap, which can result in inconsistent date formats. If sitemap.xml contains different date formats for lastmod, Google Search Console won't accept the sitemap.
While a user can fix this manually by always using the same date format, that becomes cumbersome when there are many writers and posts.
Environment
Zola version: 0.17.1 (GitHub Pages) and 0.17.2 (local dev)
Expected Behavior
The sitemap should use the same date format irrespective of what the user has provided.
Current Behavior
Zola accepts the frontmatter variable date as provided by the user. I usually give a date with a time, i.e. 2023-08-20 14:36:26, but sometimes I use only 2023-08-20. This results in two different date formats in the sitemap. An excerpt from my current sitemap:
<url>
<loc>
https://eval.blog/research/laravel-deserialization-gadget-chain/
</loc>
<lastmod>2021-06-12</lastmod>
</url>
<url>
<loc>
https://eval.blog/research/microsoft-account-token-leaks-in-harvest/
</loc>
<lastmod>2023-10-21T22:33:40</lastmod>
</url>
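A quick way to spot this in an existing sitemap is to classify each lastmod value by its shape. The following is a minimal Python sketch and not part of Zola; the function name lastmod_shapes, the shape labels, and the example.com URLs are made up for illustration, with the two sample values taken from the excerpt above.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Namespace used by the sitemap protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

DATE_ONLY = re.compile(r"\d{4}-\d{2}-\d{2}")
DATETIME_NO_OFFSET = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?")
DATETIME_WITH_OFFSET = re.compile(
    r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:\d{2})"
)


def lastmod_shapes(sitemap_xml: str) -> Counter:
    """Count <lastmod> values grouped by the shape of their text."""
    root = ET.fromstring(sitemap_xml)
    shapes = Counter()
    for node in root.findall("sm:url/sm:lastmod", SITEMAP_NS):
        value = (node.text or "").strip()
        if DATE_ONLY.fullmatch(value):
            shapes["date"] += 1
        elif DATETIME_NO_OFFSET.fullmatch(value):
            shapes["datetime-no-offset"] += 1
        elif DATETIME_WITH_OFFSET.fullmatch(value):
            shapes["datetime-with-offset"] += 1
        else:
            shapes["other"] += 1
    return shapes


# Hypothetical sample reusing the two lastmod values from the report.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/a/</loc><lastmod>2021-06-12</lastmod></url>
  <url><loc>https://example.com/b/</loc><lastmod>2023-10-21T22:33:40</lastmod></url>
</urlset>"""

shapes = lastmod_shapes(SAMPLE)
# More than one shape means the sitemap mixes formats.
print(dict(shapes), "mixed" if len(shapes) > 1 else "consistent")
```

Running it against the real sitemap.xml only requires reading the file into a string and passing it to lastmod_shapes.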
Steps to reproduce
- Publish two posts having date in frontmatter: one with the date and time format (YYYY-MM-DD HH:mm:ss), the second with the date only (YYYY-MM-DD)
- Check the date format in the <lastmod> tag of the sitemap (http://127.0.0.1:1111/sitemap.xml)
Weird. I'm not sure whether to ignore the time or add a random one if not set, both are valid options.
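To make the two options concrete, here is a minimal Python sketch; it is not Zola's code. The helper names are hypothetical, and using midnight as the filled-in time is an assumption chosen only so the output is deterministic.

```python
from datetime import datetime


def parse_frontmatter_date(raw: str) -> datetime:
    """Parse either YYYY-MM-DD or YYYY-MM-DD HH:MM:SS (space or T separator)."""
    text = raw.strip()
    if len(text) == 10:
        # Date only: strptime yields midnight (assumed fill-in time).
        return datetime.strptime(text, "%Y-%m-%d")
    return datetime.fromisoformat(text.replace(" ", "T", 1))


def lastmod_date_only(raw: str) -> str:
    """Option 1: ignore the time and always emit a plain date."""
    return parse_frontmatter_date(raw).strftime("%Y-%m-%d")


def lastmod_with_time(raw: str) -> str:
    """Option 2: always emit a time, filling in midnight when none was given."""
    return parse_frontmatter_date(raw).strftime("%Y-%m-%dT%H:%M:%S")


for raw in ("2023-08-20", "2023-08-20 14:36:26"):
    print(raw, "->", lastmod_date_only(raw), "|", lastmod_with_time(raw))
```

Either option gives every entry the same shape; the first discards information the author supplied, the second invents a time the author never wrote.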
@Keats
Found a way. By using a custom sitemap template and Tera's date filter, I was able to set a date format of my choice.
The template I am using is:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
{%- for sitemap_entry in entries %}
<url>
<loc>{{ sitemap_entry.permalink | escape_xml | safe }}</loc>
{%- if sitemap_entry.updated %}
<lastmod>{{ sitemap_entry.updated | date(format="%+", timezone="Asia/Kolkata") }}</lastmod>
{%- endif %}
</url>
{%- endfor %}
</urlset>
I think a simple solution would be to just pass sitemap_entry.updated to the date filter in the built-in sitemap.xml.
There is one issue with the date filter, though. The timezone argument has no effect here, and the output carries +00:00 as the offset:
date(format="%+", timezone="Asia/Kolkata")
We should use the date filter in the default template to avoid that edge case. As for timezone, I'm not sure
Found the problem with the timezone. The date being converted to a timestamp must include a timezone offset, or else it defaults to +00:00. This more or less removes the need for the timezone argument, but anyway, here is my updated sitemap with the correct timezone.
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
{%- for sitemap_entry in entries %}
<url>
<loc>{{ sitemap_entry.permalink | escape_xml | safe }}</loc>
{%- if sitemap_entry.updated %}
<lastmod>{{ sitemap_entry.updated | date(format="%Y-%m-%dT%T+05:30", timezone="Asia/Kolkata") }}</lastmod>
{%- endif %}
</url>
{%- endfor %}
</urlset>
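The difference between a value with and without an offset can be sketched in Python. This is an illustration of the behavior described above, not Tera's implementation. It assumes an offset-less value falls back to UTC, which matches the +00:00 output reported, and it models Asia/Kolkata as a fixed +05:30 offset.

```python
from datetime import datetime, timedelta, timezone

# Asia/Kolkata has no DST, so a fixed offset is enough for this sketch.
IST = timezone(timedelta(hours=5, minutes=30))

naive = datetime.fromisoformat("2023-10-21T22:33:40")        # no offset given
aware = datetime.fromisoformat("2023-10-21T22:33:40+05:30")  # offset given

# Assumed fallback: a value without an offset is treated as UTC.
as_utc = naive.replace(tzinfo=timezone.utc)
fallback = as_utc.isoformat()

# Converting the UTC fallback to IST shifts the wall-clock time.
converted = as_utc.astimezone(IST).isoformat()

# A value that already carries +05:30 keeps its wall-clock time.
preserved = aware.astimezone(IST).isoformat()

# Hard-coding the offset in the format string only relabels the naive value.
relabeled = naive.strftime("%Y-%m-%dT%H:%M:%S") + "+05:30"

print(fallback)   # 2023-10-21T22:33:40+00:00
print(converted)  # 2023-10-22T04:03:40+05:30
print(preserved)  # 2023-10-21T22:33:40+05:30
print(relabeled)  # 2023-10-21T22:33:40+05:30
```

Relabeling is correct only when every frontmatter date was written in that one local timezone, which is the assumption the hard-coded +05:30 in the template makes.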