sphinx-sitemap

Generate the <lastmod> tag

Status: Open. jdillard opened this issue 6 years ago • 4 comments

There is currently no <lastmod> tag for the URLs in sitemap.xml. Implementing this will likely require a different extension.

Example from the PyMOTW-3 conf.py: https://bitbucket.org/dhellmann/pymotw-3/src/17b6ea3b657b93ad45b6ccd5c295e767f4f4be71/source/conf.py?at=master&fileviewer=file-view-default#conf.py-449

import datetime
import os
import subprocess


def html_page_context(app, pagename, templatename, context, doctree):
    # Expose a per-page date to the HTML templates as "last_updated".
    context['last_updated'] = _get_last_updated(app, pagename)


def _get_last_updated(app, pagename):
    # Use the last modified date from git instead of applying a single
    # value to the entire site.
    last_updated = None
    src_file = app.builder.env.doc2path(pagename)
    if os.path.exists(src_file):
        try:
            # Author date of the latest commit touching this file, printed
            # as YYYY-MM-DD (empty output if the file isn't tracked).
            last_updated_t = subprocess.check_output(
                [
                    'git', 'log', '-n1', '--format=%ad', '--date=short',
                    '--', src_file,
                ]
            ).decode('utf-8').strip()
            last_updated = datetime.datetime.strptime(last_updated_t,
                                                      '%Y-%m-%d')
        except (OSError, ValueError, subprocess.CalledProcessError):
            # OSError: git is not installed; ValueError: empty output for
            # an untracked file; CalledProcessError: not a git repository.
            pass
    return last_updated


def setup(app):
    app.connect('html-page-context', html_page_context)
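Once a per-page date like the one returned by _get_last_updated() is available, it still has to be written into sitemap.xml. The following is a minimal sketch of that last step using only the standard library; `build_sitemap` is a hypothetical helper (not part of sphinx-sitemap), and it assumes the caller already has (URL, date) pairs. A date of None simply leaves out <lastmod>, which the sitemap protocol treats as optional.

```python
import datetime
import xml.etree.ElementTree as ET

# Namespace required by the sitemap protocol (sitemaps.org).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return sitemap XML text for an iterable of (loc, last_updated) pairs.

    last_updated may be a datetime.date, a datetime.datetime, or None.
    None means the date is unknown (e.g. an untracked or generated file),
    so the optional <lastmod> element is omitted for that URL.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, last_updated in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        if last_updated is not None:
            # Date-only W3C Datetime (YYYY-MM-DD), same as git's --date=short.
            ET.SubElement(url, "lastmod").text = last_updated.strftime("%Y-%m-%d")
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical URLs and dates, for illustration only.
    print(build_sitemap([
        ("https://example.com/index.html", datetime.datetime(2020, 4, 26)),
        ("https://example.com/genindex.html", None),
    ]))
```

When writing an actual sitemap.xml file, ET.ElementTree(...).write(path, encoding="UTF-8", xml_declaration=True) would also emit the usual XML declaration.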

Caveats to consider:

  1. Included files may have a later updated date than the parent page.
  2. Files included for substitution purposes can't tell whether a change actually affects the substitutions used on a given page, which makes it hard to know whether the included file's change date is accurate for that page.
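Caveat 1 can be handled by using the newest date among a page and everything it includes. A minimal sketch, assuming the per-file dates have already been looked up (for example with git, as in the conf.py snippet); `effective_last_updated` is a hypothetical helper name.

```python
import datetime


def effective_last_updated(page_date, dependency_dates):
    """Return the newest known date among a page and its included files.

    Any date may be None (unknown, e.g. an untracked file) and is skipped;
    if nothing is known at all, None is returned. All non-None values must
    be the same type, since Python won't compare date with datetime.
    """
    known = [d for d in (page_date, *dependency_dates) if d is not None]
    return max(known, default=None)


if __name__ == "__main__":
    # Hypothetical dates: the page itself vs. a file it includes.
    page = datetime.date(2017, 8, 1)
    included = datetime.date(2017, 8, 18)
    print(effective_last_updated(page, [included]))  # 2017-08-18
```

This only addresses caveat 1. For caveat 2 it over-approximates: any edit to an included file bumps every page that includes it, whether or not that page uses the changed part.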

jdillard, Aug 18 '17 05:08

This is how Jekyll does it: https://github.com/jekyll/jekyll-sitemap#lastmod-tag

jdillard, Jan 21 '18 16:01

I happen to have recently created a Sphinx extension that does basically just that: https://github.com/mgeier/sphinx-last-updated-by-git

It's somewhat similar to your own https://github.com/jdillard/sphinx-gitstamp.

> Included files may have a later updated date than the parent page.

That's taken care of.

> Files included for substitution purposes won't take into account if only a substitution on that page changed, making it hard to determine if the change date on the included files are accurate for that page.

I guess that's still a problem.

Another question is: what happens with auto-generated source files?

Note that index.html will only be updated when index.rst changes, not when any of the section titles happen to change. I'm not sure whether that's a problem.

For more caveats, see the README.

mgeier, Apr 26 '20 10:04

I've been following along; nice work! I plan to incorporate the relevant bits into my extensions as a learning exercise (part of the reason I made them is to get better at Python).

> I guess that's still a problem.

As a sub-point regarding includes, I need to test whether rst_prolog behaves differently, although I suspect it doesn't. This is how I typically include "global" substitution files:

rst_prolog = """
.. include:: /substitutions.rsti
"""

> Another question is: what happens with auto-generated source files?
>
> Note that index.html will only be updated when index.rst changes, not when any of the section titles happen to change. I'm not sure whether that's a problem.

That's a good point as well. While I understand the importance of accuracy, I wonder if one could argue it is outside the scope of "git timestamps". Ideally, that type of change would be reflected though.

jdillard, Apr 27 '20 17:04

This is an interesting case! I've just checked, and the included file (substitutions.rsti in your example) is listed as a dependency of every page.

When you update your substitutions.rsti file, this will be reflected in the dates of every page (if dependencies are checked by default, as they are in my extension).

This is what you want, right?

There will always be cases where this is, strictly speaking, wrong (e.g. if you update your substitutions but a certain page doesn't actually use those changes), but Sphinx simply doesn't provide more fine-grained dependency information. Still, it's good to be aware of this behavior.
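To make the behavior described here concrete, this is a hypothetical site-wide sketch: every page lists substitutions.rsti (pulled in via rst_prolog) as a dependency, so editing that file moves every page's date, including pages that don't use the changed substitution. The dependency map is hand-written and keyed by source path for simplicity; it only roughly mirrors the per-document dependency sets Sphinx keeps in its build environment. The function name, file names, and dates are made up for illustration.

```python
import datetime


def site_lastmods(file_dates, dependencies):
    """Map each page to the newest date among its source and dependencies.

    file_dates: {path: date} for every file whose date is known.
    dependencies: {page_source: set of included paths}.
    Pages with no known date at all map to None.
    """
    result = {}
    for page, deps in dependencies.items():
        dates = [file_dates[p] for p in (page, *deps) if p in file_dates]
        result[page] = max(dates, default=None)
    return result


if __name__ == "__main__":
    # Both pages include substitutions.rsti through rst_prolog.
    deps = {
        "index.rst": {"substitutions.rsti"},
        "install.rst": {"substitutions.rsti"},
    }
    dates = {
        "index.rst": datetime.date(2020, 4, 1),
        "install.rst": datetime.date(2020, 3, 1),
        "substitutions.rsti": datetime.date(2020, 4, 28),
    }
    # Editing substitutions.rsti moves both pages to 2020-04-28, even if
    # install.rst never uses the substitution that changed.
    print(site_lastmods(dates, deps))
```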

> I wonder if one could argue it is outside the scope of "git timestamps". Ideally, that type of change would be reflected though.

For now I'm only interested in using the information about dependencies as provided by Sphinx and consider everything else out of scope.

Now that I think about it, one could probably use the source information stored in the "doctree" to get more fine-grained (and/or additional) information. But I'm not sure that would work at all, and I don't want to go down that path right now.

mgeier, Apr 28 '20 12:04