PageQuerySet.in_site and Sitemap do not return pages in other languages
Issue Summary
When Wagtail is configured for multi-language content, Page.objects.in_site(site) returns only pages in the locale of Site.root_page.
Similarly, the Sitemap, which does not use the in_site queryset method, also does not include translated pages.
I had thought that perhaps the language root page should be a descendant of the root page, but it seems this is not the case, as per the docs:
Wagtail stores content in a separate page tree for each locale. For example, if you have two sites in two locales, then you will see four homepages at the top level of the page hierarchy in the explorer.
Steps to Reproduce
- In the shell of a multi-language site, run:
from wagtail.models import Site, Page
site = Site.objects.select_related("root_page").get(is_default_site=True)
qs = Page.objects.in_site(site)
qs will contain only pages in a single language.
Technical details
- Python version: 3.12.6
- Django version: 4.2.15
- Wagtail version: 6.2
- Browser version: Not browser related.
Working on this
Updating the in_site method as shown below would make it handle multilingual pages as well. The Sitemap could also be updated to use in_site.
I can create a PR.
def in_site(self, site):
    """
    This filters the QuerySet to only contain pages within the specified site.
    """
    from functools import reduce

    # The site's root page plus its translations in every other locale.
    root_page_and_translations = site.root_page.get_translations(inclusive=True)
    # OR together one "descendant of" condition per locale root.
    all_descendants_q = reduce(
        lambda x, y: x | y,
        [self.descendant_of_q(p, inclusive=True) for p in root_page_and_translations],
    )
    return self.filter(all_descendants_q)
https://www.mashandgravy.co.uk/blog/google-friendly-sitemaps-multilingual-wagtail-sites/ may be of use here.
Btw, this is not a bug, as translations are created in their corresponding locale trees, so your Site query only picks up the default site pages (i.e. your source language).
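To make the locale-tree point concrete, here is a minimal plain-Python sketch (no Django or Wagtail needed). It assumes, as Wagtail's treebeard-based page tree does as far as I understand it, that a "descendant of" query boils down to a path-prefix match. The paths, titles, and the descendants helper are all made up for illustration; this is not Wagtail's actual code.

```python
# Plain-Python model of a materialised-path page tree.
# Paths and titles are hypothetical; each 4-character step is one tree level.
PAGES = {
    "0001": "Root",
    "00010001": "Home (en)",
    "000100010001": "Blog (en)",
    "00010002": "Home (de)",
    "000100020001": "Blog (de)",
    "00010003": "Other site home (en)",
}


def descendants(root_paths):
    """Return titles of pages at or below any of the given root paths."""
    return [
        title
        for path, title in sorted(PAGES.items())
        if any(path.startswith(root) for root in root_paths)
    ]


# Current behaviour: only the tree under the site's root page matches,
# because the German home page is a sibling, not a descendant.
single_tree = descendants(["00010001"])

# Proposed behaviour: also match the trees under the root page's translations.
all_locales = descendants(["00010001", "00010002"])
```

In this model, single_tree holds only the English pages, while all_locales also includes the German tree and still leaves out the unrelated site's home page.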
@zerolab Thanks for the quick response and the useful link. The Girls Not Brides website mentioned in the article is a somewhat special case because it has a separate website for each language.
After reading both the article and the Google documentation, I am confident that the sitemap should include all pages that are part of the same site.
For example, let's say there is only one page in two languages on a website:
- https://www.example.com/en/
- https://www.example.com/de/
The sitemap should contain both pages (currently it does not):
<url><loc>https://www.example.com/en/</loc></url>
<url><loc>https://www.example.com/de/</loc></url>
If the sitemap includes information about localized pages, it should still have a separate <url> entry for each language version:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <url>
    <loc>http://www.example.com/en/</loc>
    <xhtml:link
        rel="alternate"
        hreflang="de"
        href="http://www.example.com/de/"/>
    <xhtml:link
        rel="alternate"
        hreflang="en"
        href="http://www.example.com/en/"/>
  </url>
  <url>
    <loc>http://www.example.com/de/</loc>
    <xhtml:link
        rel="alternate"
        hreflang="de"
        href="http://www.example.com/de/"/>
    <xhtml:link
        rel="alternate"
        hreflang="en"
        href="http://www.example.com/en/"/>
  </url>
</urlset>
This aligns with the Google documentation:
https://developers.google.com/search/docs/specialty/international/localized-versions?hl=en&visit_id=637509921562028966-2232430485&rd=2#sitemap
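As a rough sketch of how such a sitemap could be generated, the snippet below builds the same structure with Python's standard xml.etree.ElementTree. The build_sitemap function and the shape of its input (one dict per page, mapping language code to URL) are assumptions for illustration only; they are not part of Wagtail's Sitemap class.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML_NS = "http://www.w3.org/1999/xhtml"

# Serialise sitemap elements without a prefix and alternates as xhtml:link.
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("xhtml", XHTML_NS)


def build_sitemap(translation_groups):
    """Build a sitemap string from a list of {language_code: url} dicts.

    Hypothetical helper: each dict describes one page in all its languages.
    """
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for group in translation_groups:
        # One <url> entry per language version of the page.
        for url in group.values():
            url_el = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
            ET.SubElement(url_el, f"{{{SITEMAP_NS}}}loc").text = url
            # Every entry lists all language versions, including itself.
            for lang, alt_url in sorted(group.items()):
                ET.SubElement(
                    url_el,
                    f"{{{XHTML_NS}}}link",
                    {"rel": "alternate", "hreflang": lang, "href": alt_url},
                )
    return ET.tostring(urlset, encoding="unicode")


xml_text = build_sitemap(
    [{"en": "http://www.example.com/en/", "de": "http://www.example.com/de/"}]
)
```

In a real Wagtail project the input would presumably come from each page and its translations rather than from a hand-written dict.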
Flagging as a documentation improvement for now.