python-sitemap

Exclude canonicalized pages

Open Spidle opened this issue 4 years ago • 0 comments

Sometimes we have URLs that are canonicalized to other pages, and these should not be included in the sitemap. See Google's reference: https://developers.google.com/search/docs/advanced/sitemaps/build-sitemap

The logic would be to look for a canonical tag on each crawled page and check whether its URL matches the crawled URL. If it does not match, the page should not be included in the sitemap.
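Here is a rough sketch of that check using only the standard library. It is not taken from python-sitemap's code: `CanonicalParser`, `normalize`, and `is_self_canonical` are hypothetical names, and keeping pages that have no canonical tag at all is my own assumption.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class CanonicalParser(HTMLParser):
    """Collects the href of the first <link rel="canonical"> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        # rel may hold several space-separated tokens, or be missing entirely.
        rel = (attr.get("rel") or "").lower().split()
        if "canonical" in rel and attr.get("href"):
            self.canonical = attr["href"]


def normalize(url):
    # Drop the fragment and any trailing slash so trivial variants compare equal.
    return urldefrag(url).url.rstrip("/")


def is_self_canonical(crawled_url, html):
    """Return True if the page should stay in the sitemap."""
    parser = CanonicalParser()
    parser.feed(html)
    if parser.canonical is None:
        # Assumption: a page without a canonical tag is kept.
        return True
    # Resolve relative canonical hrefs against the crawled URL.
    canonical = urljoin(crawled_url, parser.canonical)
    return normalize(canonical) == normalize(crawled_url)
```

The crawler could then call `is_self_canonical(url, page_source)` before writing a URL to the sitemap and skip it when the result is `False`. The comparison is deliberately simple; query strings and scheme differences are treated as different URLs.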

I'm working on updating your code myself to include this but I'm still new to Python.

Spidle • Oct 22 '21 23:10