sitemapper

Support robots.txt Sitemaps (plural!) discovery

Open • Abdull opened this issue 3 years ago • 1 comment

The robots.txt standard allows a site to declare the locations of multiple sitemaps (plural!), e.g. in https://www.nytimes.com/robots.txt:

```
# ....
User-Agent: omgili
Disallow: /

User-agent: ia_archiver
Disallow: /

Sitemap: https://www.nytimes.com/sitemaps/new/news.xml.gz
Sitemap: https://www.nytimes.com/sitemaps/new/sitemap.xml.gz
Sitemap: https://www.nytimes.com/sitemaps/new/collections.xml.gz
Sitemap: https://www.nytimes.com/sitemaps/new/video.xml.gz
Sitemap: https://www.nytimes.com/sitemaps/new/cooking.xml.gz
Sitemap: https://www.nytimes.com/sitemaps/new/recipe-collects.xml.gz
Sitemap: https://www.nytimes.com/sitemaps/new/regions.xml
Sitemap: https://www.nytimes.com/sitemaps/new/best-sellers.xml
Sitemap: https://www.nytimes.com/sitemaps/www.nytimes.com/2016_election_sitemap.xml.gz
Sitemap: https://www.nytimes.com/elections/2018/sitemap
Sitemap: https://www.nytimes.com/wirecutter/sitemapindex.xml
```
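To make the discovery step concrete, here is a minimal TypeScript sketch that pulls every declared sitemap URL out of a robots.txt body like the one above. `extractSitemapUrls` is a hypothetical helper, not part of sitemapper's API. It assumes that `#` starts a comment and that the `Sitemap` field name should be matched case-insensitively. Because the sitemaps.org protocol treats the `Sitemap` directive as independent of `User-agent` groups, the sketch scans every line rather than tracking groups.

```typescript
// Hypothetical helper (not part of sitemapper): collect every `Sitemap:` value
// from a robots.txt body, in first-seen order, without duplicates.
function extractSitemapUrls(robotsTxt: string): string[] {
  const seen = new Set<string>();
  for (const rawLine of robotsTxt.split(/\r\n|\r|\n/)) {
    // Strip comments ("# ...") and surrounding whitespace.
    const line = rawLine.replace(/#.*$/, "").trim();
    // Split on the FIRST colon only, so the "https:" inside the URL survives.
    const colon = line.indexOf(":");
    if (colon === -1) continue;
    const field = line.slice(0, colon).trim().toLowerCase();
    const value = line.slice(colon + 1).trim();
    // Sitemap lines are not tied to any User-agent group, so accept them anywhere.
    if (field === "sitemap" && value !== "") {
      seen.add(value); // a Set keeps insertion order and drops repeats
    }
  }
  return Array.from(seen);
}
```

Fed the NYTimes excerpt, this would return the eleven `Sitemap:` URLs in file order and ignore the `User-Agent`/`Disallow` lines.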

It would be great if sitemapper could accept a robots.txt URL and transitively return the URLs from all sitemaps declared there.
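One way the requested behaviour could be layered on top is sketched below: fetch robots.txt, collect its `Sitemap:` entries, expand each sitemap into page URLs, and merge the results. `sitesFromRobots`, `FetchText` and `FetchSites` are hypothetical names. The two fetchers are injected so the sketch stays independent of any particular HTTP client or sitemapper version. Relative `Sitemap:` values are resolved against the robots.txt URL as a defensive assumption (the sitemaps.org protocol expects absolute URLs).

```typescript
// Hypothetical fetcher signatures supplied by the caller.
type FetchText = (url: string) => Promise<string>;        // e.g. download robots.txt
type FetchSites = (sitemapUrl: string) => Promise<string[]>; // e.g. expand one sitemap

// Sketch: robots.txt URL -> declared sitemaps -> de-duplicated page URLs.
async function sitesFromRobots(
  robotsUrl: string,
  fetchText: FetchText,
  fetchSites: FetchSites,
): Promise<string[]> {
  const body = await fetchText(robotsUrl);

  // Collect "Sitemap:" values (field name matched case-insensitively),
  // resolving any relative value against the robots.txt URL.
  const sitemapUrls = new Set<string>();
  for (const line of body.split(/\r?\n/)) {
    const m = /^\s*sitemap\s*:\s*(\S+)/i.exec(line);
    if (m) sitemapUrls.add(new URL(m[1], robotsUrl).toString());
  }

  // Expand all sitemaps concurrently; a failing sitemap yields no URLs
  // instead of rejecting the whole discovery.
  const lists = await Promise.all(
    Array.from(sitemapUrls).map((u) => fetchSites(u).catch(() => [] as string[])),
  );

  // Merge, keeping first-seen order and dropping duplicates across sitemaps.
  const sites = new Set<string>();
  for (const list of lists) list.forEach((s) => sites.add(s));
  return Array.from(sites);
}
```

In practice `fetchText` could be something like `(u) => fetch(u).then((r) => r.text())` on a runtime with a global `fetch`. `fetchSites` could wrap sitemapper itself, e.g. `(u) => new Sitemapper({ url: u }).fetch().then((r) => r.sites)`, assuming `fetch()` resolves to an object with a `sites` array as sitemapper's README describes. Since sitemapper already follows sitemap index files, this would cover entries such as `wirecutter/sitemapindex.xml` transitively.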

Abdull, Oct 10 '22 16:10