
Sitemap scraping not taking into account sitemap_index.xml

Open gbopola opened this issue 5 months ago • 0 comments

I noticed that the Sitemap function doesn't handle sites whose sitemap lives at /sitemap_index.xml. A site like denydesigns.com works because its sitemap is at /sitemap.xml, but a site like https://www.nomadicmatt.com/ doesn't because its sitemap is at /sitemap_index.xml. I suppose a workaround would be to try different sitemap path variations and check whether each one exists, but it would be nice for the scraper to do this by default. Let me know your thoughts. Thanks.
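
For illustration, the workaround described above could look roughly like the sketch below. It uses only the Python standard library and does not call botasaurus at all; `CANDIDATE_PATHS`, `url_exists`, and `find_sitemap` are hypothetical names made up for this example, and the candidate list is an assumption covering just the two paths mentioned here.

```python
from typing import Callable, Iterable, Optional
from urllib.parse import urljoin
from urllib.request import Request, urlopen

# Hypothetical list of sitemap locations to try, in order.
# These are the two paths from this issue, not a list taken from botasaurus.
CANDIDATE_PATHS = ("/sitemap.xml", "/sitemap_index.xml")


def url_exists(url: str, timeout: float = 10.0) -> bool:
    """Return True if a GET request to url answers with a 2xx status."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    try:
        with urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (OSError, ValueError):
        # OSError covers HTTP errors (4xx/5xx), DNS failures and timeouts;
        # ValueError covers malformed URLs.
        return False


def find_sitemap(
    base_url: str,
    candidates: Iterable[str] = CANDIDATE_PATHS,
    exists: Callable[[str], bool] = url_exists,
) -> Optional[str]:
    """Return the first candidate sitemap URL that exists, or None."""
    for path in candidates:
        candidate = urljoin(base_url, path)
        if exists(candidate):
            return candidate
    return None


if __name__ == "__main__":
    # Performs real network requests when run as a script.
    print(find_sitemap("https://www.nomadicmatt.com/"))
```

The `exists` parameter is injectable so the lookup order can be checked without network access. Whatever URL `find_sitemap` returns would then be handed to the scraper in place of the default sitemap location.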

gbopola, Jul 07 '25 07:07