botasaurus
Sitemap scraping not taking into account sitemap_index.xml
I noticed that the Sitemap function doesn't take into account sites whose sitemap lives at /sitemap_index.xml. A site like denydesigns.com works because its sitemap is at /sitemap.xml, but a site like https://www.nomadicmatt.com/ doesn't, because its sitemap is at /sitemap_index.xml. I suppose a workaround would be to try different variations of the sitemap URL and check whether each one exists, but it would be nice if the scraper had this functionality by default. Let me know your thoughts. Thanks.
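Here is a rough sketch of that workaround in Python, using only the standard library. All names here (`CANDIDATE_PATHS`, `url_exists`, `find_sitemap_url`, `sitemaps_from_robots`) are hypothetical and are not part of Botasaurus's API. The sketch probes a few common sitemap paths in order and returns the first one that answers with a 2xx status. It also reads `Sitemap:` lines from robots.txt, which the sitemaps protocol allows sites to use to advertise their sitemap location. It assumes the server accepts HEAD requests. Some servers do not, so a GET fallback may be needed in practice.

```python
import urllib.request
from typing import Callable, Iterable, List, Optional
from urllib.parse import urljoin

# Hypothetical list of common sitemap locations, probed in priority order.
CANDIDATE_PATHS = ("/sitemap.xml", "/sitemap_index.xml")


def url_exists(url: str, timeout: float = 10.0) -> bool:
    """Return True if a HEAD request to `url` ends in a 2xx status.

    Assumes the server supports HEAD; redirects are followed by urlopen.
    """
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (OSError, ValueError):
        # HTTPError/URLError and timeouts are all OSError subclasses;
        # ValueError covers malformed URLs.
        return False


def find_sitemap_url(
    base_url: str,
    exists: Callable[[str], bool] = url_exists,
    paths: Iterable[str] = CANDIDATE_PATHS,
) -> Optional[str]:
    """Return the first candidate sitemap URL that exists, or None."""
    for path in paths:
        # A leading "/" makes urljoin resolve against the site root.
        candidate = urljoin(base_url, path)
        if exists(candidate):
            return candidate
    return None


def sitemaps_from_robots(robots_txt: str) -> List[str]:
    """Extract the URLs from `Sitemap:` lines in a robots.txt body."""
    found = []
    for line in robots_txt.splitlines():
        # Split at the first colon only, so the URL's own "https:" survives.
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap" and value.strip():
            found.append(value.strip())
    return found
```

For example, `find_sitemap_url("https://www.nomadicmatt.com/")` would try `/sitemap.xml` first and then `/sitemap_index.xml`. The `exists` parameter is injectable, so the lookup can be tested without network access. A scraper could also check robots.txt first and only fall back to probing paths when it lists no sitemap.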