next-sitemap
Duplicate sitemaps
Describe the bug: Sitemap indexes show up in both robots.txt and the root sitemap index.
To reproduce, use this config:
/** @type {import('next-sitemap').IConfig} */
module.exports = {
  siteUrl: 'https://embarc.com',
  changefreq: 'daily',
  priority: 0.7,
  sitemapSize: 2000,
  generateRobotsTxt: true,
  autoLastmod: true,
  exclude: [
    '*/sitemap.xml',
    '/dashboard/*',
    '/pricing',
    '/signin',
    '/legal/*',
  ],
  robotsTxtOptions: {
    includeNonIndexSitemaps: false,
    additionalSitemaps: [
      'https://embarc.com/capital/leadership/sitemap.xml',
      'https://embarc.com/capital/spac/sitemap.xml',
      'https://embarc.com/capital/spac-sponsor/sitemap.xml',
      'https://embarc.com/company/crowdfunding/sitemap.xml',
      'https://embarc.com/portal/crowdfunding/sitemap.xml',
      'https://embarc.com/capital/underwriter/sitemap.xml',
    ],
  },
};
Expected behavior: No duplicate sitemaps.
Example
See https://embarc.com/robots.txt:
# *
User-agent: *
Allow: /
# Host
Host: https://embarc.com
# Sitemaps
Sitemap: https://embarc.com/sitemap.xml
Sitemap: https://embarc.com/capital/leadership/sitemap.xml
Sitemap: https://embarc.com/capital/spac/sitemap.xml
Sitemap: https://embarc.com/capital/spac-sponsor/sitemap.xml
Sitemap: https://embarc.com/company/crowdfunding/sitemap.xml
Sitemap: https://embarc.com/portal/crowdfunding/sitemap.xml
Sitemap: https://embarc.com/capital/underwriter/sitemap.xml
And see https://embarc.com/sitemap.xml:
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://embarc.com/sitemap-0.xml</loc>
  </sitemap>
  <sitemap>
    <loc>https://embarc.com/capital/leadership/sitemap.xml</loc>
  </sitemap>
  <sitemap>
    <loc>https://embarc.com/capital/spac/sitemap.xml</loc>
  </sitemap>
  <sitemap>
    <loc>https://embarc.com/capital/spac-sponsor/sitemap.xml</loc>
  </sitemap>
  <sitemap>
    <loc>https://embarc.com/company/crowdfunding/sitemap.xml</loc>
  </sitemap>
  <sitemap>
    <loc>https://embarc.com/portal/crowdfunding/sitemap.xml</loc>
  </sitemap>
  <sitemap>
    <loc>https://embarc.com/capital/underwriter/sitemap.xml</loc>
  </sitemap>
</sitemapindex>
My preference would be to list the additional sitemaps only in the sitemap index and not in robots.txt. How can that be done?
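One possible workaround, sketched under assumptions: post-process the generated robots.txt text and drop the Sitemap lines for the additional sitemaps, leaving only the root index. The helper name stripSitemapLines is hypothetical and not part of next-sitemap. It could be run as a post-build step on the generated file; recent README versions also list a robotsTxtOptions.transformRobotsTxt hook, but whether your installed version supports it is an assumption to verify.

```javascript
// Hypothetical helper: not part of next-sitemap.
// Removes "Sitemap:" lines from robots.txt text when their URL is in `urls`.
function stripSitemapLines(robotsTxt, urls) {
  const drop = new Set(urls);
  return robotsTxt
    .split(/\r?\n/)
    .filter((line) => {
      const match = /^\s*Sitemap:\s*(\S+)\s*$/i.exec(line);
      // Keep every line that is not a Sitemap entry for a dropped URL.
      return !(match && drop.has(match[1]));
    })
    .join('\n');
}

// Sample input modeled on the robots.txt shown in this issue.
const robotsTxt = [
  'User-agent: *',
  'Allow: /',
  'Sitemap: https://embarc.com/sitemap.xml',
  'Sitemap: https://embarc.com/capital/spac/sitemap.xml',
].join('\n');

const cleaned = stripSitemapLines(robotsTxt, [
  'https://embarc.com/capital/spac/sitemap.xml',
]);
console.log(cleaned);
```

With this sample input, only the root Sitemap line should remain; all non-Sitemap directives pass through untouched.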
Closing this issue due to inactivity.
would rather not
I can confirm this. Looking at the code, the exclude list is not applied to the sitemaps listed in robotsTxtOptions.additionalSitemaps; they are simply added as-is. The documentation states that excluding them is possible, but at the moment it is not.
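As a rough illustration of the behavior described above, here is what applying the exclude patterns to additionalSitemaps could look like. This is a sketch, not next-sitemap's implementation: wildcardToRegExp and filterAdditionalSitemaps are hypothetical names, and the wildcard handling ('*' matches any run of characters) is a simplifying assumption that may differ from the library's own pattern matching.

```javascript
// Hypothetical helpers: a sketch, not next-sitemap's implementation.
// Converts a simple wildcard pattern ('*' = any run of characters) to a RegExp.
function wildcardToRegExp(pattern) {
  const escaped = pattern
    .split('*')
    .map((part) => part.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'))
    .join('.*');
  return new RegExp(`^${escaped}$`);
}

// Keeps only the sitemap URLs whose path matches none of the exclude patterns.
function filterAdditionalSitemaps(urls, exclude) {
  const matchers = exclude.map(wildcardToRegExp);
  return urls.filter((url) => {
    const path = new URL(url).pathname;
    return !matchers.some((re) => re.test(path));
  });
}

const kept = filterAdditionalSitemaps(
  [
    'https://embarc.com/capital/spac/sitemap.xml',
    'https://embarc.com/extra-sitemap.xml', // made-up URL for illustration
  ],
  ['*/sitemap.xml', '/dashboard/*'],
);
console.log(kept);
```

Here the first URL matches '*/sitemap.xml' and is filtered out, while the made-up second URL matches no pattern and is kept.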
If you add sitemap indexes to the list above, those indexes are added to the main sitemap index, which Google does not allow:
Incorrect sitemap index format: Nested sitemap indexes One or more entries in your sitemap index file uses its own URL or the URL of another sitemap index file. A sitemap index file can't list other sitemap index files, only sitemap files.
Remove any entries pointing to sitemap index files, then resubmit your sitemap.
https://support.google.com/webmasters/answer/7451001#errors&zippy=%2Csitemap-parsing-errors
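To check whether a generated index runs into this rule, one could compare its entries against the documents they point to. The sketch below assumes the referenced documents have already been fetched into a Map keyed by URL; all function names are hypothetical, the sample URLs under example.com are made up, and the regex-based parsing is a shortcut for well-formed sitemap files rather than a full XML parser.

```javascript
// Hypothetical helpers for illustration; not part of next-sitemap.
// Extracts every <loc> value from a sitemap or sitemap index document.
function extractLocs(xml) {
  return [...xml.matchAll(/<loc>\s*([^<\s]+)\s*<\/loc>/g)].map((m) => m[1]);
}

// True when the document contains a <sitemapindex> element,
// which is the root element of a sitemap index file.
function isSitemapIndex(xml) {
  return /<sitemapindex[\s>]/.test(xml);
}

// Given an index document and a Map of URL -> already-fetched XML,
// returns the index entries that point at another sitemap index.
function findNestedIndexes(indexXml, documentsByUrl) {
  return extractLocs(indexXml).filter((loc) => {
    const xml = documentsByUrl.get(loc);
    return xml !== undefined && isSitemapIndex(xml);
  });
}

const rootIndex = `<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://example.com/sitemap-0.xml</loc></sitemap>
  <sitemap><loc>https://example.com/section/sitemap.xml</loc></sitemap>
</sitemapindex>`;

const fetched = new Map([
  ['https://example.com/sitemap-0.xml', '<urlset><url><loc>https://example.com/</loc></url></urlset>'],
  ['https://example.com/section/sitemap.xml', '<sitemapindex><sitemap><loc>https://example.com/section/sitemap-0.xml</loc></sitemap></sitemapindex>'],
]);

const nested = findNestedIndexes(rootIndex, fetched);
console.log(nested);
```

Any URL reported here is an index listed inside another index and would need to be replaced by the plain sitemap files it contains.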
This package is not actively maintained. It auto-closes issues and PRs after a set period of inactivity.
I’ll make a PR soon