next-sitemap

Duplicate sitemaps

Open sroussey opened this issue 1 year ago • 8 comments

Describe the bug

Sitemap indexes show up both in robots.txt and in the root sitemap index.

To Reproduce

With this config:

/** @type {import('next-sitemap').IConfig} */
module.exports = {
  siteUrl: 'https://embarc.com',
  changefreq: 'daily',
  priority: 0.7,
  sitemapSize: 2000,
  generateRobotsTxt: true,
  autoLastmod: true,
  exclude: [
    '*/sitemap.xml',
    '/dashboard/*',
    '/pricing',
    '/signin',
    '/legal/*',
  ],
  robotsTxtOptions: {
    includeNonIndexSitemaps: false,
    additionalSitemaps: [
      'https://embarc.com/capital/leadership/sitemap.xml',
      'https://embarc.com/capital/spac/sitemap.xml',
      'https://embarc.com/capital/spac-sponsor/sitemap.xml',
      'https://embarc.com/company/crowdfunding/sitemap.xml',
      'https://embarc.com/portal/crowdfunding/sitemap.xml',
      'https://embarc.com/capital/underwriter/sitemap.xml',
    ],
  },
};

Expected behavior

No duplicate sitemaps.

Example

See https://embarc.com/robots.txt:

# *
User-agent: *
Allow: /

# Host
Host: https://embarc.com

# Sitemaps
Sitemap: https://embarc.com/sitemap.xml
Sitemap: https://embarc.com/capital/leadership/sitemap.xml
Sitemap: https://embarc.com/capital/spac/sitemap.xml
Sitemap: https://embarc.com/capital/spac-sponsor/sitemap.xml
Sitemap: https://embarc.com/company/crowdfunding/sitemap.xml
Sitemap: https://embarc.com/portal/crowdfunding/sitemap.xml
Sitemap: https://embarc.com/capital/underwriter/sitemap.xml

And see https://embarc.com/sitemap.xml:


<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<sitemap>
<loc>https://embarc.com/sitemap-0.xml</loc>
</sitemap>
<sitemap>
<loc>https://embarc.com/capital/leadership/sitemap.xml</loc>
</sitemap>
<sitemap>
<loc>https://embarc.com/capital/spac/sitemap.xml</loc>
</sitemap>
<sitemap>
<loc>https://embarc.com/capital/spac-sponsor/sitemap.xml</loc>
</sitemap>
<sitemap>
<loc>https://embarc.com/company/crowdfunding/sitemap.xml</loc>
</sitemap>
<sitemap>
<loc>https://embarc.com/portal/crowdfunding/sitemap.xml</loc>
</sitemap>
<sitemap>
<loc>https://embarc.com/capital/underwriter/sitemap.xml</loc>
</sitemap>
</sitemapindex>
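
The overlap between the two files above can be checked mechanically. This is only a rough sketch: `findDuplicateSitemaps` is a hypothetical helper (not part of next-sitemap), it works on the raw text of both files, and the sample inputs are shortened copies of the output shown above.

```javascript
// Return the sitemap URLs that appear both in robots.txt and in a sitemap index.
// Both arguments are the raw text of the files; nothing is fetched here.
function findDuplicateSitemaps(robotsTxt, sitemapIndexXml) {
  // Collect every "Sitemap: <url>" directive from robots.txt (case-insensitive).
  const robotsUrls = new Set(
    robotsTxt
      .split(/\r?\n/)
      .map((line) => line.match(/^\s*sitemap:\s*(\S+)/i))
      .filter(Boolean)
      .map((match) => match[1])
  );

  // Collect every <loc> value from the sitemap index.
  const indexUrls = [
    ...sitemapIndexXml.matchAll(/<loc>\s*([^<\s]+)\s*<\/loc>/g),
  ].map((match) => match[1]);

  // A URL is a duplicate when it is listed in both places.
  return indexUrls.filter((url) => robotsUrls.has(url));
}

// Shortened sample inputs based on the files in this issue.
const robotsTxt = `Sitemap: https://embarc.com/sitemap.xml
Sitemap: https://embarc.com/capital/spac/sitemap.xml`;

const sitemapIndexXml = `<sitemapindex>
<sitemap><loc>https://embarc.com/sitemap-0.xml</loc></sitemap>
<sitemap><loc>https://embarc.com/capital/spac/sitemap.xml</loc></sitemap>
</sitemapindex>`;

console.log(findDuplicateSitemaps(robotsTxt, sitemapIndexXml));
```

With the full files from this issue, all six additionalSitemaps entries would be reported, since each one is listed in both places.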

My preference would be to have them only in the sitemap index and not in robots.txt. How can that be done?

sroussey, Nov 20 '23 16:11

Closing this issue due to inactivity.

github-actions[bot], Jan 20 '24 04:01

would rather not

sroussey, Jan 20 '24 04:01

I can confirm this. Looking at the code, the exclude list is not applied to the sitemaps listed in robotsTxtOptions.additionalSitemaps; they are simply added as-is. The documentation states that excluding them is possible, but at the moment it is not. If you add sitemap indexes to that list, they are also added to the main sitemap index (sitemap.xml), which Google does not allow:

Incorrect sitemap index format: Nested sitemap indexes

One or more entries in your sitemap index file uses its own URL or the URL of another sitemap index file. A sitemap index file can't list other sitemap index files, only sitemap files.

Remove any entries pointing to sitemap index files, then resubmit your sitemap.

https://support.google.com/webmasters/answer/7451001#errors&zippy=%2Csitemap-parsing-errors
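
Until the exclude list is applied to additionalSitemaps, one possible workaround is to post-process the generated robots.txt so that only the root sitemap is listed. The following is a minimal sketch, not next-sitemap's own behavior: `keepOnlyRootSitemap` is a hypothetical helper, and where it runs (for example a script after the build that reads and rewrites the generated file) is left as an assumption.

```javascript
// Keep only the root "Sitemap:" directive in a robots.txt string.
// Hypothetical helper; it operates on text and does not touch the file system.
function keepOnlyRootSitemap(robotsTxt, rootSitemapUrl) {
  return robotsTxt
    .split(/\r?\n/)
    .filter((line) => {
      const match = line.match(/^\s*sitemap:\s*(\S+)/i);
      // Non-sitemap lines pass through; sitemap lines must match the root URL.
      return !match || match[1] === rootSitemapUrl;
    })
    .join('\n');
}

// Shortened sample based on the robots.txt in this issue.
const generated = [
  'User-agent: *',
  'Allow: /',
  '',
  'Sitemap: https://embarc.com/sitemap.xml',
  'Sitemap: https://embarc.com/capital/leadership/sitemap.xml',
].join('\n');

console.log(keepOnlyRootSitemap(generated, 'https://embarc.com/sitemap.xml'));
```

This only addresses the robots.txt side. It does not change what ends up in sitemap.xml, so any sitemap index listed in additionalSitemaps would still be nested in the root index.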

peti446, Feb 28 '24 12:02

This package is not actively maintained. It auto-closes issues and PRs after a set period of inactivity.

kevinrobert3, Apr 22 '24 11:04

Closing this issue due to inactivity.

github-actions[bot], Jun 22 '24 04:06

I’ll make a PR soon

sroussey, Jun 22 '24 05:06

Closing this issue due to inactivity.

github-actions[bot], Aug 22 '24 04:08