
Empty sitemap if a taxonomy is removed

Open benjaminrancourt opened this issue 3 years ago • 2 comments

Issue Summary

When a taxonomy is removed from the routes.yml file, its pages are no longer generated, but its sitemap still is. That sitemap is empty, and some tools, like Google Search Console, warn users that the sitemap is wrong because it's empty.

Example from my website, where I've disabled the authors taxonomy:

<tr>
  <td>
    <a href="https://www.benjaminrancourt.ca/sitemap-authors.xml">
      https://www.benjaminrancourt.ca/sitemap-authors.xml
    </a>
  </td>
  <td>1970-01-01 00:00</td>
</tr>


The sitemap can be read, but it contains errors | Empty sitemap



I tried to find a way to fix this in the Ghost source code, but I couldn't get a working setup running on my machine (I really should install Linux... 🙈).

However, I'll paste the notes I took while looking into this bug, in the hope that they will at least help you.

core/frontend/services/sitemap/manager.js

    // [Ben] The index sitemap is generated here
    getIndexXml() {
        return this.index.getXml();
    }

    createIndexGenerator(options) {
        // [Ben] Solution 1: If some taxonomies are disabled, can we remove them from the options below?
        return new IndexMapGenerator({
            types: {
                pages: this.pages,
                posts: this.posts,
                authors: this.authors,
                tags: this.tags
            },
            maxPerPage: options.maxPerPage
        });
    }
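A minimal sketch of Solution 1, assuming the manager can find out which taxonomies are still declared under `taxonomies:` in routes.yml (which uses the singular keys `tag` and `author`). The `buildIndexTypes` helper and its `enabledTaxonomies` argument are hypothetical, not part of Ghost's API, and the generators are plain stand-in objects; the returned object would replace the hard-coded `types` passed to `IndexMapGenerator`.

```javascript
// Hypothetical helper: build the `types` map for IndexMapGenerator, keeping
// pages and posts and only the taxonomies still enabled in routes.yml.
function buildIndexTypes(generators, enabledTaxonomies) {
    const types = {
        pages: generators.pages,
        posts: generators.posts
    };

    // routes.yml taxonomy keys are singular; sitemap generator names are plural.
    const taxonomyToType = {tag: 'tags', author: 'authors'};

    for (const taxonomy of enabledTaxonomies) {
        const typeName = taxonomyToType[taxonomy];
        if (typeName) {
            types[typeName] = generators[typeName];
        }
    }

    return types;
}

// Example: routes.yml only declares `tag`, so `authors` is left out.
const generators = {
    pages: {name: 'pages'},
    posts: {name: 'posts'},
    authors: {name: 'authors'},
    tags: {name: 'tags'}
};
const types = buildIndexTypes(generators, ['tag']);
console.log(Object.keys(types)); // [ 'pages', 'posts', 'tags' ]
```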

core/frontend/services/sitemap/index-generator.js

    generateSiteMapUrlElements() {
        // [Ben] We iterate over each resource type here
        return _.map(this.types, (resourceType) => {
            // `|| 1` = even if there are no items we still have an empty sitemap file
            const noOfPages = Math.ceil(Object.keys(resourceType.nodeLookup).length / this.maxPerPage) || 1;
            const pages = [];

            for (let i = 0; i < noOfPages; i++) {
                const page = i === 0 ? '' : `-${i + 1}`;
                const url = urlUtils.urlFor({relativeUrl: '/sitemap-' + resourceType.name + page + '.xml'}, true);
                const lastModified = resourceType.lastModified;

                // [Ben] Solution 2: For disabled taxonomies, I suspect their lastModified property is undefined.
                // Therefore, maybe we could skip pushing this resource's sitemap in that case?

                pages.push({
                    sitemap: [
                        {loc: url},
                        {lastmod: moment(lastModified).toISOString()}
                    ]
                });
            }

            return pages;
        }).flat();
    }
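A sketch of a variant of Solution 2 as a standalone function, so it runs outside Ghost. It assumes each resource type exposes `name`, `nodeLookup` and `lastModified` as in the excerpt, and `urlFor` is a stand-in for `urlUtils.urlFor`. It skips a type whose `nodeLookup` is empty rather than testing `lastModified`: since `moment(undefined)` resolves to the current time, the 1970-01-01 date in the index suggests `lastModified` holds `0` (the epoch) rather than `undefined`. Native `Date` and `flatMap` replace `moment` and lodash to keep it dependency-free.

```javascript
// Sketch: build the <sitemap> entries for the index, omitting resource types
// that have no URLs (e.g. a taxonomy removed from routes.yml).
function generateSiteMapUrlElements(types, maxPerPage, urlFor) {
    return Object.values(types).flatMap((resourceType) => {
        const noOfNodes = Object.keys(resourceType.nodeLookup).length;

        // No nodes: no index entry at all, instead of the `|| 1` empty page.
        if (noOfNodes === 0) {
            return [];
        }

        const noOfPages = Math.ceil(noOfNodes / maxPerPage);
        const pages = [];

        for (let i = 0; i < noOfPages; i++) {
            const page = i === 0 ? '' : `-${i + 1}`;
            pages.push({
                sitemap: [
                    {loc: urlFor(`/sitemap-${resourceType.name}${page}.xml`)},
                    {lastmod: new Date(resourceType.lastModified).toISOString()}
                ]
            });
        }

        return pages;
    });
}

// Usage with stand-in data: `authors` is empty, so it is left out of the index.
const urlFor = (relativeUrl) => `https://example.com${relativeUrl}`;
const elements = generateSiteMapUrlElements({
    posts: {name: 'posts', nodeLookup: {a: 1, b: 2, c: 3}, lastModified: Date.UTC(2022, 5, 18)},
    authors: {name: 'authors', nodeLookup: {}, lastModified: 0}
}, 2, urlFor);

// Two entries, sitemap-posts.xml and sitemap-posts-2.xml, and nothing for authors.
console.log(elements.map((el) => el.sitemap[0].loc));
```

With three posts and `maxPerPage` set to 2, posts are split across two sitemap pages, while the empty `authors` type produces no entry.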

Thanks to the Ghost team!

Steps to Reproduce

  1. Disable a taxonomy like tags or authors (https://ghost.org/docs/themes/routing/#removing-taxonomies)
  2. Go to /sitemap.xml
  3. The disabled taxonomies will have an empty sitemap with a Last Modified date of 1970-01-01 00:00
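Steps 2 and 3 can be checked mechanically. The hypothetical helper below scans the XML of a sitemap index for `<sitemap>` entries whose `<lastmod>` is the Unix epoch (the raw value behind the "1970-01-01 00:00" shown on /sitemap.xml). It uses regular expressions rather than an XML parser, so it assumes the flat markup of a sitemap index; the sample input is illustrative, not the real file, and fetching is left out so the example runs offline.

```javascript
// Hypothetical helper: list the sitemaps in a sitemap index whose <lastmod>
// is the Unix epoch, the symptom of a taxonomy removed from routes.yml.
function findEpochSitemaps(indexXml) {
    // `<sitemap>` (with the closing bracket) does not match `<sitemapindex`.
    const entries = indexXml.match(/<sitemap>[\s\S]*?<\/sitemap>/g) || [];

    return entries
        .filter((entry) => /<lastmod>1970-01-01T00:00:00(\.000)?Z<\/lastmod>/.test(entry))
        .map((entry) => entry.match(/<loc>(.*?)<\/loc>/)[1]);
}

// Illustrative sample input.
const sampleIndex = `<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <sitemap>
        <loc>https://example.com/sitemap-posts.xml</loc>
        <lastmod>2022-06-18T00:00:00.000Z</lastmod>
    </sitemap>
    <sitemap>
        <loc>https://example.com/sitemap-authors.xml</loc>
        <lastmod>1970-01-01T00:00:00.000Z</lastmod>
    </sitemap>
</sitemapindex>`;

console.log(findEpochSitemaps(sampleIndex)); // [ 'https://example.com/sitemap-authors.xml' ]
```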

Ghost Version

5.2.3

Node.js Version

v16.15.0

How did you install Ghost?

Docker

Database type

MySQL 8

Browser & OS version

Google Chrome | Windows 10

Relevant log / error output

No response

Code of Conduct

  • [X] I agree to be friendly and polite to people in this repository

benjaminrancourt • Jun 18 '22 19:06

This issue is currently awaiting triage from @ErisDS. We're having a busy time right now, but we'll update this issue ASAP. If you have any more information to help us triage faster please leave us some comments. Thank you for understanding 🙂

github-actions[bot] • Jul 16 '22 20:07

Hey there, thank you so much for the detailed bug report.

That does look like something that shouldn't happen! A PR to fix this issue would be very welcome 🙂

ErisDS • Jul 26 '22 14:07

@ErisDS I'm submitting a PR for this one. However:

  1. Generating a taxonomy entry in the index sitemap even when it doesn't have any URLs was done on purpose in this PR: https://github.com/TryGhost/Ghost/pull/13698/files#diff-47edb0d155714257c72a2993b7480f660de663942214ed29002e4715ffcd1e2eR34
  2. Generating an empty sitemap for that taxonomy was also done on purpose in the same PR: https://github.com/TryGhost/Ghost/pull/13698/files#diff-67aa668df4b26f67dd0cd14fa0f956472eca6b5dc6944566675b2f4ebdd71473R42. However, this implementation produces an invalid XML file (empty content).

I suspect Google reports an error because the file is not valid XML, rather than because it has no entries. Still, other SEO tools would likely report at least a warning for an empty sitemap. Moreover, I don't really see the point of generating an empty sitemap. So, unless there was a good reason for empty (but valid) sitemaps, I'm going with no entry in the index sitemap and a 404 for the taxonomy sitemap.
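A sketch of the 404 half of that proposal, written as a framework-free function so it runs on its own. `resolveSitemapRequest`, its arguments, and the `{status, body}` result are hypothetical and do not reflect how Ghost's sitemap handler or the PR is actually written; `getSiteMapXml` stands in for whatever renders a populated sitemap page.

```javascript
// Hypothetical handler logic: answer 404 for a taxonomy sitemap that has no
// URLs instead of serving an empty (and therefore invalid) XML document.
function resolveSitemapRequest(generators, type, page, getSiteMapXml) {
    const generator = generators[type];

    // Unknown type, or a type with no nodes (e.g. a removed taxonomy).
    if (!generator || Object.keys(generator.nodeLookup).length === 0) {
        return {status: 404, body: null};
    }

    const xml = getSiteMapXml(generator, page);

    // A page past the last one renders nothing, so treat it as missing too.
    if (!xml) {
        return {status: 404, body: null};
    }

    return {status: 200, body: xml};
}

// Stand-in data: posts has one node, authors has none.
const sitemapGenerators = {
    posts: {nodeLookup: {a: 1}},
    authors: {nodeLookup: {}}
};
const render = (generator, page) => (page === 1 ? '<urlset>...</urlset>' : '');

console.log(resolveSitemapRequest(sitemapGenerators, 'authors', 1, render).status); // 404
console.log(resolveSitemapRequest(sitemapGenerators, 'posts', 1, render).status); // 200
```

Returning 404 keeps search engines from being handed a document with no root element, and pairs with dropping the entry from the index so nothing links to the missing file.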

jbenezech • Oct 08 '22 05:10