hyrax

Sitemaps should conform to indexers' sitemap expectations

Open maxkadel opened this issue 2 months ago • 0 comments

Descriptive summary

When I submit the sitemap at /resourcelist to Google, I should not get XML errors.

Steps to reproduce the behavior in User Interface (UI)

  1. For a host with Google Search Console enabled, submit a sitemap at https://search.google.com/search-console/sitemaps?resource_id=sc-domain%3AYOUR_DOMAIN.whatever
  2. Check the status: it should not show any XML parsing errors

Actual behavior (include screenshots if available)

I have seen this on applications on the 5.0-flexible branch, but I suspect it's true on main as well.

Google Search Console reports 2 invalid XML tag errors.

Image

Acceptance Criteria/Expected Behavior

  • [ ] Should conform to http://www.sitemaps.org/schemas/sitemap/0.9
  • [ ] Should not raise XML errors when submitted to Google Search Console
  • [ ] If the first two points are mutually exclusive, choose one and document it in the relevant code (most likely the ResourceListWriter)
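One rough way to check the first two criteria locally is to list every element in the generated XML that falls outside the core sitemap 0.9 vocabulary. The sketch below makes some assumptions: `unexpected_sitemap_elements` is a hypothetical helper, not part of Hyrax, and the sample document assumes the flagged tags are ResourceSync extensions such as `rs:md` and `rs:ln`, which this issue does not confirm. It is a diagnostic, not a full schema validation.

```ruby
require "rexml/document"

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Element names defined by the sitemap 0.9 protocol.
SITEMAP_TAGS = %w[urlset url loc lastmod changefreq priority sitemapindex sitemap].freeze

# Hypothetical helper (not part of Hyrax): returns the prefixed names of
# elements that are not core sitemap 0.9 elements, in document order.
def unexpected_sitemap_elements(xml)
  unexpected = []
  visit = lambda do |element|
    known = element.namespace == SITEMAP_NS && SITEMAP_TAGS.include?(element.name)
    unexpected << element.expanded_name unless known
    element.elements.each { |child| visit.call(child) }
  end
  visit.call(REXML::Document.new(xml).root)
  unexpected.uniq
end

# Assumed shape of a ResourceSync-style resource list; URLs are placeholders.
SAMPLE_RESOURCE_LIST = <<~XML
  <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
          xmlns:rs="http://www.openarchives.org/rs/terms/">
    <rs:md capability="resourcelist"/>
    <rs:ln rel="up" href="https://example.org/capabilitylist"/>
    <url>
      <loc>https://example.org/concern/generic_works/abc123</loc>
      <lastmod>2025-11-05T15:11:00Z</lastmod>
      <rs:md type="text/html"/>
    </url>
  </urlset>
XML

puts unexpected_sitemap_elements(SAMPLE_RESOURCE_LIST).inspect
```

If the list comes back empty for real `/resourcelist` output, the Google errors are likely caused by something else, such as element ordering or date formats, and a full validation against the published schema would be the next step.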

Rationale (for feature request only)

Good sitemaps will discourage crawling-by-facet, which is a major stress on Solr, and should increase discoverability for everyone using Hyrax.

Related work

  • On Hyku: https://github.com/samvera/hyku/issues/2765
  • On Hyrax: has never been implemented, but probably should be: https://github.com/samvera/hyrax/issues/59

maxkadel, Nov 05 '25 15:11