hyrax
Sitemaps should conform to indexers' sitemap expectations
Descriptive summary
When I submit the sitemap at /resourcelist to Google, I should not get XML errors.
Steps to reproduce the behavior in User Interface (UI)
- For a host with Google Search Console enabled, submit a sitemap at https://search.google.com/search-console/sitemaps?resource_id=sc-domain%3AYOUR_DOMAIN.whatever
- Check the status: it should not report any XML parsing errors
Actual behavior (include screenshots if available)
I have seen this in applications on the 5.0-flexible branch, and I suspect it also happens on main.
Google Search Console reports 2 invalid XML tag errors.
Acceptance Criteria/Expected Behavior
- [ ] Should conform to http://www.sitemaps.org/schemas/sitemap/0.9
- [ ] Should not raise XML errors when submitted to Google Search Console
- [ ] If the first two points are mutually exclusive, choose one and document it in the relevant code (most likely the ResourceListWriter)
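One way to check the first criterion locally is to list every element in the generated document that is not a core tag of the sitemap 0.9 protocol. The following is a minimal sketch, not Hyrax code: the helper name `unexpected_sitemap_tags`, the sample document, and the `ext` namespace are all hypothetical, and it assumes the invalid tag errors come from elements outside the sitemap namespace, which this issue does not confirm. It uses REXML, which ships with standard Ruby installations.

```ruby
require "rexml/document"

# Namespace from the first acceptance criterion.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

# Element names defined by the sitemap 0.9 protocol (urlset and sitemapindex documents).
SITEMAP_TAGS = %w[urlset url loc lastmod changefreq priority sitemapindex sitemap].freeze

# Hypothetical helper: yields the given element and all of its descendants.
def each_element_recursive(element, &block)
  yield element
  element.elements.each { |child| each_element_recursive(child, &block) }
end

# Hypothetical helper: returns the qualified names of all elements that are
# either outside the sitemap namespace or not a known sitemap tag.
def unexpected_sitemap_tags(xml)
  doc = REXML::Document.new(xml)
  unexpected = []
  each_element_recursive(doc.root) do |el|
    known = el.namespace == SITEMAP_NS && SITEMAP_TAGS.include?(el.name)
    unexpected << el.expanded_name unless known
  end
  unexpected
end

# Illustrative document only; the "ext" namespace and URLs are made up.
sample = <<~XML
  <?xml version="1.0" encoding="UTF-8"?>
  <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
          xmlns:ext="http://example.org/hypothetical-extension">
    <ext:meta capability="example"/>
    <url>
      <loc>https://example.org/concern/works/1</loc>
      <lastmod>2024-01-01</lastmod>
    </url>
  </urlset>
XML

puts unexpected_sitemap_tags(sample).inspect
```

An empty result only shows that the document sticks to core sitemap tags. It is not full schema validation, and it does not guarantee that Google Search Console will accept the file.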
Rationale (for feature request only)
Good sitemaps will discourage crawling-by-facet, which puts major stress on Solr, and should increase discoverability for everyone using Hyrax.
Related work
- On Hyku: https://github.com/samvera/hyku/issues/2765
- On Hyrax (has never been implemented, but probably should be): https://github.com/samvera/hyrax/issues/59