bioconductor.org

broken sitemap.xml

Open sneumann opened this issue 5 years ago • 6 comments

https://www.bioconductor.org/sitemap.xml gives

XML Parsing Error: not well-formed
Location: https://www.bioconductor.org/sitemap.xml
Line Number 1, Column 2:
<%= xml_sitemap %>
-^

from https://github.com/Bioconductor/bioconductor.org/blob/master/content/sitemap.xml

There was a suggestion in a discussion with @egonw to add a sitemap.xml summarising site content for crawlers, including Google et al. and TeSS.

Yours, Steffen
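The parse error can be reproduced offline, since the served body is the unrendered template placeholder rather than XML. A minimal sketch using only the Python standard library; the helper name `is_well_formed` and the `example.org` URL are made up for illustration:

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if text parses as XML, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# The body reported in the error message above.
served = "<%= xml_sitemap %>"

# A minimal sitemap for comparison (example.org is a placeholder URL).
valid = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://example.org/</loc></url>"
    "</urlset>"
)

print(is_well_formed(served))
print(is_well_formed(valid))
```

The parser rejects the placeholder because `<` followed by `%` cannot start an element, which matches the "Line Number 1, Column 2" location in the browser message.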

sneumann avatar May 31 '20 16:05 sneumann

As this has been there, unchanged and without comment, since March 15 2010, maybe the most expeditious solution is to simply remove it?

mtmorgan avatar May 31 '20 16:05 mtmorgan

I suppose something is supposed to replace the placeholder with content. Yes, it would be awesome if it contained a list of all vignette (HTML) webpages and/or all packages. Indeed, that sitemap.xml could then be used by ELIXIR services to pick up content, e.g. ELIXIR TeSS but also BioSchemas (cc @AlasdairGray).
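As a rough illustration of what could replace the placeholder, here is a sketch that serialises a list of page URLs in the sitemaps.org format. It is not how the site is built; the function name `build_sitemap` and the two package URLs are assumptions chosen for the example:

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Serialise an iterable of absolute URLs as a sitemap.xml string."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for u in urls:
        url = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as & when serialising.
        ET.SubElement(url, "loc").text = u
    return ET.tostring(urlset, encoding="unicode")


# Assumed package landing pages, for illustration only.
pages = [
    "https://bioconductor.org/packages/xcms/",
    "https://bioconductor.org/packages/Biobase/",
]
sitemap = build_sitemap(pages)
print(sitemap)
```

Vignette pages could be added to the same list; the format only requires one `<url><loc>` entry per page.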

egonw avatar May 31 '20 16:05 egonw

The site is more than the repository of packages, so sitemap.xml doesn't sound appropriate for this purpose.

For what it's worth, package metadata is already available in machine-readable format at https://bioconductor.org/packages/3.12/bioc/VIEWS, and presumably would also be on individual pages if https://github.com/Bioconductor/bioconductor.org/pull/25 were completed. I can't see the need for a third source of this information.
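To show what reading that format involves: VIEWS is a DCF-style file, i.e. `Field: value` records separated by blank lines, with indented continuation lines. A minimal parsing sketch, assuming that layout; `parse_dcf` is a hypothetical helper and the sample records are invented, not real VIEWS content:

```python
def parse_dcf(text):
    """Parse DCF text into a list of dicts, one per blank-line-separated record."""
    records, current, key = [], {}, None
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current record.
            if current:
                records.append(current)
            current, key = {}, None
        elif line[0] in " \t" and key is not None:
            # An indented line continues the previous field.
            current[key] += " " + line.strip()
        elif ":" in line:
            key, _, value = line.partition(":")
            current[key] = value.strip()
    if current:
        records.append(current)
    return records


# Invented sample in the assumed layout.
sample = (
    "Package: pkgA\n"
    "Version: 1.0.0\n"
    "Title: An example\n"
    "        package\n"
    "\n"
    "Package: pkgB\n"
    "Version: 2.0.0\n"
)
print([r["Package"] for r in parse_dcf(sample)])
```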

mtmorgan avatar May 31 '20 17:05 mtmorgan

The sitemap.xml is not critical, I agree. (Any sitemap.xml has redundant information.)

egonw avatar May 31 '20 17:05 egonw

It is a form of search engine optimisation. OTOH all content on BioC can be considered well-linked: we don't have dynamically generated content, and there are no dark corners of non-linked stuff we'd want to be found. In that case, removing a broken sitemap.* is no loss.

https://support.google.com/webmasters/answer/156184?hl=en&topic=8476&ctx=topic has more information on when a sitemap is needed.

Yours, Steffen

sneumann avatar May 31 '20 18:05 sneumann

While a sitemap is not necessarily essential for the likes of Google, who have "unlimited" resources to follow links and hopefully traverse a whole site, it is more difficult for others to do the same. For example, we have started scraping Bioschemas content but do not have the resources to do a full web crawl for it, so we are reliant on sitemaps.
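To make the difference concrete, here is a sketch of the consumer side: with a sitemap, a scraper gets its full list of pages from one document instead of following links. This is not our actual pipeline; `sitemap_urls` is a hypothetical helper and the sample document uses placeholder URLs:

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text):
    """Return the <loc> URLs listed in a sitemap document."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", NS)
        if loc.text
    ]


# Placeholder sitemap; a real scraper would download this text first.
sample = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://example.org/a.html</loc></url>"
    "<url><loc>https://example.org/b.html</loc></url>"
    "</urlset>"
)
for url in sitemap_urls(sample):
    print(url)
```

Each returned URL can then be fetched once and checked for embedded markup, with no link traversal needed.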

AlasdairGray avatar Jun 01 '20 08:06 AlasdairGray