bioconductor.org
broken sitemap.xml
https://www.bioconductor.org/sitemap.xml gives
XML Parsing Error: not well-formed
Location: https://www.bioconductor.org/sitemap.xml
Line Number 1, Column 2:
<%= xml_sitemap %>
-^
from https://github.com/Bioconductor/bioconductor.org/blob/master/content/sitemap.xml
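The error above is what a parser reports when the template tag is served verbatim instead of being rendered. A minimal sketch of a check that tells the cases apart, assuming the sitemap body has already been fetched into a string (the function name `check_sitemap` and its return labels are made up for illustration):

```python
import xml.etree.ElementTree as ET


def check_sitemap(text: str) -> str:
    """Classify a sitemap body.

    Returns one of 'unrendered-template', 'malformed',
    'not-a-sitemap' or 'ok' (labels are illustrative only).
    """
    # An ERB-style tag such as <%= xml_sitemap %> means the
    # placeholder was never replaced with real content.
    if "<%=" in text:
        return "unrendered-template"
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        return "malformed"
    # Strip a '{namespace}' prefix, if any, from the root tag.
    tag = root.tag.rsplit("}", 1)[-1]
    if tag in ("urlset", "sitemapindex"):
        return "ok"
    return "not-a-sitemap"


print(check_sitemap("<%= xml_sitemap %>"))
```

Something like this could run as a post-build check so that an unrendered placeholder is caught before deployment.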
There was a suggestion in a discussion with @egonw to add a sitemap.xml summarising site content for crawlers, including Google et al. and TeSS.
Yours,
Steffen
Since this has been there, unchanged and without comment, since March 15 2010, maybe the most expeditious solution is to simply remove it?
I suppose something is meant to replace the placeholder with content. Yes, it would be awesome if it contained a list of all vignette (HTML) web pages and/or all packages. That sitemap.xml could then be used by ELIXIR services to pick up content, e.g. ELIXIR TeSS but also Bioschemas (cc @AlasdairGray).
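To make the idea concrete, here is a rough sketch of what generating such a file could look like. It is not how the site is actually built; the `build_sitemap` helper and the example URLs are assumptions for illustration. Only the `urlset`/`url`/`loc` structure and the namespace come from the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap document (as a string) listing the given URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for u in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = u
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical page URLs, for illustration only.
pages = [
    "https://example.org/packages/pkgA.html",
    "https://example.org/packages/pkgB.html",
]
print(build_sitemap(pages))
```

The list of URLs would have to come from whatever already knows about packages and vignettes at build time.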
The site is more than the repository of packages, so sitemap.xml doesn't sound appropriate for this purpose.
For what it's worth, package metadata is already available in machine-readable format at https://bioconductor.org/packages/3.12/bioc/VIEWS, and presumably would also be on individual pages if https://github.com/Bioconductor/bioconductor.org/pull/25 were completed. I can't see the need for a third source of this information.
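As an illustration of the "machine-readable" point: VIEWS is a DCF-style file (blank-line-separated records of `Field: value` lines, with indented continuation lines). A simplified sketch of reading it follows; the `parse_dcf` helper and the sample record values are made up, and the parser ignores edge cases a real DCF reader would handle.

```python
def parse_dcf(text):
    """Parse DCF-style text into a list of dicts, one per record."""
    records, current, last_key = [], {}, None
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current record.
            if current:
                records.append(current)
            current, last_key = {}, None
        elif line[0].isspace() and last_key is not None:
            # An indented line continues the previous field.
            current[last_key] += " " + line.strip()
        elif ":" in line:
            key, _, value = line.partition(":")
            last_key = key.strip()
            current[last_key] = value.strip()
    if current:
        records.append(current)
    return records


# Made-up sample in the same shape as a VIEWS file.
sample = """Package: pkgA
Version: 1.0.0
Depends: R (>= 4.0),
        methods

Package: pkgB
Version: 2.1.0
"""
for rec in parse_dcf(sample):
    print(rec["Package"], rec["Version"])
```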
The sitemap.xml is not critical, I agree. (Any sitemap.xml has redundant information.)
It is a means of search engine optimisation. OTOH, all content on BioC can be considered well-linked: we don't have dynamically generated content, and there are no dark corners of non-linked stuff we'd want to be found. In that case, removing a broken sitemap.* is no loss.
https://support.google.com/webmasters/answer/156184?hl=en&topic=8476&ctx=topic has more information on when a sitemap is needed.
Yours, Steffen
While a sitemap is not necessarily essential for the likes of Google, who have "unlimited" resources to follow links and hopefully traverse a whole site, it is more difficult for others to do the same. For example, we have started scraping Bioschemas content but do not have the resources to do a full web crawl for it, so we are reliant on sitemaps.
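To show why a sitemap is so much cheaper for a small harvester than a crawl, here is a minimal sketch of the consuming side, assuming the sitemap text has already been downloaded. The `sitemap_locs` name and the example URLs are illustrative and not taken from any actual scraper.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_locs(xml_text):
    """Return every <loc> URL listed in a sitemap document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iterfind(".//sm:loc", NS) if loc.text]


# Hypothetical sitemap content, for illustration only.
example = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.org/packages/pkgA.html</loc></url>
  <url><loc>https://example.org/packages/pkgB.html</loc></url>
</urlset>"""

for url in sitemap_locs(example):
    print(url)
```

One request yields the full list of pages to visit, with no link-following needed.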