biopython.github.io
Fix dead links on migrated website (Error 404 etc)
We know about a few problem links, e.g. #8, #13 and #17, but we should check this systematically. There are also some external links which no longer work.
Ideally we'd have a TravisCI configuration to run Jekyll and then spider the `_site/` folder to spot any link breakage.
@peterjc We can try this approach. Seems to be exactly what we are looking for.
Yes exactly :)
I can give it a try, but I guess I'd need access to TravisCI?
You should be able to try this by forking this repository to JoaoRodrigues.github.io and using your personal TravisCI account. For example, https://github.com/peterjc/peterjc.github.io/blob/pre_auto_import/.travis.yml (currently out of sync) should just install and run Jekyll.
I've just turned on https://travis-ci.org/biopython/biopython.github.io/ which should read https://github.com/biopython/biopython.github.io/blob/master/.travis.yml next time we update the website.
Currently the Travis configuration just runs Jekyll (a useful test in itself).
Attached are two link reports: internal links only, and all dead links. Where are the /DIST files hosted?
I will happily work on links.
They're on a GitHub Pages "project" repository, see https://github.com/biopython/biopython.github.io/issues/7 and https://github.com/biopython/DIST (note gh-pages branch)
GitHub isn't ideal for what is mostly a collection of static files, but since this is largely separate from the website, a separate repository seemed like a good idea. Long term I'd like us to host all our releases on PyPI (which currently doesn't cover Windows EXE/MSI files, but can do wheels).
Many of the "internal" links Vincent flagged were inter-wiki links, mostly to BioPerl's wiki - see #17
If I can't find a folder or file on https://github.com/biopython/DIST, e.g. http://biopython.org/DIST/docs/cluster/cluster.pdf, does that mean it wasn't transferred, it went elsewhere, or it was already missing before?
@MarkusPiotrowski the `DIST` folder is being hosted as a GitHub Project Page, see https://github.com/biopython/DIST and earlier comments on this issue.
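To check whether an old `biopython.org/DIST/...` URL survived the move, it can be mapped onto the `gh-pages` branch of the DIST repository. A sketch, assuming the project page serves that branch unchanged under `/DIST/`, so the rest of the URL path is the file's path within the branch; the helper name `dist_url_to_github` is hypothetical.

```python
from urllib.parse import urlparse

DIST_REPO = "https://github.com/biopython/DIST"
BRANCH = "gh-pages"  # GitHub Pages project sites are built from this branch


def dist_url_to_github(url):
    """Map a biopython.org/DIST/... file URL to its page in the DIST repo.

    Assumes biopython.org/DIST/<path> is served from <path> on gh-pages.
    """
    parsed = urlparse(url)
    prefix = "/DIST/"
    if parsed.netloc != "biopython.org" or not parsed.path.startswith(prefix):
        raise ValueError("Not a biopython.org/DIST/ URL: %r" % url)
    return "%s/blob/%s/%s" % (DIST_REPO, BRANCH, parsed.path[len(prefix):])


print(dist_url_to_github("http://biopython.org/DIST/docs/cluster/cluster.pdf"))
```

This prints `https://github.com/biopython/DIST/blob/gh-pages/docs/cluster/cluster.pdf`; if GitHub shows a 404 there, the file was not transferred.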
I had missed the `cluster.pdf` file, thanks for reporting that. I've committed it under @mdehoon's name, using the publication date (July 2008): https://github.com/biopython/DIST/commit/282fcb07cad7b064d373739423a5c545db5cc144
Other files on `DIST` that I'm missing are `ACMbiopy.pdf` and `ACMbiopy.html` (Chapman & Chang 2000 Biopython paper), formerly at http://biopython.org/DIST/docs/acm/
Thanks @MarkusPiotrowski - I've added those too, under @chapmanb using the end of August 2000 as the date: https://github.com/biopython/DIST/commit/dc1fb6bd664244d745775ff115863b1222370736 - This would sit better under the presentations folder, but I don't want to needlessly break old URLs.
See also http://lists.open-bio.org/pipermail/biopython/2000-July/000305.html where Brad posted a draft of this to the mailing list.
Commit https://github.com/biopython/biopython.github.io/commit/f173c02dd803bda0f45776f91fd535ff70bd85c1 was to enable me to use the Google website tools on biopython.org again (I'd set this up before on the old MediaWiki site), which includes broken link reports etc (see also #49 for the mailing list archive links).
As a bonus, it reminded me to fix the `robots.txt` file: https://github.com/biopython/biopython.github.io/commit/6bc179a26056bf1cb5714027bc4954d50086418c
@vincentdavis we've fixed a lot of broken URLs in the last week or so - could you re-run that link checker? If you can post the new results as a gist rather than a zip file that might be slightly easier to view. Thanks!
Recent status of broken links: https://gist.github.com/MarkusPiotrowski/37fdb4b1a27ec6e61a6b667a8fd4686a
About 175 broken links left (a month ago we had ~450!):
- most of these (~150) are missing Biopython versions (mostly `.zip` files and Windows installers, see also #7)
- another ~20 are files related to the Tutorial (example files etc. in `SRC` and `DIST/docs`).
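A breakdown like this could be reproduced from a plain-text report. A sketch, assuming the report has been saved as one broken URL per line; the category rules (file suffixes and path prefixes) are illustrative guesses, not the rules actually used for the numbers above.

```python
from collections import Counter
from urllib.parse import urlparse

# Illustrative guesses at release file suffixes; adjust to the real report
RELEASE_SUFFIXES = (".zip", ".tar.gz", ".exe", ".msi")


def categorise(url):
    """Assign one broken URL to a rough category based on its path."""
    path = urlparse(url).path
    if path.startswith(("/SRC/", "/DIST/docs/")):
        return "tutorial files"
    if path.startswith("/DIST/") and path.endswith(RELEASE_SUFFIXES):
        return "release files"
    return "other"


def tally(report_lines):
    """Count broken URLs per category, ignoring blank lines."""
    return Counter(categorise(line.strip()) for line in report_lines if line.strip())
```

Usage would be along the lines of `with open("broken_links.txt") as handle: print(tally(handle))`, where the file name is hypothetical.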
Thanks Markus - we're getting there!
We're not maintaining the `biopython.org/SRC/` files anymore; instead, those links ought to point at the raw files in the GitHub repository, e.g. https://github.com/biopython/biopython.github.io/commit/24f297af230241d1a1c2a9686293e935793f0332
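A sketch of such a rewrite, assuming that paths under `biopython.org/SRC/biopython/` mirror the root of the `biopython/biopython` repository and that `master` is the right branch to point at. Both are assumptions to check against the linked commit before bulk-editing pages. It uses GitHub's `raw.githubusercontent.com/<owner>/<repo>/<branch>/<path>` URL form; the function names are hypothetical.

```python
import re

# Assumption: biopython.org/SRC/biopython/<path> mirrored <path> in the repo
SRC_RE = re.compile(r"https?://(?:www\.)?biopython\.org/SRC/biopython/([^\s)\]\"'<>]+)")
RAW_TEMPLATE = "https://raw.githubusercontent.com/biopython/biopython/{branch}/{path}"


def src_to_raw_github(url, branch="master"):
    """Return the raw GitHub URL for an old SRC link, or the URL unchanged."""
    match = SRC_RE.fullmatch(url)
    if match is None:
        return url  # not an SRC link, leave it alone
    return RAW_TEMPLATE.format(branch=branch, path=match.group(1))


def rewrite_links(text, branch="master"):
    """Replace every SRC link found in a page's source text."""
    return SRC_RE.sub(
        lambda m: RAW_TEMPLATE.format(branch=branch, path=m.group(1)), text
    )
```

Running `rewrite_links` over each Markdown page and reviewing the diff before committing keeps the change easy to check by hand.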
I don't think we ever wrote http://biopython.org/wiki/SeqFeature, but it would be a logical addition, although as usual there is the tension of duplicating documentation in the Tutorial and docstrings.
The news feed link was an easy fix: https://github.com/biopython/biopython.github.io/commit/7068156f6542bba989f207186508575bf89d28e4
Edit: I dealt with the missing user pages with https://github.com/biopython/biopython.github.io/commit/647e0f2be7318d789e1238a0346834f6debdbfbd and https://github.com/biopython/biopython.github.io/commit/d65f7b31350cbe5c80c002e718ae8d3e740f470b
I think https://github.com/biopython/biopython/commit/e5072b92e8b5c66c5f76141753cf5adea5e527ca fixed most of the URLs in the Tutorial, perhaps I should put this online now rather than waiting for the next Biopython release?
https://github.com/biopython/DIST/commit/1a6013af7d29d2fcc32f7db0113b10167ebf88ae should fix all the missing `*.zip` releases as part of #7.
Removed links to `Tutorial-dev.html` and `Tutorial-dev.pdf` from the Tutorial in https://github.com/biopython/biopython/commit/baed26ea78b531e6e8a43697387e1edff0752160