biopython.github.io

Fix dead links on migrated website (Error 404 etc)

Open peterjc opened this issue 8 years ago • 19 comments

We know about a few problem links e.g. #8 #13 #17, but should check this systematically. There are also some external links which no longer work.

Ideally we'd have a TravisCI configuration to run Jekyll and then spider the _site/ folder to spot any link breakage.
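A minimal sketch of that spidering step, assuming Jekyll has already written the site to a local folder such as `_site/`. It only checks internal links (external URLs are skipped), and the helper names `LinkCollector`, `resolve`, `exists` and `find_broken_links` are made up for illustration, not part of any existing tool. It also assumes extension-less links such as `/wiki/Foo` are served from `wiki/Foo.html`, which may not match the real site layout.

```python
import os
from html.parser import HTMLParser
from urllib.parse import urlsplit, unquote


class LinkCollector(HTMLParser):
    """Collect href/src attribute values from one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def resolve(site_root, page_path, link):
    """Map a link to a path under site_root, or None if it is not a local file link."""
    parts = urlsplit(link)
    if parts.scheme or parts.netloc or not parts.path:
        return None  # external URL, mailto:, or a pure #fragment
    path = unquote(parts.path)
    if path.startswith("/"):
        target = os.path.join(site_root, path.lstrip("/"))
    else:
        target = os.path.join(os.path.dirname(page_path), path)
    return os.path.normpath(target)


def exists(target):
    """Accept a file, a folder with index.html, or an extension-less page (assumption)."""
    if os.path.isfile(target):
        return True
    if os.path.isdir(target):
        return os.path.isfile(os.path.join(target, "index.html"))
    return os.path.isfile(target + ".html")


def find_broken_links(site_root):
    """Return sorted (page, link) pairs whose internal target is missing."""
    broken = []
    for dirpath, _dirs, files in os.walk(site_root):
        for filename in files:
            if not filename.endswith(".html"):
                continue
            page = os.path.join(dirpath, filename)
            collector = LinkCollector()
            with open(page, encoding="utf-8", errors="replace") as handle:
                collector.feed(handle.read())
            for link in collector.links:
                target = resolve(site_root, page, link)
                if target is not None and not exists(target):
                    broken.append((os.path.relpath(page, site_root), link))
    return sorted(broken)
```

On a CI service this could be run after the Jekyll build, calling `find_broken_links("_site")` and exiting with a non-zero status if the list is not empty.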

peterjc avatar Apr 13 '16 16:04 peterjc

@peterjc We can try this approach. Seems to be exactly what we are looking for.

JoaoRodrigues avatar Apr 13 '16 17:04 JoaoRodrigues

Yes exactly :)

peterjc avatar Apr 13 '16 21:04 peterjc

I can give it a try, but I guess I'd need access to TravisCI?

JoaoRodrigues avatar Apr 13 '16 21:04 JoaoRodrigues

You should be able to try this by forking this repository to JoaoRodrigues.github.io and using your personal TravisCI account. For example, https://github.com/peterjc/peterjc.github.io/blob/pre_auto_import/.travis.yml (currently out of sync) should just install and run Jekyll.

peterjc avatar Apr 13 '16 21:04 peterjc

I've just turned on https://travis-ci.org/biopython/biopython.github.io/ which should read https://github.com/biopython/biopython.github.io/blob/master/.travis.yml next time we update the website.

Currently the Travis configuration just runs Jekyll (a useful test in itself).

peterjc avatar Apr 16 '16 10:04 peterjc

Attached are two link reports: internal links only, and all dead links. Where are the /DIST files hosted?

I will happily work on links.

All bad links v2 April 19 2016.csv.zip

Internal Bad links April 19 2016.csv.zip

vincentdavis avatar Apr 19 '16 17:04 vincentdavis

They're on a GitHub Pages "project" repository, see https://github.com/biopython/biopython.github.io/issues/7 and https://github.com/biopython/DIST (note gh-pages branch)

GitHub isn't ideal for what is mostly a collection of static files, but this is largely separate from the website, so a separate repository seemed like a good idea. Long term I'd like us to host all our releases on PyPI (which currently doesn't cover Windows EXE/MSI files, but can do wheels).

peterjc avatar Apr 19 '16 20:04 peterjc

Many of the "internal" links Vincent flagged were inter-wiki links, mostly to BioPerl's wiki - see #17

peterjc avatar Apr 20 '16 10:04 peterjc

If I can't find a folder or file on https://github.com/biopython/DIST, e.g. http://biopython.org/DIST/docs/cluster/cluster.pdf, was it not transferred, did it go elsewhere, or was it already missing before?

MarkusPiotrowski avatar Apr 21 '16 15:04 MarkusPiotrowski

@MarkusPiotrowski the DIST folder is being done as a GitHub Project Page, see https://github.com/biopython/DIST and earlier comments on this issue.

I had missed the cluster.pdf file, thanks for reporting that. I've committed it under @mdehoon's name, dated to its publication (July 2008): https://github.com/biopython/DIST/commit/282fcb07cad7b064d373739423a5c545db5cc144

peterjc avatar Apr 21 '16 17:04 peterjc

Other files I'm missing on DIST are ACMbiopy.pdf and ACMbiopy.html (Chapman & Chang 2000 Biopython paper), formerly at http://biopython.org/DIST/docs/acm/

MarkusPiotrowski avatar Apr 23 '16 08:04 MarkusPiotrowski

Thanks @MarkusPiotrowski - I've added those too, under @chapmanb using the end of August 2000 as the date: https://github.com/biopython/DIST/commit/dc1fb6bd664244d745775ff115863b1222370736 - This would sit better under the presentations folder, but I don't want to needlessly break old URLs.

See also http://lists.open-bio.org/pipermail/biopython/2000-July/000305.html where Brad posted a draft of this to the mailing list.

peterjc avatar Apr 23 '16 10:04 peterjc

Commit https://github.com/biopython/biopython.github.io/commit/f173c02dd803bda0f45776f91fd535ff70bd85c1 was to enable me to use the Google website tools on biopython.org again (I'd set this up before on the old MediaWiki site), which includes broken link reports etc (see also #49 for the mailing list archive links).

As a bonus it reminded me to fix the robots.txt file https://github.com/biopython/biopython.github.io/commit/6bc179a26056bf1cb5714027bc4954d50086418c

peterjc avatar Apr 23 '16 22:04 peterjc

@vincentdavis we've fixed a lot of broken URLs in the last week or so - could you re-run that link checker? If you can post the new results as a gist rather than a zip file that might be slightly easier to view. Thanks!

peterjc avatar Apr 29 '16 16:04 peterjc

Recent status of broken links: https://gist.github.com/MarkusPiotrowski/37fdb4b1a27ec6e61a6b667a8fd4686a

About 175 broken links left (a month ago we had ~450!):

  • most of these (~150) are missing Biopython versions (mostly .zip files and Windows installers, see also #7)
  • another ~20 files are related to the Tutorial (example files etc. in SRC and DIST/docs).
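A breakdown like this could be tallied from a plain list of broken URLs with something like the sketch below. The category names and matching rules in `categorize` are illustrative assumptions, not how the gist above was produced, and the sample URLs in the usage note are hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit
import posixpath


def categorize(url):
    """Assign a broken URL to a rough category (rules are illustrative guesses)."""
    path = urlsplit(url).path
    extension = posixpath.splitext(path)[1].lower()
    if extension in (".zip", ".exe", ".msi"):
        return "release file"
    if path.startswith("/SRC/") or path.startswith("/DIST/docs/"):
        return "tutorial/docs file"
    return "other"


def summarize(urls):
    """Count broken URLs per category."""
    return Counter(categorize(url) for url in urls)
```

For example, `summarize(["http://biopython.org/DIST/example.zip", "http://biopython.org/DIST/docs/example.pdf"])` would count one release file and one tutorial/docs file.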

MarkusPiotrowski avatar May 17 '16 21:05 MarkusPiotrowski

Thanks Markus - we're getting there!

We're not maintaining the biopython.org/SRC/ files anymore, instead those ought to point at the GitHub repository raw files, e.g. https://github.com/biopython/biopython.github.io/commit/24f297af230241d1a1c2a9686293e935793f0332

I don't think we ever wrote http://biopython.org/wiki/SeqFeature but it would be a logical addition - although as usual we have the tension with duplicating documentation in the tutorial and docstrings.

The news feed link was an easy fix: https://github.com/biopython/biopython.github.io/commit/7068156f6542bba989f207186508575bf89d28e4

Edit: I dealt with the missing user pages with https://github.com/biopython/biopython.github.io/commit/647e0f2be7318d789e1238a0346834f6debdbfbd and https://github.com/biopython/biopython.github.io/commit/d65f7b31350cbe5c80c002e718ae8d3e740f470b

peterjc avatar May 18 '16 09:05 peterjc

I think https://github.com/biopython/biopython/commit/e5072b92e8b5c66c5f76141753cf5adea5e527ca fixed most of the URLs in the Tutorial, perhaps I should put this online now rather than waiting for the next Biopython release?

peterjc avatar May 18 '16 09:05 peterjc

https://github.com/biopython/DIST/commit/1a6013af7d29d2fcc32f7db0113b10167ebf88ae should fix all the missing *.zip releases as part of #7.

peterjc avatar May 18 '16 10:05 peterjc

Removed links to Tutorial-dev.html and Tutorial-dev.pdf from the Tutorial in https://github.com/biopython/biopython/commit/baed26ea78b531e6e8a43697387e1edff0752160

peterjc avatar Aug 22 '16 11:08 peterjc