
Broken links in notebooks.

Open · kmcnaught opened this issue · 16 comments

A few notebooks have web links that are not parsed correctly. Links without an http:// or https:// prefix are opened as paths relative to the notebook server. Email addresses need a 'mailto:' prefix if they are meant to be clickable links.

The following is a list of notebook files and 'bad' links.

doc/ipython-notebooks/clustering/GMM.ipynb: github.com/karlnapf, herrstrathmann.de

doc/ipython-notebooks/clustering/.ipynb_checkpoints/GMM-checkpoint.ipynb: github.com/karlnapf, herrstrathmann.de

ipython-notebooks/evaluation/xval_modelselection.ipynb: github.com/karlnapf, herrstrathmann.de

ipython-notebooks/statistical_testing/mmd_two_sample_testing.ipynb: github.com/karlnapf, herrstrathmann.de, [email protected], github.com/lambday

kmcnaught, Mar 15 '17
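The rule described in this report can be checked mechanically by looking at a link target's URL scheme. A minimal sketch, assuming Python 3 and only the standard library (`classify_href` and `suspect_links` are hypothetical helper names): a target with no scheme, such as `github.com/karlnapf`, is what the notebook server resolves as a relative path, and a bare address containing `@` is what is missing `mailto:`. Intentional relative links to other notebooks would also be reported as "relative", so the output is a list to review by hand, not a verdict.

```python
import re
from urllib.parse import urlparse

# Target of a Markdown inline link: [text](target)
MD_LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")

def classify_href(href: str) -> str:
    """Classify a link target by how a browser would treat it (heuristic)."""
    scheme = urlparse(href).scheme
    if scheme in ("http", "https", "mailto"):
        return "ok"
    if scheme:
        return "other-scheme"
    if "@" in href and "/" not in href:
        # A bare address like user@example.org is not clickable without mailto:
        return "missing-mailto"
    # No scheme: the browser resolves it relative to the notebook's own URL,
    # which is what happens to targets like github.com/karlnapf.
    return "relative"

def suspect_links(markdown):
    """Return (target, kind) for every Markdown link target that is not 'ok'."""
    found = []
    for href in MD_LINK_RE.findall(markdown):
        kind = classify_href(href)
        if kind != "ok":
            found.append((href, kind))
    return found
```

For example, `suspect_links("[a](github.com/karlnapf) and [b](https://example.org)")` reports only the first target, as `("github.com/karlnapf", "relative")`.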

Thanks for reporting. This is a nice entrance task for GSoC students.

karlnapf, Mar 15 '17

@karlnapf I would like to work on this. I'm planning to apply for the real-world applications project. This doesn't count as an entrance task for any other project, right?

ghost, Mar 15 '17

The entrance tasks are not really project-related, especially not the easy ones.

karlnapf, Mar 15 '17

@karlnapf I think this can be closed now

ghost, Mar 25 '17

How about you run a link checker, and then we close it? :)

karlnapf, Mar 26 '17

I'm not sure I follow

ghost, Mar 26 '17

I meant: you could run an automated tool that verifies the links in all the notebooks in the repository. If broken ones are found, you send another patch, otherwise we close this issue.

karlnapf, Mar 26 '17

Sounds interesting. I'll get right to it.

ghost, Mar 26 '17

Here is a list of broken links that are not yet fixed:

https://gist.github.com/Red-devilz/67dee8c8afc2502202b16466ff6da225

rahul13ramesh, Jan 19 '18

Thanks for that! Really useful to have that list!

karlnapf, Jan 20 '18

Were all the links verified? Could you please provide the list of broken links mentioned above? The gist at https://gist.github.com/Red-devilz/67dee8c8afc2502202b16466ff6da225 no longer seems to open (it might itself be a broken link).

bhavukkalra, Mar 10 '20

@bhavukkalra the easiest way is to open the notebooks and try the links by hand... Of course, there's a smarter way to do it: run a regex (for http://...) over the notebooks to extract the links, try to fetch each one with curl, wget, or any other command-line tool, and if the status code is not 200, it's a broken link.

vigsterkr, Mar 10 '20
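The extraction half of that recipe needs nothing beyond the Python standard library. A minimal sketch (`extract_links` and `URL_RE` are hypothetical names): a notebook is a JSON file whose `cells` each carry a `source` that is either one string or a list of lines, so the regex can run over the joined source of every cell.

```python
import json
import re

# An http(s) URL up to whitespace, quotes, angle brackets, parentheses or ']'.
# Parentheses are excluded so Markdown [text](url) targets end cleanly; URLs
# that legitimately contain ')' will be truncated.
URL_RE = re.compile(r"https?://[^\s<>()\"'\]]+")

def extract_links(nb_path):
    """Return the distinct http(s) URLs in a notebook's cell sources, in order."""
    with open(nb_path, encoding="utf-8") as fh:
        nb = json.load(fh)
    links = []
    for cell in nb.get("cells", []):
        source = cell.get("source", "")
        # nbformat stores a cell's source either as a string or a list of lines
        if isinstance(source, list):
            source = "".join(source)
        for url in URL_RE.findall(source):
            links.append(url.rstrip(".,;:"))  # drop trailing sentence punctuation
    return list(dict.fromkeys(links))  # de-duplicate, keep first occurrence
```

This only scans cell sources, not stored outputs. Like any regex on "http://", it also misses scheme-less targets such as `github.com/karlnapf`, which are exactly the kind of link reported at the top of this issue.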

And in fact, if you write that shell script, could you please share it in this issue? Then we can integrate that check into our CI ;)

vigsterkr, Mar 10 '20

Sure. So: a script that extracts the links from the notebooks into a file, runs curl on each of them, and prints every link along with whether it is broken. Could you please confirm? Also, can I do this with a Python script instead of a shell script, or is a shell script required?

bhavukkalra, Mar 10 '20

@bhavukkalra yes, but there's no need to generate a file: just parse the notebooks, get the links, test them, and print the ones that are broken. And of course you can use Python for this, whichever is easiest for you.

vigsterkr, Mar 10 '20
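Following that suggestion, a whole-repository report could look like the sketch below (standard library only; `notebook_links`, `find_broken` and `main` are hypothetical names). It skips `.ipynb_checkpoints` directories, whose autosaved copies showed up in the original report, checks each distinct URL once even if several notebooks contain it, and returns a non-zero exit code when anything is broken so that a CI job can fail on it. The HTTP check itself is passed in as the `is_broken` callable, so curl, urllib or a third-party library can be plugged in without changing the scan.

```python
import json
import re
from pathlib import Path

URL_RE = re.compile(r"https?://[^\s<>()\"'\]]+")

def notebook_links(path):
    """Yield every http(s) URL found in the cell sources of one notebook."""
    nb = json.loads(Path(path).read_text(encoding="utf-8"))
    for cell in nb.get("cells", []):
        src = cell.get("source", "")
        text = "".join(src) if isinstance(src, list) else src
        for url in URL_RE.findall(text):
            yield url.rstrip(".,;:")

def find_broken(root, is_broken):
    """Map each broken URL under root to the sorted notebooks containing it."""
    where = {}
    for path in sorted(Path(root).rglob("*.ipynb")):
        if ".ipynb_checkpoints" in path.parts:
            continue  # editor autosaves, not real sources
        for url in notebook_links(path):
            where.setdefault(url, set()).add(str(path))
    # Each distinct URL is checked exactly once.
    return {url: sorted(paths) for url, paths in where.items() if is_broken(url)}

def main(root, is_broken):
    """Print broken links and return a process exit code (1 if any found)."""
    broken = find_broken(root, is_broken)
    for url, paths in sorted(broken.items()):
        print(f"BROKEN {url}")
        for p in paths:
            print(f"    in {p}")
    return 1 if broken else 0
```

In a CI job the call would be along the lines of `sys.exit(main("doc/ipython-notebooks", is_broken))`, with `is_broken` being whichever checker the project settles on.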

I was able to extract the links from an IPython notebook, but the curl command also seems to succeed on made-up (broken) links, so it doesn't give satisfactory results; for example, for the link https://www.shoguntoolbox.org/api/latest/classshogun_1_1DenseFatures.htm. Should we use an external library for this, for example https://pypi.org/project/LinkChecker/, or do we want to avoid external libraries and build this from scratch?

bhavukkalra, Mar 11 '20
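One likely reason for this: plain `curl URL` exits with status 0 whenever the server sends any response, including a 404 error page, so its exit code alone does not reveal a broken link. `curl --fail` (`-f`) makes it exit non-zero on HTTP errors, and `-w '%{http_code}'` prints the status code. A standard-library Python checker likewise has to look at the status code. The sketch below (hypothetical names `http_status` and `is_broken`; the User-Agent string is arbitrary) tries a cheap `HEAD` request first, falls back to `GET` for servers that refuse `HEAD`, and treats any 4xx/5xx status or a failed connection as broken.

```python
import http.client
import urllib.error
import urllib.request

def http_status(url, timeout=10.0):
    """Return the HTTP status code for url, or None if no response arrived."""
    for method in ("HEAD", "GET"):
        req = urllib.request.Request(
            url, method=method, headers={"User-Agent": "notebook-link-check"}
        )
        try:
            # urlopen follows redirects, so this is the final status
            with urllib.request.urlopen(req, timeout=timeout) as resp:
                return resp.status
        except urllib.error.HTTPError as err:
            # 4xx/5xx arrive as exceptions; retry with GET if HEAD is refused
            if method == "HEAD" and err.code in (405, 501):
                continue
            return err.code
        except (OSError, http.client.HTTPException, ValueError):
            # DNS failure, refused connection, timeout, malformed URL, ...
            return None
    return None

def is_broken(url, timeout=10.0):
    """Treat 4xx/5xx statuses and missing responses as broken links."""
    status = http_status(url, timeout)
    return status is None or status >= 400
```

A status check cannot catch servers that answer a missing page with 200 and an error page (a "soft 404"); those still need a look at the content or a manual check. A dedicated tool such as LinkChecker covers more of these cases, at the cost of an extra dependency.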