shogun
Broken links in notebooks.
A few notebooks have web links that are not parsed correctly. Links missing the http[s]:// prefix (e.g. github.com/karlnapf) are opened as paths relative to the notebook server. Email addresses need a 'mailto:' prefix if they are meant to be clickable links.
The following is a list of notebook files and 'bad' links.
doc/ipython-notebooks/clustering/GMM.ipynb github.com/karlnapf herrstrathmann.de
doc/ipython-notebooks/clustering/.ipynb_checkpoints/GMM-checkpoint.ipynb github.com/karlnapf herrstrathmann.de
ipython-notebooks/evaluation/xval_modelselection.ipynb github.com/karlnapf herrstrathmann.de
ipython-notebooks/statistical_testing/mmd_two_sample_testing.ipynb github.com/karlnapf herrstrathmann.de [email protected] github.com/lambday
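To illustrate the problem: a browser resolves an href without a scheme against the URL of the page it appears on, exactly like a relative path. The sketch below shows this with Python's urllib.parse.urljoin, plus a hypothetical fix_href helper that prepends https:// to bare web links and mailto: to bare email addresses. The notebook URL is an assumption (a classic Jupyter server on its default port 8888).

```python
from urllib.parse import urljoin, urlparse

# Assumed URL of one notebook as served by a classic Jupyter server on port 8888
NOTEBOOK_URL = "http://localhost:8888/notebooks/doc/ipython-notebooks/clustering/GMM.ipynb"

def resolve(href, base=NOTEBOOK_URL):
    """Resolve href the way a browser does: relative to the page that contains it."""
    return urljoin(base, href)

def fix_href(href):
    """Hypothetical fix: add https:// to bare web links and mailto: to bare emails."""
    if urlparse(href).scheme:
        return href  # already has a scheme (http, https, mailto, ...)
    if "@" in href:
        return "mailto:" + href
    return "https://" + href

print(resolve("github.com/karlnapf"))
# http://localhost:8888/notebooks/doc/ipython-notebooks/clustering/github.com/karlnapf
print(resolve(fix_href("github.com/karlnapf")))
# https://github.com/karlnapf
```

The address someone@example.com used when trying this out would be a placeholder; the original email in the report is not recoverable here.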
Thanks for reporting. Nice entrance task for GSoC students
@karlnapf I would like to work on this. I'm planning to apply for the real-world applications project. This doesn't count as an entrance task for any other project, right?
The entrance tasks are not really project-related, especially not the easy ones.
@karlnapf I think this can be closed now
How about you run a link checker, and then we close? :)
I'm not sure I follow
I meant: you could run an automated tool that verifies the links in all the notebooks in the repository. If broken ones are found, you send another patch; otherwise, we close this issue.
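A minimal sketch of the first half of such a tool, assuming nbformat-4 notebooks (plain JSON with a top-level "cells" list): find every notebook under a directory and collect the text of its cells, which is where the links live. Both function names are hypothetical. Skipping .ipynb_checkpoints is a design choice, since those are Jupyter's autosave copies of the real notebooks.

```python
import json
from pathlib import Path

def iter_notebooks(root):
    """Yield every .ipynb file under root, skipping Jupyter's .ipynb_checkpoints copies."""
    for path in sorted(Path(root).rglob("*.ipynb")):
        if ".ipynb_checkpoints" not in path.parts:
            yield path

def notebook_text(path):
    """Concatenate the source of all cells in an nbformat-4 notebook."""
    with open(path, encoding="utf-8") as f:
        nb = json.load(f)
    chunks = []
    for cell in nb.get("cells", []):
        src = cell.get("source", "")
        # nbformat may store a cell's source as one string or as a list of lines
        chunks.append("".join(src) if isinstance(src, list) else src)
    return "\n".join(chunks)
```

Running `for path in iter_notebooks("doc/ipython-notebooks")` and passing each path to notebook_text gives one string per notebook to search for links.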
Sounds interesting. I'll get right to it.
Here is a list of broken links that are not yet fixed:
https://gist.github.com/Red-devilz/67dee8c8afc2502202b16466ff6da225
Thanks for that! Really useful to have that list!
Were all the links verified? Could you please provide the list of broken links again? The gist linked above (https://gist.github.com/Red-devilz/67dee8c8afc2502202b16466ff6da225) doesn't seem to open; it might be a broken link itself.
@bhavukkalra the easiest way is to open the notebooks and try the links by hand... of course, there's a smarter way to do it. Basically, run a regex (for http://....) over the notebooks, extract the links, and try to fetch them with curl, wget, or any other command-line tool; if the status code is not 200, it's a broken link.
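The regex-and-status-code approach could look like the following sketch, which uses only Python's standard library instead of shelling out to curl. The URL pattern is a rough assumption (it will, for instance, cut off URLs that contain parentheses), and extract_links, http_status and is_broken are hypothetical names. urlopen follows redirects, so the code returned is that of the final page, and it raises HTTPError for 4xx/5xx responses, which is why that case is caught and turned back into a status code.

```python
import http.client
import re
import urllib.error
import urllib.request

# Rough pattern for absolute web links; stops at whitespace, quotes, brackets and parentheses
URL_RE = re.compile(r"https?://[^\s()<>\[\]\"']+")

def extract_links(text):
    """Return the http(s) URLs found in text, minus trailing sentence punctuation."""
    return [m.rstrip(".,;:!?") for m in URL_RE.findall(text)]

def http_status(url, timeout=10):
    """Fetch url and return the final HTTP status code (redirects followed), or None."""
    try:
        req = urllib.request.Request(url, headers={"User-Agent": "notebook-link-check"})
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # urlopen raises 4xx/5xx responses as exceptions
    except (OSError, http.client.HTTPException, ValueError):
        return None  # malformed URL, DNS failure, refused connection, timeout, ...

def is_broken(url):
    # Rule from this thread: anything other than a 200 counts as broken
    return http_status(url) != 200
```

Some servers reject or rate-limit scripted requests, so a few false positives are possible; rechecking those links by hand is a cheap sanity check.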
And in fact, if you write that shell script, could you please share it in this issue? Then we can actually integrate that check into our CI ;)
Sure. So: a script that extracts the links from the notebooks into a file, runs curl on them, and checks whether they are broken (printing each link along with whether it is broken or not). Could you please confirm? Also, can I do this with a Python script instead of a shell script, or is a shell script necessary?
@bhavukkalra yes... but no need to generate a file. Just parse the notebooks, get the links, test them, and print the ones that are broken. And of course you can use Python for this, whichever is easiest for you.
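For the CI integration mentioned above, the last step is to print only the broken links and signal failure through the exit status. A sketch under these assumptions: report_broken and main are hypothetical names, links_by_notebook maps each notebook path to the URLs extracted from it, and the checker is passed in as a function so it can be any status-code check (or a fake one in tests).

```python
import sys

def report_broken(links_by_notebook, is_broken, out=sys.stdout):
    """Print each broken link with the notebook it came from; return how many were printed.

    links_by_notebook: dict mapping a notebook path to a list of URLs.
    is_broken: any callable url -> bool.
    """
    cache = {}  # check each distinct URL only once, even if several notebooks share it
    count = 0
    for notebook, urls in sorted(links_by_notebook.items()):
        for url in urls:
            if url not in cache:
                cache[url] = is_broken(url)
            if cache[url]:
                print(f"{notebook}: {url}", file=out)
                count += 1
    return count

def main(links_by_notebook, is_broken):
    # A non-zero exit status makes a CI job fail when broken links are found
    return 1 if report_broken(links_by_notebook, is_broken) else 0
```

A script would end with `sys.exit(main(links, checker))`, so the CI job fails whenever at least one link is reported.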
I was able to extract the links from an IPython notebook file, but the curl command reports made-up (broken) links as working too, so it isn't giving satisfactory results. One example is the link https://www.shoguntoolbox.org/api/latest/classshogun_1_1DenseFatures.htm. Should we use an external library for this, for example https://pypi.org/project/LinkChecker/, or do we want to avoid external libraries and write this from scratch?
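One possible explanation, not verified against that particular URL: curl exits with status 0 for an HTTP 404 unless --fail is passed, so checking only whether the command succeeded treats every reachable server as fine. The status code has to be read explicitly, for example with curl's -w '%{http_code}' option. The sketch below wraps that in Python; curl_command, parse_curl and curl_status are hypothetical names, and curl is assumed to be on the PATH. Note that if a server answers a missing page with status 200, no status-based check (curl, urllib, or a library) can flag it; that would need inspecting the page content.

```python
import os
import subprocess

def curl_command(url, timeout=10):
    """Build a curl call that prints only the final HTTP status code."""
    # -s: no progress output, -L: follow redirects, -o: discard the body,
    # -w '%{http_code}': write the status code to stdout when the transfer ends
    return ["curl", "-s", "-L", "-o", os.devnull, "-w", "%{http_code}",
            "--max-time", str(timeout), url]

def parse_curl(returncode, stdout):
    """Return the status code as an int, or None if curl itself failed (DNS, timeout, ...)."""
    code = stdout.strip()
    # curl's exit status stays 0 for a 404 unless --fail is given,
    # so the HTTP status must come from the -w output instead
    if returncode != 0 or not code.isdigit() or code == "000":
        return None
    return int(code)

def curl_status(url, timeout=10):
    result = subprocess.run(curl_command(url, timeout), capture_output=True, text=True)
    return parse_curl(result.returncode, result.stdout)
```

A link would then count as broken when `curl_status(url) != 200`, matching the rule suggested earlier in the thread.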