Add script to check links

Open stephendade opened this issue 4 years ago • 14 comments

I've added a small script to get Sphinx to check all internal and external links in the wiki for validity.

It's very slow to run, so I don't recommend adding it to the CI. Instead, run it manually every now and again and fix up any issues.

stephendade avatar Jul 02 '20 11:07 stephendade

it would be nice if there was an option to only output the broken links, with the document header and line number....as it is now, I don't think the output can even be piped to a file as error output and still make sense....either that, or also send the file names and line info to error output....but I prefer the former...
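
A rough sketch of that kind of filter, not taken from the script in this thread. It assumes a Sphinx version whose linkcheck builder writes an `output.json` file with one JSON object per line and the keys `filename`, `lineno`, `status`, `uri` and `info`. The helper name `broken_links` is hypothetical.

```python
import json


def broken_links(json_lines):
    """Return 'file:line: uri (info)' strings for entries marked broken.

    Assumes each non-empty line is one JSON object with the keys
    filename, lineno, status, uri and info (an assumption about the
    linkcheck output format, not something confirmed in this thread).
    """
    report = []
    for raw in json_lines:
        raw = raw.strip()
        if not raw:
            continue  # skip blank lines
        entry = json.loads(raw)
        if entry.get("status") == "broken":
            report.append(
                f"{entry['filename']}:{entry['lineno']}: "
                f"{entry['uri']} ({entry.get('info', '')})"
            )
    return report
```

Something like `broken_links(open("output.json"))` would then give only the broken entries, each with its file name and line number, ready to write to a file.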

Hwurzburg avatar Jul 02 '20 12:07 Hwurzburg

another issue is that we will always have broken links in the archived pages....it would be nice to look at the common-archived-pages toc and exclude any linked doc from the search....once a quarter I run a broken link checker and fix some of the low-hanging fruit....but 200+ broken links takes a bit of time...most important would be any internal links, since they should be caught in the build process, or are they being skipped with the grayed out notation?
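
A rough sketch of the exclusion idea, assuming the archived pages are listed in a standard reStructuredText `toctree` block. The helper name `toctree_entries` is hypothetical and is not part of the script in this thread.

```python
import re


def toctree_entries(rst_text):
    """Collect document names listed under '.. toctree::' directives."""
    entries = []
    in_toctree = False
    for line in rst_text.splitlines():
        if line.strip().startswith(".. toctree::"):
            in_toctree = True
            continue
        if not in_toctree or not line.strip():
            continue  # outside a toctree, or a blank line inside one
        if not line.startswith((" ", "\t")):
            in_toctree = False  # an unindented line ends the directive
            continue
        entry = line.strip()
        if entry.startswith(":"):
            continue  # directive option such as ':maxdepth: 1'
        # Entries may be written as 'Title <docname>' or just 'docname'.
        match = re.match(r".*<(.+)>$", entry)
        entries.append(match.group(1) if match else entry)
    return entries
```

The names returned could then be passed to whatever skip mechanism the checker offers, for example Sphinx's `linkcheck_exclude_documents` list of regexes (I believe that option needs a fairly recent Sphinx, so treat it as an assumption).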

Hwurzburg avatar Jul 02 '20 12:07 Hwurzburg

I tried running the plane one just for a test....and it locked up after about 15min...what should I try next?

Hwurzburg avatar Jul 02 '20 23:07 Hwurzburg

@Hwurzburg One thing you can try is to run another link checker and fix as many links as you can; this may incidentally fix the link that is causing the problem for Sphinx. Yes, I know this is a crap solution, but I've been reduced to this sort of thing previously.

I have tried both https://github.com/gjtorikian/html-proofer and https://github.com/wjdp/htmltest as post-process tools on the source. html-proofer is well known, but I prefer htmltest: it is faster and doesn't choke.

hamishwillee avatar Jul 03 '20 00:07 hamishwillee

@hamishwillee as I said above, I run a broken link checker occasionally....fixed a bunch...a lot will always be there due to archiving...I also try to check every page's historical links as I touch them, there is just more important stuff to do at the moment...not sure of the utility of this really, since broken link checkers do the same, and intersite and intrasite links are caught by the wiki build process...

Hwurzburg avatar Jul 03 '20 00:07 Hwurzburg

> I tried running the plane one just for a test....and it locked up after about 15min...what should I try next?

I found it did stall for a few minutes here and there.

@Hwurzburg: If you've already got a method for broken link checking, that's fine. I can close this.

stephendade avatar Jul 03 '20 00:07 stephendade

There was a previous PR #2226 regarding that.

I guess the Henry method of running an online tool occasionally is better than my approach.

brunoolivieri avatar Jul 07 '20 09:07 brunoolivieri

Stephen and I discussed this, and with some changes it will be useful as a tool, so I will use it instead.

Hwurzburg avatar Jul 07 '20 12:07 Hwurzburg

cool!

brunoolivieri avatar Jul 07 '20 16:07 brunoolivieri

I've updated the script to output to a log file (per wiki) any broken links. Should make things easier.

stephendade avatar Jul 08 '20 03:07 stephendade

tried just the dev script since it's a small site....again it hung indefinitely after 35%....Ctrl-C would not kill it...had to close the terminal window with the X button...may be my wonky linux box (Tridge promised to ssh in soon and try to get my second faster wifi adapter working when he had a chance)...restarting the script

Hwurzburg avatar Jul 08 '20 23:07 Hwurzburg

More updates - I've tweaked the settings to allow a ctrl+c to exit cleanly, plus (hopefully) fix the slowness.

I have discovered that the slowness can be caused by the linkchecker downloading the entire content of a link, which is an issue for direct links to large downloads. While running the linkchecker I downloaded multiple GB.

The issue is a combination of the Python requests module and the remote server configuration, where Python needs to wait for the link download to finish before confirming that it's a valid link. Depending on remote server support, Python requests can use the requests.head() method to just get the server response rather than the content of the link. I suspect not all servers support this.
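
To illustrate the HEAD-first idea, here is a minimal sketch. It is not the code used by the linkchecker: it uses the standard library's `urllib` rather than requests so that it runs without extra packages, and the helper name `link_is_alive` and its `opener` parameter are hypothetical. It sends a HEAD request first and only falls back to GET when the server rejects HEAD; in both cases the response body is never read.

```python
import urllib.error
import urllib.request


def link_is_alive(url, opener=urllib.request.urlopen, timeout=10):
    """Return True if the URL answers without an HTTP or network error."""
    for method in ("HEAD", "GET"):
        try:
            request = urllib.request.Request(url, method=method)
            # The body is never read; the connection is closed on exit,
            # so a large download is not fetched in full.
            with opener(request, timeout=timeout) as response:
                return response.status < 400
        except urllib.error.HTTPError as err:
            # 405/501 suggest the server does not support HEAD: retry with GET.
            if method == "HEAD" and err.code in (405, 501):
                continue
            return False
        except (urllib.error.URLError, OSError, ValueError):
            # Unreachable host, timeout, or a malformed URL.
            return False
    return False
```

The `opener` argument exists only so the function can be exercised without network access; in normal use the default `urlopen` is fine.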

I would recommend ensuring you've got a decent Internet connection before running this!

stephendade avatar Jul 10 '20 01:07 stephendade

Think that having a link checking feature would be very helpful rather than ad hoc manual work. Is there anything else that the maintainers require for this pull request to be merged?

darigovresearch avatar May 30 '21 19:05 darigovresearch

there are hundreds of broken links...the normal web based link checkers can highlight them....I don't have an issue adding this as a tool somewhere if someone wants to use it....but not in CI or normal builds....not until all broken links get fixed...this has been much lower priority than the current wiki work...

Hwurzburg avatar May 30 '21 19:05 Hwurzburg