Check links in documentation
- [x] Vaguely addresses some conversation at #1789
- [x] I am familiar with the contributing guidelines
- ~~[ ] Tests added~~
- [ ] Updates entries in `docs/sphinx/source/reference` for API changes.
- ~~[ ] Adds description and name entries in the appropriate "what's new" file in `docs/sphinx/source/whatsnew` for all changes. Includes link to the GitHub Issue with ``:issue:`num`` or this Pull Request with ``:pull:`num``. Includes contributor name and/or GitHub username (link with ``:ghuser:`user``).~~
- ~~[ ] New code is fully documented. Includes numpydoc compliant docstrings, examples, and comments where necessary.~~
- [x] Pull request is nearly complete and ready for detailed review.
- [x] Maintainer: Appropriate GitHub Labels (including `remote-data`) and Milestone are assigned to the Pull Request and linked Issue.
Additions
GitHub action to run the sphinx `linkcheck` builder periodically.
It is currently configured to run every two weeks, and is still under test and development.
(Context) Copypasting my comment from #1789:
> On the other hand, for external links the `linkcheck` builder has to be run.
> I'm looking for a way to integrate it. From this comment it seems like a bad idea to trigger that on every PR. In fact, all of that conversation gives good insight into how many things can go wrong with it.
> I don't know how I could even take all the `linkcheck` output, turn it into a test, and determine whether it passes or fails. There are many other link-checking setups out there, but I'm unable to find one that would work for PVLIB without problems.
> Anyway, IMO the most promising approach is the one currently used at Matplotlib. See these instructions on the release procedure. And sorry for not bringing a better solution.
> If an action could be made out of that, running it weekly or so would be nice, like this one.
> Let me know what you think about this. I'm open to trying anything, but I feel the easiest approach is to run the linkcheck builder on releases. Maybe some kind of grep over the output could reduce the work of reading through it.
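To make that "grep over the output" idea concrete, here is a minimal sketch of a pass/fail gate, assuming a recent Sphinx whose linkcheck builder writes its machine-readable results as JSON lines to `build/linkcheck/output.json` (the path and keys may need adjusting):

```python
# Sketch: turn linkcheck results into a pass/fail signal for CI.
import json
import sys
from pathlib import Path

broken = []
for line in Path("build/linkcheck/output.json").read_text().splitlines():
    if not line.strip():
        continue
    entry = json.loads(line)
    if entry["status"] == "broken":
        broken.append(entry)

for entry in broken:
    print(f"{entry['filename']}:{entry['lineno']}: {entry['uri']} - {entry['info']}")

# A non-zero exit code makes this usable as a CI step.
sys.exit(1 if broken else 0)
```

Filtering on the `status` field would also make it easy to ignore known-noisy hosts before deciding the step failed.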
Regarding enabling nitpicky, the signal to noise ratio of its output is not very good at the moment. Here is the log I'm looking at: https://readthedocs.org/api/v2/build/21309664.txt
About 1500 warnings, most of which aren't really "broken links" in the sense that we meant in #1789. ~500 of them (WARNING: py:class reference target not found: numeric) would get fixed by #1693. Some others we could fix too (change string to str, etc). But some of them don't seem like actual issues to me. For example py:class reference target not found: default 1 seems like something that should not be warned about.
I think nitpicky might be more hassle than it is worth.
For linkchecker, have you timed to see how long it takes to run? Maybe including it as a step in the release procedure could be a good option.
Thanks for reviewing, @kandersolar; I didn't expect this draft PR to be reviewed.
Of course, nitpicky is inappropriate.
> have you timed it to see how long it takes to run?
It takes about 4 minutes on an Ubuntu virtual machine provided by my university. A bottleneck is the GitHub links: there are a lot of them and most are OK (except for renamed usernames like yours, and some redirects where PRs, issues or discussions were linked as something else). I can exclude all of them or just the whatsnew files.
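A sketch of how those exclusions could look in `docs/sphinx/source/conf.py`; the patterns are illustrative assumptions rather than part of this PR, and `linkcheck_exclude_documents` needs Sphinx >= 4.4:

```python
# Illustrative conf.py settings: skip GitHub links entirely and leave the
# whatsnew pages out of the external link check.
linkcheck_ignore = [
    r'https://github\.com/.*',   # many links, mostly fine, slow to check
]
linkcheck_exclude_documents = [  # requires Sphinx >= 4.4
    r'whatsnew/.*',              # old release notes accumulate dead links
]
```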
> Maybe including it as a step in the release procedure could be a good option.
I had proposed that, but I feel it would be better to add an action so that any contributor can help with this periodically, rather than only you on each release, and without having to run it locally and wait. (Maybe I should also add running it on version tags?)
- With the log already available, spotting what needs attention is easier for anyone; there is no need to find the right commands to run the linkcheck builder.
- On a release, many things need more attention than links, so I advocate taking that burden off the person releasing the package.

This action does not preclude also doing it as part of the release procedure, but I would not limit who can help with that.
Run linkcheck on remote Ubuntu
```bash
git clone https://github.com/pvlib/pvlib-python
cd pvlib-python
sudo apt-get install python3.8 python3.8-venv
python3.8 -m venv venv
source venv/bin/activate
python3.8 -m pip install -e .[doc]
cd docs/sphinx
python3.8 -m sphinx -b linkcheck -T -W --keep-going source build/linkcheck
```
I don't want to leave this PR open. If you agree, I'm either going to add the instructions to the release procedure wiki page or just close this (I still believe checking and fixing links is out of scope for the person tagging the version).
The addition would be another step in this section:
- Run the `linkcheck` sphinx builder:
  - On a Linux OS, install the Python version used for building the docs (currently 3.7). This link may be of help on Ubuntu.
    `sudo add-apt-repository ppa:deadsnakes/ppa -y && sudo apt update && sudo apt install python3.7 python3.7-venv -y`
  - Clone the repo and change the working folder:
    `git clone https://github.com/pvlib/pvlib-python && cd pvlib-python`
    Note for future self: change language settings with `sudo update-locale LANG=en_US.UTF-8`
  - Run the following commands from the repo root folder:
    `python3.7 -m venv venv && source venv/bin/activate && python -m pip install .[doc] && cd docs/sphinx`
    `python -m sphinx -b linkcheck -T -W --keep-going source build/linkcheck | awk '/broken/ && !(/github/ || /dx.doi.org/)'`
  - Fix broken links if needed
This is the output of the last command. Some text is in Spanish, but the problems are still legible.
```text
source/whatsnew/v0.9.1.rst:16: WARNING: undefined label: forecasts
(reference/generated/pvlib.irradiance.get_ground_diffuse: line 35) broken http://files.pvsyst.com/help/albedo.htm - HTTPConnectionPool(host='files.pvsyst.com', port=80): Max retries exceeded with url: /help/albedo.htm (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x7f74126ba390>: Failed to resolve 'files.pvsyst.com' ([Errno -2] Nombre o servicio desconocido)"))
(reference/generated/pvlib.temperature.pvsyst_cell: line 60) broken http://files.pvsyst.com/help/index.html - HTTPConnectionPool(host='files.pvsyst.com', port=80): Max retries exceeded with url: /help/index.html (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x7f7428bfced0>: Failed to resolve 'files.pvsyst.com' ([Errno -2] Nombre o servicio desconocido)"))
(user_guide/clearsky: line 370) broken http://gpsmet.noaa.gov/cgi-bin/gnuplots/rti.cgi - HTTPConnectionPool(host='gpsmet.noaa.gov', port=80): Max retries exceeded with url: /cgi-bin/gnuplots/rti.cgi (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x7f7428b9c550>: Failed to resolve 'gpsmet.noaa.gov' ([Errno -2] Nombre o servicio desconocido)"))
(user_guide/timetimezones: line 317) broken http://aa.usno.navy.mil/rstt/onedaytable?ID=AA&year=1997&month=1&day=1&state=AK&place=sand+point - ('Connection aborted.', ConnectionResetError(104, 'Conexión reinicializada por la máquina remota'))
( index: line 87) broken http://energy.sandia.gov/wp/wp-content/gallery/uploads/PV_LIB_Python_final_SAND2014-18444C.pdf - 404 Client Error: Not Found for url: https://energy.sandia.gov/wp/wp-content/gallery/uploads/PV_LIB_Python_final_SAND2014-18444C.pdf
(reference/generated/pvlib.solarposition.get_solarposition: line 44) broken http://rredc.nrel.gov/solar/codesandalgorithms/spa/ - 404 Client Error: Not Found for url: https://www.nrel.gov/wind/solar/codesandalgorithms/spa/
(reference/generated/pvlib.clearsky.bird: line 57) broken http://rredc.nrel.gov/solar/pubs/pdfs/tr-642-761.pdf - 404 Client Error: Not Found for url: https://www.nrel.gov/wind/solar/pubs/pdfs/tr-642-761.pdf
(reference/generated/pvlib.iotools.get_srml: line 2) broken http://solardat.uoregon.edu/download/Archive/ - ('Connection aborted.', ConnectionResetError(104, 'Conexión reinicializada por la máquina remota'))
(reference/generated/pvlib.irradiance.get_extra_radiation: line 27) broken http://solardat.uoregon.edu/SolarRadiationBasics.html - ('Connection aborted.', ConnectionResetError(104, 'Conexión reinicializada por la máquina remota'))
(reference/generated/pvlib.iotools.get_srml: line 49) broken http://solardat.uoregon.edu/ - ('Connection aborted.', ConnectionResetError(104, 'Conexión reinicializada por la máquina remota'))
(reference/generated/pvlib.iotools.get_srml: line 51) broken http://solardat.uoregon.edu/StationIDCodes.html - ('Connection aborted.', ConnectionResetError(104, 'Conexión reinicializada por la máquina remota'))
(reference/generated/pvlib.iotools.read_srml: line 28) broken http://solardat.uoregon.edu/ArchivalFiles.html - ('Connection aborted.', ConnectionResetError(104, 'Conexión reinicializada por la máquina remota'))
(user_guide/variables_style_rules: line 27) broken http://www.soda-is.com/eng/education/acronymes.html - 404 Client Error: Not Found for url: http://www.soda-is.com/eng/education/acronymes.html
(user_guide/variables_style_rules: line 25) broken http://www.soda-is.com/eng/education/units.html - 404 Client Error: Not Found for url: http://www.soda-is.com/eng/education/units.html
(user_guide/variables_style_rules: line 26) broken http://www.soda-is.com/eng/education/terminology.html - 404 Client Error: Not Found for url: http://www.soda-is.com/eng/education/terminology.html
(reference/generated/pvlib.clearsky.ineichen: line 48) broken http://www.soda-is.com/eng/services/climat_free_eng.php#c5 - 404 Client Error: Not Found for url: http://www.soda-is.com/eng/services/climat_free_eng.php
(user_guide/clearsky: line 580) broken http://www.unige.ch/energie/fr/equipe/ineichen/solis-tool/ - 404 Client Error: Not Found for url: https://www.unige.ch/energy/fr/equipe/ineichen/solis-tool/
(reference/generated/pvlib.solarposition.solar_zenith_analytical: line 30) broken http://www.pveducation.org/pvcdrom/2-properties-sunlight/suns-position - 404 Client Error: Not Found for url: https://www.pveducation.org/pvcdrom/properties-of-sunlight/suns-position
(reference/generated/pvlib.iotools.get_acis_mpe: line 2) broken https://data.rcc-acis.org/GridData - 400 Client Error: Bad Request for url: https://data.rcc-acis.org/GridData
(reference/generated/pvlib.irradiance.get_ground_diffuse: line 35) broken https://doi.org/10.1175/1520-0469(1972 - 404 Client Error: Not Found for url: https://doi.org/10.1175/1520-0469(1972
(reference/generated/pvlib.iotools.get_pvgis_hourly: line 131) broken https://ec.europa.eu/jrc/en/PVGIS/docs/noninteractive - 404 Client Error: Not Found for url: https://joint-research-centre.ec.europa.eu/PVGIS/docs/noninteractive_en
(reference/generated/pvlib.iam.ashrae: line 37) broken https://files.pvsyst.com/help/index.html?iam_loss.htm - HTTPSConnectionPool(host='files.pvsyst.com', port=443): Max retries exceeded with url: /help/index.html?iam_loss.htm (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x7f742871df50>: Failed to resolve 'files.pvsyst.com' ([Errno -2] Nombre o servicio desconocido)"))
(reference/generated/pvlib.iotools.parse_epw: line 13) broken https://energyplus.net/documentation - 404 Client Error: Not Found for url: https://energyplus.net/documentation
(reference/generated/pvlib.iotools.read_epw: line 6) broken https://energyplus.net/weather - 404 Client Error: Not Found for url: https://energyplus.net/weather
(user_guide/clearsky: line 347) broken https://forecasting.energy.arizona.edu/media/ineichen_vs_mcclear.ipynb - 404 Client Error: Not Found for url: https://forecasting.energy.arizona.edu/media/ineichen_vs_mcclear.ipynb
(user_guide/clearsky: line 347) broken https://forecasting.energy.arizona.edu/media/ineichen_vs_mcclear.html - 404 Client Error: Not Found for url: https://forecasting.energy.arizona.edu/media/ineichen_vs_mcclear.html
(reference/generated/pvlib.solarposition.spa_c: line 49) broken http://www.usno.navy.mil/USNO/earth-orientation/eo-products/long-term - ('Connection aborted.', ConnectionResetError(104, 'Conexión reinicializada por la máquina remota'))
(user_guide/variables_style_rules: line 18) broken https://pvpmc.sandia.gov/resources-and-events/variable-list/ - 404 Client Error: Not Found for url: https://pvpmc.sandia.gov/resources-and-events/variable-list/
(reference/generated/pvlib.iotools.get_pvgis_hourly: line 2) broken https://re.jrc.ec.europa.eu/api/v5_2/ - 404 Client Error: NOT FOUND for url: https://re.jrc.ec.europa.eu/api/v5_2/
(user_guide/installation: line 43) broken https://www.anaconda.com/what-is-anaconda/ - 404 Client Error: Not Found for url: https://www.anaconda.com/what-is-anaconda/
(reference/generated/pvlib.pvsystem.dc_ohmic_losses: line 15) broken https://www.pvsyst.com/help/ohmic_loss.htm - HTTPSConnectionPool(host='www.pvsyst.com', port=443): Max retries exceeded with url: /help/ohmic_loss.htm (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1091)')))
```
The process can be sped up with this statement, although it requires changes in the repo.
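As one possible example of such a change (an assumption, since the exact statement isn't shown here), Sphinx exposes a few linkcheck knobs in `conf.py` that affect run time:

```python
# Hypothetical conf.py tweaks to speed up a linkcheck run; whether these
# match the statement referenced above is an assumption.
linkcheck_workers = 10   # more concurrent requests (Sphinx's default is 5)
linkcheck_timeout = 10   # seconds to wait before giving up on a slow server
linkcheck_retries = 1    # do not retry flaky hosts repeatedly
```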
> I can exclude all of them or just the whatsnew files.
Are lines in the report truncated? For example, the report says
(reference/generated/pvlib.irradiance.get_ground_diffuse: line 35) broken https://doi.org/10.1175/1520-0469(1972 - 404 Client Error: Not Found for url: https://doi.org/10.1175/1520-0469(1972
The link in the code file is https://doi.org/10.1175/1520-0469(1972)029<0959:AOTSS>2.0.CO;2 and that link works.
> The link in the code file is https://doi.org/10.1175/1520-0469(1972)029<0959:AOTSS>2.0.CO;2 and that link works.

Aha, it appears that the '>' character in the URL is a problem at least for GitHub, probably similarly for the linkcheck builder.

> The link in the code file is https://doi.org/10.1175/1520-0469(1972)029<0959:AOTSS>2.0.CO;2 and that link works.

> Aha, it appears that the '>' character in the URL is a problem at least for GitHub, probably similarly for the linkcheck builder.

Sphinx breaks that link at the closing parenthesis ')': #1953
@echedey-ls can you re-run the link check after #1960 is merged? I am not able to reproduce the report you posted myself.
@cwhanse, I can check out that branch and move the report to that PR, if its scope is broad enough.
If I can help you run the linkcheck builder, let me know.
EDIT: never mind, I found the issue: && is what I intended to type instead of &. It should run on any Linux-based OS.
I've added one-liners in the comment above.
EDIT2: added easy copy & paste commands to that comment to run everything needed.
Is there still interest in this feature, either as more documented steps in the release procedure or as some CI running periodically or on commits/PRs to main? I'd like to take some action on it.