pvlib-python icon indicating copy to clipboard operation
pvlib-python copied to clipboard

Check links in documentation

Open echedey-ls opened this issue 2 years ago • 8 comments

  • [x] Vaguely addresses some conversation at #1789
  • [x] I am familiar with the contributing guidelines
  • ~[ ] Tests added~
  • [ ] Updates entries in docs/sphinx/source/reference for API changes.
  • ~[ ] Adds description and name entries in the appropriate "what's new" file in docs/sphinx/source/whatsnew for all changes. Includes link to the GitHub Issue with :issue:`num` or this Pull Request with :pull:`num`. Includes contributor name and/or GitHub username (link with :ghuser:`user`).~
  • ~[ ] New code is fully documented. Includes numpydoc compliant docstrings, examples, and comments where necessary.~
  • [x] Pull request is nearly complete and ready for detailed review.
  • [x] Maintainer: Appropriate GitHub Labels (including remote-data) and Milestone are assigned to the Pull Request and linked Issue.

Additions

GitHub action to run the sphinx linkchecker builder periodically

That. Right now configured to run every two weeks. Still under test and development.

(Context) Copypasting my comment from #1789:

On the other hand, for external links linkchecker builder has to be run.

I'm looking for a way to integrate it. From this comment it seems a bad idea to trigger that on every PR. In fact, all that conversation gives a good insight on how many things can go wrong with it.

I don't know how could I even take all the linkchecker output and create a test triggering that and determine if it passes or fails. There are many other occurrences out there to do link-checking, but I'm unable to find something that may work for PVLIB without problems.

Anyway, IMO the most promising approach is the one that is currently done at Matplotlib. See this instructions on the release procedure. And sorry for not bringing a better solution.

If an action could be made out of that, running it weekly or so would be nice, like this one.

Let me know what you think about this. I'm open to try anything, but I feel the easiest approach is to run the linkchecker on releases. Maybe some kind of grep over the output could leverage the work put into reading it.

echedey-ls avatar Jun 29 '23 22:06 echedey-ls

Regarding enabling nitpicky, the signal to noise ratio of its output is not very good at the moment. Here is the log I'm looking at: https://readthedocs.org/api/v2/build/21309664.txt

About 1500 warnings, most of which aren't really "broken links" in the sense that we meant in #1789. ~500 of them (WARNING: py:class reference target not found: numeric) would get fixed by #1693. Some others we could fix too (change string to str, etc). But some of them don't seem like actual issues to me. For example py:class reference target not found: default 1 seems like something that should not be warned about.

I think nitpicky might be more hassle than it is worth.

For linkchecker, have you timed to see how long it takes to run? Maybe including it as a step in the release procedure could be a good option.

kandersolar avatar Jul 19 '23 19:07 kandersolar

Thanks for reviewing @kandersolar , I didn't expect this draft PR to be reviewed.

Ofc nitpicky is inappropiate.

have you timed to see how long it takes to run?

It is about 4 mins on a Ubuntu virtual machine provided by my university. A bottle neck are GitHub links, there are a lot of them and most are ok (except for usernames renamed like yours, and some redirects because PRs, issues or discussions were linked as something else). I can exclude all of them or just the whatsnew files.

Maybe including it as a step in the release procedure could be a good option.

I had proposed so, but I feel like it would be better to add an action so any contributor can help with that periodically, not only you on each release without executing locally and waiting. (maybe should I add running it on version tags?)

  1. With the log already keeping what needs supervision is easier for anyone, no need to find the right commands to run the linkchecker
  2. On a release, I find many things need more attention than links, so I advocate to leverage that from the person releasing the package.

This action does not exclude doing so on the release procedure, but I would not limit who can help with that.

Run linkcheck on remote Ubuntu

git clone https://github.com/pvlib/pvlib-python cd pvlib-python

sudo apt-get install python3.8 python3.8-venv python3.8 -m venv venv source venv/bin/activate python3.8 -m pip install -e .[doc]

cd docs/sphinx python3.8 -m sphinx -b linkcheck -T -W --keep-going source build/linkcheck

echedey-ls avatar Jul 27 '23 15:07 echedey-ls

I don't want to leave this PR open. If you agree, I'm either going to add the instructions in the release procedure wiki page or just closing this (I still believe checking & fixing links is out of scope for the person tagging the version).

The addition would be in this section, another step:

  1. Run the linkchecker sphinx builder

    1. On a linux OS, install the Python version for building the docs (currently it is 3.7) This link may be of help on Ubuntu. sudo add-apt-repository ppa:deadsnakes/ppa -y && sudo apt update && sudo apt install python3.7 python3.7-venv -y

    2. Clone the repo and change working folder git clone https://github.com/pvlib/pvlib-python && cd pvlib-python

      Note for future self: change language settings sudo update-locale LANG=en_US.UTF-8

    3. Run the following commands from the repo root folder: python3.7 -m venv venv && source venv/bin/activate && python -m pip install .[doc] && cd docs/sphinx python -m sphinx -b linkcheck -T -W --keep-going source build/linkcheck | awk '/broken/ && !(/github/ || /dx.doi.org/)'

    4. Fix broken links if needed

This is the output of the last command

Some text is in Spanish, but the problem is still legible.

source/whatsnew/v0.9.1.rst:16: WARNING: undefined label: forecasts
(reference/generated/pvlib.irradiance.get_ground_diffuse: line   35) broken    http://files.pvsyst.com/help/albedo.htm - HTTPConnectionPool(host='files.pvsyst.com', port=80): Max retries exceeded with url: /help/albedo.htm (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x7f74126ba390>: Failed to resolve 'files.pvsyst.com' ([Errno -2] Nombre o servicio desconocido)"))
(reference/generated/pvlib.temperature.pvsyst_cell: line   60) broken    http://files.pvsyst.com/help/index.html - HTTPConnectionPool(host='files.pvsyst.com', port=80): Max retries exceeded with url: /help/index.html (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x7f7428bfced0>: Failed to resolve 'files.pvsyst.com' ([Errno -2] Nombre o servicio desconocido)"))
(user_guide/clearsky: line  370) broken    http://gpsmet.noaa.gov/cgi-bin/gnuplots/rti.cgi - HTTPConnectionPool(host='gpsmet.noaa.gov', port=80): Max retries exceeded with url: /cgi-bin/gnuplots/rti.cgi (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x7f7428b9c550>: Failed to resolve 'gpsmet.noaa.gov' ([Errno -2] Nombre o servicio desconocido)"))
(user_guide/timetimezones: line  317) broken    http://aa.usno.navy.mil/rstt/onedaytable?ID=AA&year=1997&month=1&day=1&state=AK&place=sand+point - ('Connection aborted.', ConnectionResetError(104, 'Conexión reinicializada por la máquina remota'))
(           index: line   87) broken    http://energy.sandia.gov/wp/wp-content/gallery/uploads/PV_LIB_Python_final_SAND2014-18444C.pdf - 404 Client Error: Not Found for url: https://energy.sandia.gov/wp/wp-content/gallery/uploads/PV_LIB_Python_final_SAND2014-18444C.pdf
(reference/generated/pvlib.solarposition.get_solarposition: line   44) broken    http://rredc.nrel.gov/solar/codesandalgorithms/spa/ - 404 Client Error: Not Found for url: https://www.nrel.gov/wind/solar/codesandalgorithms/spa/
(reference/generated/pvlib.clearsky.bird: line   57) broken    http://rredc.nrel.gov/solar/pubs/pdfs/tr-642-761.pdf - 404 Client Error: Not Found for url: https://www.nrel.gov/wind/solar/pubs/pdfs/tr-642-761.pdf
(reference/generated/pvlib.iotools.get_srml: line    2) broken    http://solardat.uoregon.edu/download/Archive/ - ('Connection aborted.', ConnectionResetError(104, 'Conexión reinicializada por la máquina remota'))
(reference/generated/pvlib.irradiance.get_extra_radiation: line   27) broken    http://solardat.uoregon.edu/SolarRadiationBasics.html - ('Connection aborted.', ConnectionResetError(104, 'Conexión reinicializada por la máquina remota'))
(reference/generated/pvlib.iotools.get_srml: line   49) broken    http://solardat.uoregon.edu/ - ('Connection aborted.', ConnectionResetError(104, 'Conexión reinicializada por la máquina remota'))
(reference/generated/pvlib.iotools.get_srml: line   51) broken    http://solardat.uoregon.edu/StationIDCodes.html - ('Connection aborted.', ConnectionResetError(104, 'Conexión reinicializada por la máquina remota'))
(reference/generated/pvlib.iotools.read_srml: line   28) broken    http://solardat.uoregon.edu/ArchivalFiles.html - ('Connection aborted.', ConnectionResetError(104, 'Conexión reinicializada por la máquina remota'))
(user_guide/variables_style_rules: line   27) broken    http://www.soda-is.com/eng/education/acronymes.html - 404 Client Error: Not Found for url: http://www.soda-is.com/eng/education/acronymes.html
(user_guide/variables_style_rules: line   25) broken    http://www.soda-is.com/eng/education/units.html - 404 Client Error: Not Found for url: http://www.soda-is.com/eng/education/units.html
(user_guide/variables_style_rules: line   26) broken    http://www.soda-is.com/eng/education/terminology.html - 404 Client Error: Not Found for url: http://www.soda-is.com/eng/education/terminology.html
(reference/generated/pvlib.clearsky.ineichen: line   48) broken    http://www.soda-is.com/eng/services/climat_free_eng.php#c5 - 404 Client Error: Not Found for url: http://www.soda-is.com/eng/services/climat_free_eng.php
(user_guide/clearsky: line  580) broken    http://www.unige.ch/energie/fr/equipe/ineichen/solis-tool/ - 404 Client Error: Not Found for url: https://www.unige.ch/energy/fr/equipe/ineichen/solis-tool/
(reference/generated/pvlib.solarposition.solar_zenith_analytical: line   30) broken    http://www.pveducation.org/pvcdrom/2-properties-sunlight/suns-position - 404 Client Error: Not Found for url: https://www.pveducation.org/pvcdrom/properties-of-sunlight/suns-position
(reference/generated/pvlib.iotools.get_acis_mpe: line    2) broken    https://data.rcc-acis.org/GridData - 400 Client Error: Bad Request for url: https://data.rcc-acis.org/GridData
(reference/generated/pvlib.irradiance.get_ground_diffuse: line   35) broken    https://doi.org/10.1175/1520-0469(1972 - 404 Client Error: Not Found for url: https://doi.org/10.1175/1520-0469(1972
(reference/generated/pvlib.iotools.get_pvgis_hourly: line  131) broken    https://ec.europa.eu/jrc/en/PVGIS/docs/noninteractive - 404 Client Error: Not Found for url: https://joint-research-centre.ec.europa.eu/PVGIS/docs/noninteractive_en
(reference/generated/pvlib.iam.ashrae: line   37) broken    https://files.pvsyst.com/help/index.html?iam_loss.htm - HTTPSConnectionPool(host='files.pvsyst.com', port=443): Max retries exceeded with url: /help/index.html?iam_loss.htm (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x7f742871df50>: Failed to resolve 'files.pvsyst.com' ([Errno -2] Nombre o servicio desconocido)"))
(reference/generated/pvlib.iotools.parse_epw: line   13) broken    https://energyplus.net/documentation - 404 Client Error: Not Found for url: https://energyplus.net/documentation
(reference/generated/pvlib.iotools.read_epw: line    6) broken    https://energyplus.net/weather - 404 Client Error: Not Found for url: https://energyplus.net/weather
(user_guide/clearsky: line  347) broken    https://forecasting.energy.arizona.edu/media/ineichen_vs_mcclear.ipynb - 404 Client Error: Not Found for url: https://forecasting.energy.arizona.edu/media/ineichen_vs_mcclear.ipynb
(user_guide/clearsky: line  347) broken    https://forecasting.energy.arizona.edu/media/ineichen_vs_mcclear.html - 404 Client Error: Not Found for url: https://forecasting.energy.arizona.edu/media/ineichen_vs_mcclear.html
(reference/generated/pvlib.solarposition.spa_c: line   49) broken    http://www.usno.navy.mil/USNO/earth-orientation/eo-products/long-term - ('Connection aborted.', ConnectionResetError(104, 'Conexión reinicializada por la máquina remota'))
(user_guide/variables_style_rules: line   18) broken    https://pvpmc.sandia.gov/resources-and-events/variable-list/ - 404 Client Error: Not Found for url: https://pvpmc.sandia.gov/resources-and-events/variable-list/
(reference/generated/pvlib.iotools.get_pvgis_hourly: line    2) broken    https://re.jrc.ec.europa.eu/api/v5_2/ - 404 Client Error: NOT FOUND for url: https://re.jrc.ec.europa.eu/api/v5_2/
(user_guide/installation: line   43) broken    https://www.anaconda.com/what-is-anaconda/ - 404 Client Error: Not Found for url: https://www.anaconda.com/what-is-anaconda/
(reference/generated/pvlib.pvsystem.dc_ohmic_losses: line   15) broken    https://www.pvsyst.com/help/ohmic_loss.htm - HTTPSConnectionPool(host='www.pvsyst.com', port=443): Max retries exceeded with url: /help/ohmic_loss.htm (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1091)')))

The process can be sped up with this statement, although it requires changes in the repo.

I can exclude all of them or just the whatsnew files.

echedey-ls avatar Jan 26 '24 17:01 echedey-ls

Are lines in the report truncated? For example, the report says

(reference/generated/pvlib.irradiance.get_ground_diffuse: line   35) broken    https://doi.org/10.1175/1520-0469(1972 - 404 Client Error: Not Found for url: https://doi.org/10.1175/1520-0469(1972

The link in the code file is https://doi.org/10.1175/1520-0469(1972)029<0959:AOTSS>2.0.CO;2 and that link works.

cwhanse avatar Jan 26 '24 19:01 cwhanse

The link in the code file is https://doi.org/10.1175/1520-0469(1972)029<0959:AOTSS>2.0.CO;2 and that link works.

Aha, it appears that the '>' character in the URL is a problem at least for github, probably similar for the linkcheck.

cwhanse avatar Jan 26 '24 19:01 cwhanse

The link in the code file is https://doi.org/10.1175/1520-0469(1972)029<0959:AOTSS>2.0.CO;2 and that link works.

Aha, it appears that the '>' character in the URL is a problem at least for github, probably similar for the linkcheck.

sphinx breaks that link at the close parentheses ')' #1953

cwhanse avatar Jan 26 '24 20:01 cwhanse

@echedey-ls can you re-run the link check after #1960 is merged? I am not able to reproduce the report you posted as I'm not capable enough to do it.

cwhanse avatar Feb 01 '24 23:02 cwhanse

@cwhanse , I can checkout that branch and move the report to that PR, if its scope is broad enough.

If I can help you running the linkchecker, let me know.

EDIT: nevermind, found the issue: && is what I intended to type instead of &. Should be able to run on any linux-based OS. I've added one-liners up in the comment. EDIT2: added easy-peasy copy&paste commands to run all needed on that comment.

echedey-ls avatar Feb 01 '24 23:02 echedey-ls

Is there still interest in this feature? Understand it by documenting more steps in the release procedure or including some CI, periodically or on main commits/PRs. I'd like to take some action regarding it.

echedey-ls avatar Jul 13 '24 11:07 echedey-ls