pvlib-python variables_style_rules.csv could use an update

The table of parameter names in the docs is a bit dusty and doesn't perfectly reflect today's pvlib. To make it easier to figure out where the gaps are, here's a script that scans the package, finds the parameter names currently in use, and compares with that table:

import pvlib
import pandas as pd
from inspect import getmembers, isfunction, ismodule, signature
from collections import defaultdict

skip_modules = ['version', 'forecast', 'iotools', 'location', 'modelchain', 'tools']


def get_parameter_map():
    """
    Find all parameter names used in public pvlib functions
    """
    parameter_map = defaultdict(set)  # map parameter names to functions that use them

    for module_name, module in getmembers(pvlib, ismodule):
        if module_name in skip_modules or module_name.startswith("_"):
            continue
        for function_name, function in getmembers(module, isfunction):
            if function_name.startswith("_"):
                continue
            full_function_name = module_name + "." + function_name
            parameter_names = signature(function).parameters.keys()
            for parameter_name in parameter_names:
                parameter_map[parameter_name].add(full_function_name)

    return parameter_map


parameter_map = get_parameter_map()

# read the docs variable name table
url = 'https://raw.githubusercontent.com/pvlib/pvlib-python/master/pvlib/data/variables_style_rules.csv'
df = pd.read_csv(url, sep=';')
listed_names = set(df['variable'])
unlisted_names = set(parameter_map.keys()) - listed_names
print(unlisted_names)

unlisted_names has over 300 elements. I think it makes sense to leave most of them out of the table, for example function-specific names like upper_line_length and u0, but a few of the unlisted ones are common enough to warrant inclusion, e.g. albedo. It also shows a few inconsistencies like delta_t vs deltaT and poa_irradiance vs poa_global.

An idea, maybe a bad one: once we've figured out which of the unlisted parameter names we don't want to include in the table, put them in an "ignore list" and create a CI check that performs the above scan to check for new parameter names that aren't in the table but maybe should be.

Jul 02 '21 18:07 kandersolar

Add fill_factor to the description list, via #2046.

Jun 30 '24 16:06 echedey-ls

An idea, maybe a bad one: once we've figured out which of the unlisted parameter names we don't want to include in the table, put them in an "ignore list" and create a CI check that performs the above scan to check for new parameter names that aren't in the table but maybe should be.

I really dig this idea. Ensuring consistent naming is one place we're not doing great in and this would be a good help I think.

Jul 05 '24 18:07 AdamRJensen

I understand this PR is talking about updating the user guide Variables and Symbols page entries with what is used in pvlib, but after running the script I also found there are multiple duplicate parameter names in slightly different forms, e.g. temp_mod identified by the script when we already have temp_module in the user guide. There are also inconsistencies between functions that refer to the same thing that may (or may not) already be defined in the user guide, e.g. DC power. Just with a quick scan of the print statement and comparing with what I recall is already in the user guide, I easily count >10 such instances.

Do you guys think it's worth a PR to unify these names? I think it would be worth it at least to make the function parameters consistent with what we already have in the user guide. I could review the list in more detail and make a start on this in a PR. Unifying parameter names that aren't already in the user guide could be a separate piece of work as that may require more discussion to reach a consensus on a name.

Sep 11 '24 17:09 RDaxini

It would be great to see a list of parameter name variants you found.

In the past, a deprecation process has been followed when a parameter name has changed. I don't think we want to rename in one large PR. We've done that in small bites.

#894 and probably other issues are relevant.

Sep 11 '24 17:09 cwhanse

It's a painstaking process cross checking everything (parameter map, user guide, function docstring) so the table below is incomplete, but it gives a few examples 😅 I'd certainly be happy to spend more time combing through the full list of terms/user guide/functions (and making changes) if we are agreed we want to do something about it haha.

I understand there may be some reasons for some of the differences (I haven't tracked every related issue / definition at this stage) but I did check most definitions at least, e.g. "current" is definitely the same as "photocurrent" in this case (not short-circuit current or anything else).

The table below is a summary of what I have found. I hope it helps.

Note this does not include terms that aren't in the user guide that potentially should be, just terms for which there is an overlap either between a function and the user guide, or between functions (with or without a user guide entry)

In user guide	In functions
pac, ac	ac_power
dni_clear	clearsky_dni, dni_clearsky
-	clearsky, clearsky_ghi, ghi_clearsky
temp_module	module_temperature, temp_mod
temp_air	temp, temperature
pdc, dc	dc_power
ghi_extra	extra_radiation
-	kt, clearness_index
photocurrent	current
latitude	lat
longitude	lon
pdc0	nameplate_rating*
-	nameplate_rating*, p_ac_0
poa_global	poa_irradiance
resistance_series	R_s**
resistance_shunt	R_sh
solar_zenith	zenith
wind_speed	wind_speed***
-	time, times

*not sure whether it's ac or dc, not obvious unless you dive into the references/source code, either way there's a still a duplicate somewhere ** series resistance at reference conditions *** this wind_speed is defined at a height of 10 metres whereas no height is specified in the user guide wind_speed

The starred points are a few examples of where terms require further clarification. Some definitions are even misleading (in my opinion) e.g. voltage in pvsystem.scale_voltage_current_power is not a voltage, it's a factor by which the voltage is scaled...

There are also instances where two parameters have a similar meaning but have no overlap, e.g. saturation_current in the user guide and I_o_ref in ivtools.sdm.pvsyst_temperature_coeff for the saturation current at reference conditions. I think part of being consistent across the library involves synchronising parameters like this to have a consistent stem, where possible, although there may good reasons for exceptions to this. Taking this case just as an example, perhaps saturation_current_ref and saturation_current would be suitable, or I_o and I_o_ref.

In the past, a deprecation process has been followed when a parameter name has changed. I don't think we want to rename in one large PR. We've done that in small bites.

Ah of course, I had not thought about the required deprecation process but that does make sense. Good point, thanks for explaining.

Sep 11 '24 22:09 RDaxini

g_poa_effective is another naming convention for poa_global that is used once

Sep 28 '24 21:09 AdamRJensen

These are the most common parameter names unlisted in the glossary:

Details

A total of 469 unique unknown parameters were found. They make for a total of 983 ocurrences.
The most common unknown parameters are:
 - map_variables: 26
 - model: 20
 - filename: 15
 - start: 14
 - end: 14
 - gcr: 13
 - url: 13
 - times: 12
 - x: 11
 - delta_t: 11
 - time: 10
 - altitude: 9
 - zenith: 9
 - method: 9
 - nNsVth: 9
 - min_cos_zenith: 8
 - max_zenith: 8
 - temperature: 8
 - numthreads: 8
 - datetime_or_doy: 7
 - axis_azimuth: 7
 - julian_ephemeris_century: 7
 - timeout: 7
 - api_key: 7
 - d2mutau: 6
 - NsVbi: 6
 - unixtime: 6
 - coefficients: 6
 - b: 5
 - a: 5
 - data: 5
 - current: 5
 - breakdown_factor: 5
 - breakdown_voltage: 5
 - breakdown_exp: 5
 - atmos_refract: 5
 - dayofyear: 5
 - observer_latitude: 5
 - irrad_ref: 5
 - module_efficiency: 5
 - return_components: 4
 - xtol: 4
 - v_dc: 4
 - p_dc: 4
 - voltage: 4
 - axis_tilt: 4
 - true_ecliptic_obliquity: 4
 - lat: 4
 - lon: 4
 - height: 4
 - pitch: 4
 - station: 4
 - coerce_year: 4
 - alpha_sc: 4
 - cells_in_series: 4
 - temp_ref: 4
 - wavelength: 4
 - module_type: 4
 - aod500: 3
 - surface_type: 3
 - normalize: 3
 - inverter: 3
 - model_perez: 3
 - use_delta_kt_prime: 3
 - location: 3
 - name: 3
 - weather: 3
 - clearsky_index: 3
 - slant_height: 3
 - method_kwargs: 3
 - i: 3
 - how: 3
 - horizon: 3
 - declination: 3
 - julian_day: 3
 - jme: 3
 - x0: 3
 - x1: 3
 - longitude_nutation: 3
 - local_hour_angle: 3
 - parallax_sun_right_ascension: 3
 - elev: 3
 - temp: 3
 - sst: 3
 - esd: 3
 - u0: 3
 - u1: 3
 - noct: 3
 - angle: 3
 - number: 3
 - npoints: 3
 - vectorize: 3
 - logical_records: 3
 - fbuf: 3
 - encoding: 3
 - integrated: 3
 - label: 3
 - EgRef: 3
 - coeff: 2
 - aod380: 2
 - alpha: 2
 - lambda1: 2
 - max_iterations: 2
 - ozone: 2
 - a_r: 2
 - module: 2
 - weight: 2
 - ghi_clear: 2
 - max_clearness_index: 2
 - solar_position: 2
 - system: 2
 - clearsky_model: 2
 - airmass_model: 2
 - v: 2
 - rsh: 2
 - snowfall: 2
 - threshold_snowfall: 2
 - rainfall: 2
 - cleaning_threshold: 2
 - rain_accum_period: 2
 - hourangle: 2
 - equation_of_time: 2
 - year: 2
 - month: 2
 - out: 2
 - julian_ephemeris_millennium: 2
 - earth_radius_vector: 2
 - apparent_sun_longitude: 2
 - geocentric_latitude: 2
 - u: 2
 - observer_elevation: 2
 - xterm: 2
 - equatorial_horizontal_parallax: 2
 - geocentric_sun_declination: 2
 - geocentric_sun_right_ascension: 2
 - topocentric_sun_declination: 2
 - topocentric_local_hour_angle: 2
 - deltaT: 2
 - u_c: 2
 - u_v: 2
 - alpha_absorption: 2
 - emissivity: 2
 - transmittance_absorptance: 2
 - wind_fit_low: 2
 - wind_fit_high: 2
 - slope_azimuth: 2
 - slope_tilt: 2
 - max_rows: 2
 - variable_map: 2
 - email: 2
 - outputformat: 2
 - usehorizon: 2
 - userhorizon: 2
 - pvgis_format: 2
 - time_resolution: 2
 - variables: 2
 - beta_voc: 2
 - ivcurves: 2
 - specs: 2
 - const: 2
 - maxiter: 2
 - eps1: 2
 - gamma_ref: 2
 - mu_gamma: 2
 - I_L_ref: 2
 - I_o_ref: 2
 - R_sh_ref: 2
 - R_sh_0: 2
 - R_s: 2
 - R_sh_exp: 2
 - e: 2
 - sr: 2
 - solar_elevation: 1
 - surface_condition: 1
 - color_coeff: 1
 - wave_roughness_coeff: 1
 - aod_bb: 1
 - aod0: 1
 - lambda0: 1
 - aod1: 1
 - aod2: 1
 - lambda2: 1
 - wind_speed_reference: 1
 - height_reference: 1
 - height_desired: 1
 - exponent: 1
 - linke_turbidity: 1
 - perez_enhancement: 1
 - filepath: 1
 - interp_turbidity: 1
 - apparent_elevation: 1
 - aod700: 1
 - measured: 1
 - clearsky: 1
 - infer_limits: 1
 - window_length: 1
 - mean_diff: 1
 - max_diff: 1
 - lower_line_length: 1
 - upper_line_length: 1
 - var_diff: 1
 - slope_dev: 1
 - asymmetry: 1
 - n: 1
 - K: 1
 - L: 1
 - c1: 1
 - c2: 1
 - theta_ref: 1
 - iam_ref: 1
 - upper: 1
 - function: 1
 - region: 1
 - num: 1
 - source_name: 1
 - source_params: 1
 - target_name: 1
 - fix_n: 1
 - measured_aoi: 1
 - measured_iam: 1
 - model_name: 1
 - target_param: 1
 - vtol: 1
 - ac_power: 1
 - dc_power: 1
 - dc_voltage: 1
 - dc_voltage_level: 1
 - p_ac_0: 1
 - p_nt: 1
 - c: 1
 - x_d: 1
 - add: 1
 - solar_constant: 1
 - epoch_year: 1
 - projection_ratio: 1
 - full_output: 1
 - max_clearsky_index: 1
 - extra_radiation: 1
 - clearness_index: 1
 - max_airmass: 1
 - calculate_gt_90: 1
 - a_coeff: 1
 - b_coeff: 1
 - transmittance: 1
 - clearsky_tolerance: 1
 - zenith_threshold_for_zero_dni: 1
 - zenith_threshold_for_clearsky_limit: 1
 - daily_solar_zenith: 1
 - global_diffuse_fraction: 1
 - tmy_metadata: 1
 - tmy_data: 1
 - metadata: 1
 - strategy: 1
 - transposition_model: 1
 - solar_position_method: 1
 - k_a: 1
 - k_d: 1
 - tc_d: 1
 - k_rs: 1
 - k_rsh: 1
 - eta: 1
 - dict_output: 1
 - temp_mod: 1
 - k: 1
 - cell_type: 1
 - xdata: 1
 - path: 1
 - reference_irradiance: 1
 - resistance: 1
 - index: 1
 - iam_model: 1
 - pw: 1
 - positions: 1
 - cloud_speed: 1
 - dt: 1
 - coordinates: 1
 - masking_angle: 1
 - shaded_row_rotation: 1
 - shaded_fraction: 1
 - shaded_blocks: 1
 - total_blocks: 1
 - diode_voltage: 1
 - gradients: 1
 - voc: 1
 - iph: 1
 - isat: 1
 - rs: 1
 - gamma: 1
 - poa_irradiance: 1
 - initial_coverage: 1
 - can_slide_coefficient: 1
 - slide_amount_coefficient: 1
 - snow_coverage: 1
 - num_strings: 1
 - snow_total: 1
 - snow_events: 1
 - lower_edge_height: 1
 - string_factor: 1
 - angle_of_repose: 1
 - pm2_5: 1
 - pm10: 1
 - depo_veloc: 1
 - soiling_loss_rate: 1
 - grace_period: 1
 - max_soiling: 1
 - manual_wash_dates: 1
 - initial_soiling: 1
 - raw_spa_output: 1
 - next_or_previous: 1
 - lower_bound: 1
 - upper_bound: 1
 - attribute: 1
 - value: 1
 - thetime: 1
 - target: 1
 - attr: 1
 - day: 1
 - hour: 1
 - minute: 1
 - second: 1
 - microsecond: 1
 - julian_ephemeris_day: 1
 - arr: 1
 - heliocentric_longitude: 1
 - heliocentric_latitude: 1
 - x2: 1
 - x3: 1
 - x4: 1
 - mean_ecliptic_obliquity: 1
 - obliquity_nutation: 1
 - geocentric_longitude: 1
 - aberration_correction: 1
 - julian_century: 1
 - mean_sidereal_time: 1
 - apparent_sidereal_time: 1
 - observer_longitude: 1
 - sun_right_ascension: 1
 - yterm: 1
 - local_pressure: 1
 - local_temp: 1
 - topocentric_elevation_angle_wo_atmosphere: 1
 - topocentric_elevation_angle_without_atmosphere: 1
 - atmospheric_refraction_correction: 1
 - topocentric_elevation_angle: 1
 - topocentric_astronomers_azimuth: 1
 - sun_mean_longitude: 1
 - loc_args: 1
 - dates: 1
 - module_temperature: 1
 - ir_down: 1
 - sky_view: 1
 - noct_installed: 1
 - module_height: 1
 - wind_height: 1
 - absorption: 1
 - module_width: 1
 - module_length: 1
 - array_height: 1
 - mount_standoff: 1
 - unit_mass: 1
 - u_const: 1
 - du_wind: 1
 - absorptance: 1
 - djd: 1
 - apparent_azimuth: 1
 - max_angle: 1
 - backtrack: 1
 - cross_axis_tilt: 1
 - tracker_theta: 1
 - input_power: 1
 - no_load_loss: 1
 - load_loss: 1
 - transformer_rating: 1
 - iam: 1
 - iam_front: 1
 - iam_back: 1
 - bifaciality: 1
 - shade_factor: 1
 - transmission_factor: 1
 - rmad: 1
 - fill_factor: 1
 - fill_factor_reference: 1
 - timestamps: 1
 - pvrow_height: 1
 - pvrow_width: 1
 - n_pvrows: 1
 - index_observed_pvrow: 1
 - rho_front_pvrow: 1
 - rho_back_pvrow: 1
 - horizon_band_angle: 1
 - pvarray: 1
 - rotation: 1
 - grid: 1
 - trace_val: 1
 - latitude_range: 1
 - longitude_range: 1
 - username: 1
 - password: 1
 - save_path: 1
 - csvdata: 1
 - raw_data: 1
 - site: 1
 - names: 1
 - interval: 1
 - attributes: 1
 - leap_day: 1
 - full_name: 1
 - affiliation: 1
 - raddatabase: 1
 - components: 1
 - pvcalculation: 1
 - peakpower: 1
 - pvtechchoice: 1
 - mountingplace: 1
 - loss: 1
 - trackingtype: 1
 - optimal_surface_tilt: 1
 - optimalangles: 1
 - startyear: 1
 - endyear: 1
 - roll_utc_offset: 1
 - identifier: 1
 - time_step: 1
 - time_ref: 1
 - verbose: 1
 - server: 1
 - source: 1
 - spatial_resolution: 1
 - true_dynamics: 1
 - probability_of_exceedance: 1
 - missing_data: 1
 - timestamp_type: 1
 - terrain_shading: 1
 - duration: 1
 - filetype: 1
 - recolumn: 1
 - v_mp_i_mp: 1
 - vlim: 1
 - ilim: 1
 - celltype: 1
 - gamma_pmp: 1
 - dEgdT: 1
 - init_guess: 1
 - root_kwargs: 1
 - rshexp: 1
 - ee: 1
 - e0: 1
 - decimals: 1
 - imax_limits: 1
 - vmax_limits: 1
 - voc_points: 1
 - isc_points: 1
 - mp_fit_order: 1
 - wavelengths: 1
 - standard: 1
 - e_sun: 1
 - e_ref: 1
 - min_precipitable_water: 1
 - max_precipitable_water: 1
 - min_airmass_absolute: 1
 - max_airmass_absolute: 1
 - qe: 1
 - ground_albedo: 1
 - surface_pressure: 1
 - relative_airmass: 1
 - aerosol_turbidity_500nm: 1
 - scattering_albedo_400nm: 1
 - wavelength_variation_factor: 1
 - aerosol_asymmetry_factor: 1

Code (free of charge)

#!/usr/bin/env python3
"""
Utility script to ensure that the parameters in the public API of pvlib-python
are normalized. By normalized, we mean that the parameters are either in the
documentation glossary or in the exclusion list.

This script is intended to be run in the CI pipeline to ensure that the
parameters are normalized before merging a PR.

Author: Echedey :D
"""

# %%
from pathlib import Path
import re

import ast

# Paths setup
REPO_ROOT = Path(__file__).parent.parent
# Path to pvlib directory
PVLIB_DIR = REPO_ROOT / "pvlib"

# Path to the glossary
GLOSSARY_PATH = REPO_ROOT / "docs/sphinx/source/user_guide/nomenclature.rst"


# %%
def get_glossary_terms(glossary_rst_file) -> list[str]:
    """
    Returns the list of defined terms in the glossary.

    Parameters
    ----------
    glossary_rst_file : Path
        Path to the glossary file.

    Returns
    -------
    list[str]
        List of defined terms in the glossary.

    Notes
    -----
    The terms of the glossary are defined in the nomenclature section of the
    documentation.
    """
    with open(glossary_rst_file, mode="r", encoding="utf-8") as f:
        # first, trim everything before the '.. glossary::' directive
        while ".. glossary::" not in f.readline():
            pass

        # then, read the glossary terms
        #   each term is indented by 4 spaces
        #   more than one can appear in a line, separated by commas
        #   the glossary ends with two newlines
        glossary_terms = []
        term_regex = re.compile(r"^ {4}\w+")  # match a line with a term
        for line in f:
            if line == "\n":  # a newline is found
                if (line := f.readline()) == "\n":  # is it a double newline?
                    break  # then the glossary ends
            # otherwise, it is a text line
            if term_regex.match(line):  # is it a glossary term?
                # remove the leading spaces and the trailing newline
                glossary_terms.extend(line.strip().split(", "))
    return glossary_terms


# %%
def get_parameters_to_exclude() -> list[str]:
    """
    Returns the list of parameters to exclude from the normalization check.
    These include python-specific parameters, like 'self', 'cls', etc.
    And also parameters that are not in the glossary but are used in the code.
    """
    return [
        "self",
        "cls",
        "kwargs",
    ]  # list of parameters to exclude from the normalization check


# %%
def get_files_to_check() -> list[Path]:
    """
    Returns the list of files to check for normalized parameters.
    These files are relative to the pvlib package root directory.
    """
    files_to_check = list(PVLIB_DIR.glob("**/*.py"))
    files_to_exclude = [
        # non-public API files, __init__.py, _deprecation.py
        *PVLIB_DIR.glob("**/_*.py"),
        *PVLIB_DIR.glob("tests/**/*.py"),  # test files
        *PVLIB_DIR.glob("spa_c_files/*.py"),  # spa c submodule
    ]

    files_to_check = [
        file.relative_to(PVLIB_DIR)
        for file in files_to_check
        if file not in files_to_exclude
    ]
    return files_to_check


# %%
if __name__ == "__main__":
    # Get the glossary terms
    glossary_terms = get_glossary_terms(GLOSSARY_PATH)

    # Get the parameters to exclude and append them to the glossary terms
    allowed_parameter_names = get_parameters_to_exclude()
    allowed_parameter_names = set(glossary_terms + allowed_parameter_names)

    # Get the list of files to check
    files_to_check = get_files_to_check()

    # tree of files, with the functions and their line number and params
    # {file_path: {function_name: (line_number, [parameter_names])}}
    tree: dict[Path, dict[tuple[str, int], list[str]]] = {}
    for file_path in files_to_check:
        with open(PVLIB_DIR / file_path, mode="r", encoding="utf-8") as f:
            tree[file_path] = {}  # initialize the file tree
            parsed_file = ast.parse(f.read())
            for node in ast.walk(parsed_file):
                if isinstance(node, ast.FunctionDef):
                    if node.name.startswith("_"):
                        continue  # skip private functions
                    tree[file_path][node.name] = (
                        node.lineno,
                        [arg.arg for arg in node.args.args],
                    )

    # Check if the parameters are normalized
    parameter_names_violations = 0
    parameter_names_violated = []
    for file_path, functions in tree.items():
        for function_name, (lineno, parameters) in functions.items():
            for parameter in parameters:
                if parameter not in allowed_parameter_names:
                    print(
                        f"Unknown {parameter} in {function_name}: "
                        + f"in {file_path}:{lineno}"
                    )
                    parameter_names_violations += 1
                    parameter_names_violated.append(parameter)

    if parameter_names_violations > 0:
        # count each unique unknown parameter into a dictionary
        parameter_names_violated: dict[str, int] = {
            parameter: parameter_names_violated.count(parameter)
            for parameter in parameter_names_violated
        }
        print("The unique unknown parameters are:")
        for parameter in parameter_names_violated:
            print(f" - {parameter}")
        print(f"A total of {len(parameter_names_violated)} unique unknown "
              + "parameters were found. They make for a total of "
              + f"{parameter_names_violations} ocurrences.")
        # order by most common unknown parameter
        print("The most common unknown parameters are:")
        for parameter in sorted(
            parameter_names_violated,
            key=parameter_names_violated.get,
            reverse=True,
        ):
            print(f" - {parameter}: {parameter_names_violated[parameter]}")
        exit(1)

    print("All parameters are consistent.")
    exit(0)

# %%

Edit to add: we can do an online spreadsheet, discuss whether they are appropriate for the glossary and/or fill the definitions there.

Jan 10 '25 20:01 echedey-ls

@jsstein see @echedey-ls list of parameter names.

Jan 10 '25 21:01 cwhanse