More descriptive errors for `detect_clearsky`

Open · mdeceglie opened this issue 10 months ago · 4 comments

Is your feature request related to a problem? Please describe.
Using a `window_length` in `detect_clearsky` that is too short relative to the data's sampling period produces cryptic errors.

Describe the solution you'd like
Raise a `ValueError` that explains the problem directly.

Example:

```python
import pvlib
import pandas as pd
import numpy as np
start = '2012-01-01'
end = '2015-01-01'
freq = '60min'  # hourly data ('60T' is deprecated in recent pandas)
times = pd.date_range(start=start, end=end, freq=freq)
x1 = pd.Series(np.random.rand(len(times)), index=times)
x2 = pd.Series(np.random.rand(len(times)), index=times)

pvlib.clearsky.detect_clearsky(x1, x2,
                               window_length=90, mean_diff=75, max_diff=75,
                               lower_line_length=-45, upper_line_length=80,
                               var_diff=0.032, slope_dev=75)
```

Results in the following:

```
ValueError                                Traceback (most recent call last)
Cell In[28], line 11
      8 x1 = pd.Series(np.random.rand(len(times)), index=times)
      9 x2 = pd.Series(np.random.rand(len(times)), index=times)
---> 11 pvlib.clearsky.detect_clearsky(x1, x2,
     12                           window_length=90, mean_diff=75, max_diff=75,
     13                           lower_line_length=-45, upper_line_length=80,
     14                           var_diff=0.032, slope_dev=75)

File ~/opt/anaconda3/envs/rdtools3_testing/lib/python3.10/site-packages/pvlib/clearsky.py:854, in detect_clearsky(measured, clearsky, times, infer_limits, window_length, mean_diff, max_diff, lower_line_length, upper_line_length, var_diff, slope_dev, max_iterations, return_components)
    850 clear_line_length = _line_length_windowed(
    851     scaled_clear, H, samples_per_window, sample_interval)
    853 line_diff = meas_line_length - clear_line_length
--> 854 slope_max_diff = _max_diff_windowed(
    855     meas - scaled_clear, H, samples_per_window)
    856 # evaluate comparison criteria
    857 c1 = np.abs(meas_mean - alpha*clear_mean) < mean_diff

File ~/opt/anaconda3/envs/rdtools3_testing/lib/python3.10/site-packages/pvlib/clearsky.py:602, in _max_diff_windowed(data, H, samples_per_window)
    600 def _max_diff_windowed(data, H, samples_per_window):
    601     raw = np.diff(data)
--> 602     raw = np.abs(raw[H[:-1, ]]).max(axis=0)
    603     return _to_centered_series(raw, data.index, samples_per_window)

File ~/opt/anaconda3/envs/rdtools3_testing/lib/python3.10/site-packages/numpy/core/_methods.py:40, in _amax(a, axis, out, keepdims, initial, where)
     38 def _amax(a, axis=None, out=None, keepdims=False,
     39           initial=_NoValue, where=True):
---> 40     return umr_maximum(a, axis, None, out, keepdims, initial, where)

ValueError: zero-size array to reduction operation maximum which has no identity
```
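The error can be reproduced with NumPy and SciPy alone. This sketch assumes the window index matrix `H` is built with `scipy.linalg.hankel`, one window per column; that construction is an assumption, since only the `H[:-1, ]` indexing is visible in the traceback. With hourly data and `window_length=90`, truncation leaves one sample per window, so `H` has a single row, `H[:-1, ]` selects no rows, and the maximum is taken over an empty array.

```python
import numpy as np
from scipy.linalg import hankel

n = 10                   # number of timestamps (illustrative)
samples_per_window = 1   # int(90 / 60) for window_length=90 on hourly data

# Assumed construction: column j holds the sample indices of window j.
H = hankel(np.arange(samples_per_window),
           np.arange(samples_per_window - 1, n))

data = np.random.rand(n)
raw = np.diff(data)        # n - 1 slopes between consecutive samples
windows = raw[H[:-1, ]]    # H has one row, so H[:-1, ] is empty
print(H.shape, windows.shape)  # (1, 10) (0, 10)

try:
    np.abs(windows).max(axis=0)
except ValueError as err:
    # Same message as the traceback: a max over zero elements has no identity.
    print(err)
```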

The following produces a different error, even though `window_length` is greater than the data period, as the docstring requires. (Does the window length need to be a multiple of the period?)

```python
import pvlib
import pandas as pd
import numpy as np
start = '2012-01-01'
end = '2015-01-01'
freq = '60min'  # hourly data ('60T' is deprecated in recent pandas)
times = pd.date_range(start=start, end=end, freq=freq)
x1 = pd.Series(np.random.rand(len(times)), index=times)
x2 = pd.Series(np.random.rand(len(times)), index=times)

pvlib.clearsky.detect_clearsky(x1, x2,
                               window_length=175, mean_diff=75, max_diff=75,
                               lower_line_length=-45, upper_line_length=80,
                               var_diff=0.032, slope_dev=75)
```

Result:

```
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[32], line 11
      8 x1 = pd.Series(np.random.rand(len(times)), index=times)
      9 x2 = pd.Series(np.random.rand(len(times)), index=times)
---> 11 pvlib.clearsky.detect_clearsky(x1, x2,
     12                           window_length=175, mean_diff=75, max_diff=75,
     13                           lower_line_length=-45, upper_line_length=80,
     14                           var_diff=0.032, slope_dev=75)

File ~/opt/anaconda3/envs/rdtools3_testing/lib/python3.10/site-packages/pvlib/clearsky.py:891, in detect_clearsky(measured, clearsky, times, infer_limits, window_length, mean_diff, max_diff, lower_line_length, upper_line_length, var_diff, slope_dev, max_iterations, return_components)
    885     except AttributeError:
    886         message = "Optimizer exited unsuccessfully: \
    887                    No message explaining the failure was returned. \
    888                    If you would like to see this message, please \
    889                    update your scipy version (try version 1.8.0 \
    890                    or beyond)."
--> 891     raise RuntimeError(message)
    893 else:
    894     alpha = optimize_result.x

RuntimeError: Optimizer exited unsuccessfully: NaN result encountered.
```
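On the multiple-of-the-period question: assuming pvlib converts `window_length` to a sample count with truncating division, `int(window_length / sample_interval)`, only the integer quotient matters, so what counts is the number of samples (and of slopes between consecutive samples) per window rather than whether the division is exact. A quick tally for the two examples, plus a 180-minute window for comparison:

```python
sample_interval = 60  # minutes: hourly data, as in both examples

counts = {}
for window_length in (90, 175, 180):
    # Assumed to mirror pvlib's truncating conversion to a sample count.
    samples_per_window = int(window_length / sample_interval)
    slopes_per_window = samples_per_window - 1  # differences of consecutive samples
    counts[window_length] = (samples_per_window, slopes_per_window)
    print(f"window_length={window_length}: {samples_per_window} sample(s), "
          f"{slopes_per_window} slope(s) per window")
```

Under that assumption, `window_length=90` leaves no slopes to reduce over, consistent with the first traceback, while `window_length=175` leaves exactly one slope per window.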

mdeceglie · Apr 09 '24 01:04

Thanks @mdeceglie. Implementing this extension of the `detect_clearsky` algorithm has been on my list for a while. In the meantime, PRs that improve the error detection or docstrings are welcome.

cwhanse · Apr 09 '24 13:04

@cwhanse, do you have any insight into the second example? Is the requirement specifically that the window length must be greater than the period and also a multiple of it? I am only speculating about the nature of that problem.

mdeceglie · Apr 09 '24 15:04

In the second case, each window contains only 2 data values. Buried in the `detect_clearsky` algorithm is the calculation of a sample standard deviation of the slopes between consecutive points in each interval. With a 175-minute window and 60-minute data, there are two points per interval, hence one slope, hence the divisor in the standard deviation is N-1 = 0. That returns NaN for the standard deviation, which means no clear points are detected, and the optimizer subsequently fails.
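The N-1 = 0 case can be checked directly. This sketch assumes the slope standard deviation is taken column-wise with `ddof=1` over a Hankel-style window index matrix; the names `H` and `samples_per_window` mirror the first traceback, but the construction itself is an assumption. With two samples per window there is one slope per column, and NumPy returns NaN (with a `RuntimeWarning`) when `N - ddof` is zero.

```python
import warnings

import numpy as np
from scipy.linalg import hankel

n = 10                   # number of timestamps (illustrative)
samples_per_window = 2   # int(175 / 60) for window_length=175 on hourly data

# Assumed construction: column j holds the sample indices of window j.
H = hankel(np.arange(samples_per_window),
           np.arange(samples_per_window - 1, n))
data = np.random.rand(n)
slopes = np.diff(data)[H[:-1, ]]  # one slope per window: shape (1, n - 1)

with warnings.catch_warnings():
    # NumPy warns when N - ddof <= 0 and again when it divides 0 by 0.
    warnings.simplefilter("ignore", RuntimeWarning)
    slope_std = slopes.std(ddof=1, axis=0)

print(slopes.shape)               # (1, 9)
print(np.isnan(slope_std).all())  # True
```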

I think the fix is either to edit the `window_length` docstring to say "at least 3 periods", or to build a fallback into the slope standard deviation calculation for the case where there is only one slope per interval.

cwhanse · Apr 09 '24 15:04