Potential bug with solarposition.py
Describe the bug
I am upgrading from pvlib version 0.9.5 and have noticed an issue when using the sun_rise_set_transit_spa() function to label nighttime points. I have attached three screenshots: one showing the function working correctly with pvlib 0.9.5 ('correctly labeled nighttime points.png'), one showing the issue with pvlib 0.13.0 ('incorrectly_labeled_nighttime_points.png'), and a zoomed-in view of the issue ('incorrectly_labeled_nighttime_points_zoomed_in.png'). I am trying to use the function to label the nighttime points for later filtering.
I am able to edit two lines of the code from pvlib 0.13.0 to get the correct behavior. I don't fully understand the issue, but if I comment out these two lines (lines 448-450 in solarposition.py):
# must convert to midnight UTC on day of interest
# times_utc = times.tz_convert('UTC')
# unixtime = _datetime_to_unixtime(times_utc.normalize())
and instead use the unixtime conversion used in pvlib 0.9.5:
# use pvlib095 conversion
times_utc = pd.DatetimeIndex(times.date).tz_localize('UTC')
unixtime = np.array(times_utc.view(np.int64)/10**9)
I end up with the plots that look correct.
To Reproduce
Steps to reproduce the behavior: I think you should be able to reproduce the problem by creating a nighttime filter on a dataset, plotting the points labeled as nighttime, and checking whether the function flagged them properly.
Expected behavior
I expect the sun_rise_set_transit_spa() function to properly create a flag that labels points that occur during the nighttime as nighttime points.
Function call I am using to generate the flag:
def night_flag(data, meta):
    '''
    Goal: flag all points that are nighttime points (before sunrise and after sunset). Returns a flag of True or False
    values for each timestamp in the index where True means it is a nighttime point and False means it is not.
    Parameters
    - data: pandas dataframe where the index is the timestamp
    - meta: pandas array (column) of site specific information
    Returns
    - boolean flag
    '''
    times = pvlib.solarposition.sun_rise_set_transit_spa(data.index, meta['latitude'], meta['longitude'])
    return ((data.index < times['sunrise']) | (data.index > times['sunset']))
Screenshots
See attached screenshots.
Zoomed in view of issue.
Zoomed out view of issue.
Zoomed out view of correct performance that is achieved by editing the above two lines to the old 0.9.5 version.
Versions:
- pvlib.__version__: 0.13.0
- pandas.__version__: 2.3.1
- python: 3.13.5
Additional context
It seems that just adjusting a couple lines to the old version fixes the issue for me. I am not sure what exactly about the new code for determining the unixtime variable causes the problem. I'll be copying the solarposition code to my local code and making the edit and just calling the modified function for the time being, but figured it would be good to create an issue. Thanks!
The zoomed-out view of the issue suggests the errors occur during daylight saving time periods: the errors appear to begin in the spring and go away in the fall. Hard to be certain though.
A reproducible example would go a long way here. There are some 8760 TMY weather files in tests/data, if one of those would serve to show us what is happening.
Pretty sure this is #2238. The TZ is 6 hours off UTC, which means 6 hours before midnight, the calculated sunrise and sunset (in pvlib>=0.11.1) jumps ahead to the next day, so data.index < times['sunrise'] is True and those 6 hours get marked as night. The DST thing is a red herring; the error shows up in summer because that's when days are long enough for it to still be daytime 6 hours before midnight.
#2238 has some proposed fixes, but I got lost wrapping my head around it all, and then it fell off my radar. PR welcome!
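The date rollover described above can be reproduced with pandas alone. This is a minimal sketch, not pvlib's code path: the two timestamps and the fixed UTC-6 offset are made up for illustration, and the two conversions are the ones quoted earlier in this issue.

```python
from datetime import timedelta, timezone

import pandas as pd

# Hypothetical site with a fixed UTC-6 offset (assumption, no DST).
tz = timezone(timedelta(hours=-6))
times = pd.DatetimeIndex(["2023-06-21 12:00", "2023-06-21 19:00"]).tz_localize(tz)

# Old conversion: take the local calendar date and label it as UTC midnight.
local_day = pd.DatetimeIndex(times.date).tz_localize("UTC")

# Newer conversion: convert to UTC first, then truncate to midnight.
utc_day = times.tz_convert("UTC").normalize()

print(local_day.day.tolist())  # [21, 21]
print(utc_day.day.tolist())    # [21, 22]
```

The 19:00 local timestamp is already 01:00 UTC on the next day, so the newer conversion assigns it to June 22 and the sunrise and sunset computed for it belong to that later day.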
#2238 is a dense one... Would a potential quick fix be to add an additional parameter to the functions that do this timezone conversion, so they can use either the local date or the UTC date?
Example:
def sun_rise_set_transit_spa(..., use_local_date=True):
    ....
    if use_local_date:  # use pvlib 0.11.0 conversion
        times_utc = pd.DatetimeIndex(times.date).tz_localize('UTC')
        unixtime = np.array(times_utc.view(np.int64)/10**9)
    else:  # use pvlib >= 0.11.1 conversion
        # must convert to midnight UTC on day of interest
        times_utc = times.tz_convert('UTC')
        unixtime = _datetime_to_unixtime(times_utc.normalize())
    ...
I don't think I understand the nuances of the issue enough to draft a great solution, but the above preserves the old functionality that seems to fit my use case best while also allowing users to opt for the new functionality if they are sticking to UTC times.
add an additional parameter ... to use either local time or utc time?
I was also thinking about this option.
The larger issue is that pvlib uses functions from these foreign libraries for sunrise, sunset, etc., and those foreign libraries have idiosyncrasies. It may be easier to just replace those uses with pvlib native code, and determine sunrise/sunset from solar zenith.
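For the nighttime-flag use case in this issue, the zenith idea could look roughly like the sketch below. The helper name night_flag_from_zenith, the 90 degree threshold, and the zenith values are all assumptions made for illustration; none of this is existing pvlib code.

```python
import pandas as pd


def night_flag_from_zenith(zenith, horizon_deg=90.0):
    """Hypothetical helper: True where the sun is below the horizon.

    The 90 degree threshold is an assumption; a different value may suit
    a given definition of sunrise and sunset better.
    """
    return zenith > horizon_deg


# Made-up zenith angles in degrees, for illustration only.
zenith = pd.Series([110.0, 95.0, 89.0, 30.0, 91.0])
flag = night_flag_from_zenith(zenith)
print(flag.tolist())  # [True, True, False, False, True]
```

In practice the zenith values could come from pvlib.solarposition.get_solarposition(data.index, latitude, longitude), whose result includes 'zenith' and 'apparent_zenith' columns. Because each timestamp is evaluated on its own, no date has to be assigned to a sunrise or sunset.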
Hi, sorry for the confusion, but I don't think this is a bug in the library. Your night_flag function can be fixed to work as expected.
Before:
return ((data.index < times['sunrise']) | (data.index > times['sunset']))
After:
return ((data.index.time < times['sunrise'].dt.time) | (data.index.time > times['sunset'].dt.time))
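The difference between the two return expressions can be checked on synthetic data. The sketch below assumes a fixed UTC-6 offset and made-up sunrise and sunset values whose date part has rolled to the next day for the last two rows; it does not call pvlib.

```python
from datetime import timedelta, timezone

import numpy as np
import pandas as pd

tz = timezone(timedelta(hours=-6))  # hypothetical fixed UTC-6 site
index = pd.DatetimeIndex(
    ["2023-06-21 03:00", "2023-06-21 12:00", "2023-06-21 19:00", "2023-06-21 23:00"]
).tz_localize(tz)

# Made-up values: rows 3 and 4 carry the next day's date, as described in this thread.
sunrise = pd.Series(pd.DatetimeIndex(
    ["2023-06-21 05:30", "2023-06-21 05:30", "2023-06-22 05:30", "2023-06-22 05:30"]
).tz_localize(tz), index=index)
sunset = pd.Series(pd.DatetimeIndex(
    ["2023-06-21 20:30", "2023-06-21 20:30", "2023-06-22 20:30", "2023-06-22 20:30"]
).tz_localize(tz), index=index)

# "Before": compare full datetimes, so the rolled-over date matters.
by_datetime = np.asarray((index < sunrise) | (index > sunset))

# "After": compare local time of day only, ignoring the date part.
rise_t = sunrise.dt.time.to_numpy()
set_t = sunset.dt.time.to_numpy()
by_time = (index.time < rise_t) | (index.time > set_t)

print(by_datetime.tolist())  # [True, False, True, True]
print(by_time.tolist())      # [True, False, False, True]
```

Only the 19:00 row differs: it is daytime, but the full-datetime comparison marks it as night. One caveat, stated as an assumption about edge cases rather than something tested here: a time-of-day comparison presumably breaks down where the local sunset falls after midnight, for example at high latitudes.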
In other words, the modifications you suggested (pvlib095 and pvlib0130sun) might be overkill: they skip a prerequisite policy decision about the date part of the results, as explained in #2238. Briefly, spa_c returns only hours (times of day) that change at UTC midnight, without any dates. In contrast, sun_rise_set_transit_spa in v0.11.0 returns datetimes, but their times change at midnight in the input timezone, unlike the UTC times from spa_c. #2055 fixed this in v0.11.1 by building UTC datetimes and converting their timezone if needed. By comparison, sun_rise_set_transit_ephem creates local datetimes and converts their timezone, but also ensures they are in the future. Lastly, sun_rise_set_transit_geometric generates UTC datetimes but overwrites the date parts with those of each local input, leading to discrepancies such as two sunrises expected for one day. You can test these in your demo code like this:
times = pvlib.solarposition.sun_rise_set_transit_ephem(data.index, meta['latitude'], meta['longitude'])
utcday = data.index.tz_convert('UTC').dayofyear
eot = pvlib.solarposition.equation_of_time_spencer71(utcday) # minutes
decl = pvlib.solarposition.declination_spencer71(utcday) # radians
times2 = pvlib.solarposition.sun_rise_set_transit_geometric(data.index, meta['latitude'], meta['longitude'], decl, eot)
transit = times2[2].tolist()
sunrise = times2[0].tolist()
sunset = times2[1].tolist()
times = pd.DataFrame(index=data.index, data={'sunrise': sunrise,
                                             'sunset': sunset,
                                             'transit': transit})
For example, with sun_rise_set_transit_ephem, the expression you previously returned from night_flag evaluates to True all the time, because the date parts are guaranteed to be in the future:
By contrast, sun_rise_set_transit_geometric uses the input dates and looks fine at first, but there is actually a problem: you cannot tell which sunrise to choose when two sunrise times exist on one day: