hydropandas
Plot observations per hydrological year
I need plots of my long-term observations per year: the x-axis spans one hydrological year (starting 1 April, or a user-defined date) and each observation year gets a different color.
First question: Is it okay to include this in HydroPandas?
Second, if yes: I have thought of two ways to do this. Any advice?
- Add a new column to the ObsCollection that holds the date/time with a dummy year, e.g. 1900 for the period 1 April to 31 December and 1901 for the period 1 January to 31 March. The function can then use the pandas/matplotlib formatting of dates and times on the x-axis; only the year has to be removed from the x-axis labels. This option has my preference.
- Add a new column to the ObsCollection that holds the number of days since 1 April. The function then has to change the x-axis labels after plotting from day numbers to useful dates or months, so we cannot use the pandas/matplotlib date formatting.
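A minimal sketch of the mapping both options need, assuming a pandas `DatetimeIndex` and a hydrological year that starts on `month0`/`day0` (1 April by default). The name `hydro_year_days` is hypothetical, not a HydroPandas function. It returns the hydrological-year label (the calendar year in which that hydrological year starts) and the fractional number of days since its start, which is the x-value of option 2.

```python
import numpy as np
import pandas as pd


def hydro_year_days(index: pd.DatetimeIndex, month0: int = 4, day0: int = 1) -> pd.DataFrame:
    """Hypothetical helper: hydrological-year label and days since its start."""
    # timestamps before month0/day0 belong to the hydrological year that
    # started in the previous calendar year
    before_start = (index.month < month0) | ((index.month == month0) & (index.day < day0))
    hydro_year = np.asarray(index.year) - before_start.astype(int)
    # start of the hydrological year for every timestamp
    start = pd.to_datetime(
        pd.DataFrame({"year": hydro_year, "month": month0, "day": day0})
    )
    # fractional days since that start (option 2); sub-daily times are kept
    days = (index - pd.DatetimeIndex(start)) / pd.Timedelta(days=1)
    return pd.DataFrame(
        {"hydro_year": hydro_year, "days_since_start": np.asarray(days)}, index=index
    )
```

For option 1, the same `before_start` mask decides whether a timestamp goes into the first or the second dummy year.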
I previously used something similar to option 1, which may serve as inspiration. It calculates the time in days (as a Julian-date difference) since 1 January of the same year, but it should work with any other starting date as well. It also works with data at a higher frequency than daily.
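```python
# df: DataFrame with a datetime column 'date'
df['year'] = df['date'].dt.year
for i, row in df.iterrows():
    # fractional days elapsed since 1 January, 00:00 of the same year
    df.loc[i, 'doy'] = row['date'].to_julian_date() - row['date'].replace(
        month=1, day=1, hour=0, minute=0, second=0, microsecond=0, nanosecond=0
    ).to_julian_date()
```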
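The loop above can also be written without `iterrows`, which is usually much faster on long series. A sketch assuming, as in the snippet, a DataFrame `df` with a datetime column `date` (the three dates are illustrative only): the 0-based day of the year plus the elapsed fraction of the current day gives the same value as the Julian-date difference.

```python
import pandas as pd

# illustrative data; in practice df comes from the observation
df = pd.DataFrame(
    {"date": pd.to_datetime(["2021-01-01 00:00", "2021-01-02 06:00", "2020-12-31 18:00"])}
)

df["year"] = df["date"].dt.year
# whole days since 1 January (dayofyear is 1-based) plus the fraction of the current day
df["doy"] = (df["date"].dt.dayofyear - 1) + (
    df["date"] - df["date"].dt.normalize()
) / pd.Timedelta(days=1)
```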
I'm all for more pretty plots, and this sounds like a useful plot to make.
As for the implementation, I would keep it a bit simpler using groupby:
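```python
import matplotlib.pyplot as plt
import pandas as pd

# obs: observation DataFrame with a DatetimeIndex; column: name of the column to plot
gr = obs[column].groupby(by=obs.index.year)
fig, ax = plt.subplots()
for year, group in gr:
    ax.plot(group.index.dayofyear, group.values, label=year)
# label the x-axis with month names at the first day of each month (non-leap year)
month_starts = pd.date_range("2001-01-01", periods=12, freq="MS")
ax.set_xticks(month_starts.dayofyear)
ax.set_xticklabels(month_starts.strftime("%b"))
ax.legend()
```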
There's always the question of how to handle leap years, but that's just a choice you have to make.
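One possible choice for the leap-year question, sketched here as an assumption rather than an approach anyone in this thread settled on: in leap years, shift every date after February back by one day, so that 1 March gets day number 60 in every year. 29 February then shares day 60 with 1 March. The function name `leap_aligned_dayofyear` is hypothetical.

```python
import numpy as np
import pandas as pd


def leap_aligned_dayofyear(index: pd.DatetimeIndex) -> np.ndarray:
    """Hypothetical helper: 1-based day of year in which 1 March is always day 60."""
    doy = np.asarray(index.dayofyear)
    # in leap years, every date after February is one day 'ahead' of a normal year
    shift = np.asarray(index.is_leap_year) & (index.month > 2)
    return doy - shift.astype(int)
```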
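EDIT: my groupby code above doesn't work for data with a higher frequency than daily, because `dayofyear` is an integer, so all values from one day get the same x-value. In that case you have to compute the x-values another way (not tested, just an idea off the top of my head): tidx = ref_date + (group.index - pd.Timestamp(year, 1, 1)), with ref_date a fixed reference date.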
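A sketch of that idea for sub-daily data, assuming a pandas Series with a DatetimeIndex; `align_years` and its default `ref_date` are hypothetical. The start of each calendar year is built explicitly, and each year's timestamps are moved onto the reference year by adding their elapsed time since 1 January. Because elapsed time is used and the reference year 2000 is a leap year, dates after February in non-leap years land one day earlier than the same dates in leap years.

```python
import pandas as pd


def align_years(series: pd.Series, ref_date: pd.Timestamp = pd.Timestamp("2000-01-01")) -> dict:
    """Hypothetical helper: {year: values of that year re-indexed onto the reference year}."""
    aligned = {}
    for year, group in series.groupby(series.index.year):
        # elapsed time since 1 January of this year, at full (sub-daily) resolution
        elapsed = group.index - pd.Timestamp(year=int(year), month=1, day=1)
        aligned[int(year)] = pd.Series(
            group.to_numpy(), index=elapsed + ref_date, name=series.name
        )
    return aligned
```

Each entry can then be plotted with `ax.plot(s.index, s.values, label=year)`, and the x-axis keeps real dates, so a date formatter can be used.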
I combined the suggestions of @MattBrst and @dbrakenhoff and now have working code. Any suggestions?
```python
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# start of the hydrological year, here 1 November
month0 = 11
day0 = 1

# oc_plot: an ObsCollection; copy the first observation so that the helper
# columns below are not added to the original data
df = oc_plot.iloc[0].obs.copy()

# first process the first calendar year of each hydrological year;
# for simplicity assign to all data and repair the second calendar year later
# column with the legend label
df['plot_year'] = df.index.year
# x-values for plotting: the same date, with the year replaced by 1900
# TODO: gives a PerformanceWarning (DateOffset is applied element by element)
df['plot_x'] = df.index + pd.offsets.DateOffset(year=1900)

# overwrite the assigned values for dates before month0/day0
for year in range(df.index.year.min(), df.index.year.max() + 1):
    before_start = (df.index >= pd.Timestamp(year, 1, 1)) & (
        df.index < pd.Timestamp(year, month0, day0)
    )
    # these belong to the previous hydrological year, so change the legend label
    df.loc[before_start, 'plot_year'] = year - 1
    # and move them to the dummy year 1901
    df.loc[before_start, 'plot_x'] += pd.offsets.DateOffset(year=1901)

# plotting: one line per hydrological year
fig, ax = plt.subplots()
for plot_year, group in df.groupby('plot_year'):
    ax.plot(group['plot_x'], group['stand_m_tov_nap'], label=plot_year)
ax.legend()
ax.grid()
ax.set_xlim([pd.Timestamp(1900, month0, day0), pd.Timestamp(1901, month0, day0)])
# show only day and month; the dummy years 1900/1901 carry no information
ax.xaxis.set_major_formatter(mdates.DateFormatter('%d-%b'))
```
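A vectorized sketch of the `plot_year`/`plot_x` columns that avoids both the loop over years and the non-vectorized `DateOffset` behind the `PerformanceWarning`. `add_plot_columns` is a hypothetical helper. It rebuilds each dummy date from month and day, and it picks the dummy years (1999/2000 or 2000/2001) so that the dummy February falls in 2000, a leap year, and 29 February can always be represented; neither 1900 nor 1901 is a leap year.

```python
import numpy as np
import pandas as pd


def add_plot_columns(df: pd.DataFrame, month0: int = 11, day0: int = 1) -> pd.DataFrame:
    """Hypothetical helper: copy of df with 'plot_year' (legend) and 'plot_x' (dummy date)."""
    idx = df.index
    # dates before month0/day0 belong to the hydrological year of the previous calendar year
    before_start = (idx.month < month0) | ((idx.month == month0) & (idx.day < day0))
    shift = before_start.astype(int)
    # choose dummy years so that February always falls in the leap year 2000
    base = 1999 if (month0, day0) > (2, 29) else 2000
    dummy_dates = pd.to_datetime(
        pd.DataFrame(
            {"year": base + shift, "month": np.asarray(idx.month), "day": np.asarray(idx.day)}
        )
    )
    # add back the time of day, so sub-daily data keep their position
    time_of_day = (idx - idx.normalize()).to_numpy()
    out = df.copy()
    out["plot_year"] = np.asarray(idx.year) - shift
    out["plot_x"] = dummy_dates.to_numpy() + time_of_day
    return out
```

The plotting loop stays the same; only the x-limits change, e.g. `pd.Timestamp(1999, month0, day0)` to `pd.Timestamp(2000, month0, day0)` for a start after 29 February.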
You are looking at observed groundwater levels in a dike. The steep rise after 1 November is interesting, as is the small difference between the annual maximum water levels despite significant differences in rainfall.
In SPEI I have a function that does something similar:
https://github.com/martinvonk/SPEI/blob/a422933cb6b98605e143aac846c2374af390afb2/src/spei/utils.py#L78-L92
```python
from pandas import DataFrame, Grouper, Series
from pandas import __version__ as pd_version
from pandas import concat, to_datetime


def group_yearly_df(series: Series) -> DataFrame:
    """Group series in a DataFrame with date (in the year 2000) as index and
    year as columns.
    """
    strfstr: str = "%m-%d %H:%M:%S"
    grs = {}
    # the year-end alias was renamed from "Y" to "YE" in pandas 2.2.0;
    # compare numerically, since a string comparison fails for e.g. "2.10"
    major_minor = tuple(int(v) for v in pd_version.split(".")[:2])
    freq = "YE" if major_minor >= (2, 2) else "Y"
    for year_timestamp, gry in series.groupby(Grouper(freq=freq)):
        # keep month, day and time; replace the year by 2000 (a leap year)
        gry.index = to_datetime(
            "2000-" + gry.index.strftime(strfstr), format="%Y-" + strfstr
        )
        year: int = year_timestamp.year
        grs[year] = gry
    return concat(grs, axis=1)
```
Thanks for sharing the snippet. Your code is more Pythonic than my for-loop.