hydropandas Plot observations per hydrological year


I need plots of my long-term observations per year: the x-axis spans one hydrological year (starting 1 April, or a user-defined date), and each observation year gets a different color.

First question: Is it okay to include this in HydroPandas?

Second, if yes, I thought of two options to do this. Any advice on this?

1. Add a new column to the obs-collection that holds the date/time with a dummy year, e.g. 1900 (for the period 1 April - 31 December) and 1901 (for the period 1 January - 31 March). The function can then use the pandas/matplotlib x-axis formatting for times and dates; only the year has to be removed from the x-axis labels. This has my preference.
2. Add a new column to the obs-collection that holds the day number since 1 April. The function then has to change the x-axis labels after plotting from day numbers to useful dates or months, so we cannot use the pandas/matplotlib date formatting.
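Option 1 could be sketched roughly as follows. This is only an illustration under assumptions: `dummy_year_index` is a hypothetical helper (not part of HydroPandas), 29 February is folded onto 28 February because 1900 and 1901 are not leap years, and sub-second precision is dropped.

```python
import numpy as np
import pandas as pd


def dummy_year_index(index, start_month=4, start_day=1):
    """Return (plot_x, hydro_year) for a DatetimeIndex.

    Hypothetical helper for illustration; not part of HydroPandas.
    """
    # True for dates on or after the start of the hydrological year
    after_start = (index.month > start_month) | (
        (index.month == start_month) & (index.day >= start_day)
    )
    # Label each hydrological year by the calendar year in which it starts
    hydro_year = np.where(after_start, index.year, index.year - 1)
    # Assumption: fold 29 Feb onto 28 Feb, since 1900/1901 have no leap day
    day = np.where((index.month == 2) & (index.day == 29), 28, index.day)
    parts = pd.DataFrame(
        {
            "year": np.where(after_start, 1900, 1901),
            "month": index.month,
            "day": day,
            "hour": index.hour,
            "minute": index.minute,
            "second": index.second,
        }
    )
    # pandas assembles timestamps from year/month/day/... columns
    plot_x = pd.DatetimeIndex(pd.to_datetime(parts))
    return plot_x, hydro_year


idx = pd.DatetimeIndex(["2023-03-31", "2023-04-01", "2024-02-29 12:00"])
plot_x, hydro_year = dummy_year_index(idx)
```

Here `plot_x` can be used directly as x-values and `hydro_year` as the legend label or groupby key.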

May 02 '24 07:05 HMEUW

I previously used something similar to option 1, which may serve as inspiration. It calculates the number of (fractional) days elapsed since 1 January of the current year, via Julian dates, but it should be possible to use any other starting date as well. It also works with data at a higher frequency than daily.

df['year'] = df['date'].dt.year
for i, row in df.iterrows():
    df.loc[i, 'doy']= row['date'].to_julian_date() - row['date'].replace(month=1, day=1, hour=0, minute=0, second=0).to_julian_date()

May 02 '24 07:05 MattBrst

I'm all for more pretty plots, and this sounds like a useful plot to make.

As for the implementation, I would keep it a bit simpler using groupby:

obs  # my obs
gr = obs[column].groupby(by=obs.index.year)
fig, ax = plt.subplots()
for year, group in gr:
    ax.plot(group.index.dayofyear, group.values, label=year)

# some code to nicely set the date labels using DateFormatter or something along those lines
ax.set_xticklabels(...)

There's always the question of how to handle leap-years, but that's just a choice you have to make.

EDIT: the code above doesn't work for data with a higher frequency than daily. In that case you have to compute the index another way (not tested, but an idea off the top of my head): tidx = ref_date + (group.index - group.index[0].round("YS"))
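A hedged sketch of the untested idea above: as far as I know, `round` only accepts fixed frequencies and not "YS", so this version takes the start of each timestamp's own calendar year via `to_period("Y").start_time` instead. `to_reference_year` and the reference date 2000-01-01 are assumptions for illustration, and the index is assumed to be tz-naive.

```python
import pandas as pd


def to_reference_year(index, ref_date=pd.Timestamp("2000-01-01")):
    """Shift each timestamp to the reference year, keeping the elapsed time
    since 1 January of its own year (hypothetical helper)."""
    # Start of the calendar year that each timestamp falls in
    year_start = index.to_period("Y").start_time
    # Elapsed time since the year start, added to the reference date
    return ref_date + (index - year_start)


idx = pd.DatetimeIndex(["2023-03-01 06:30", "2024-03-01 06:30", "2023-01-01"])
tidx = to_reference_year(idx)
```

This also shows the leap-year choice: because elapsed time is preserved rather than the calendar date, 1 March 2023 lands on 29 February 2000, while 1 March 2024 lands on 1 March 2000.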

May 02 '24 07:05 dbrakenhoff

I combined the suggestions of @MattBrst and @dbrakenhoff, and I have working code now. Any suggestions?

import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# start of the hydrological year, here choose 1 November
month0 = 11
day0 = 1

df = oc_plot.iloc[0].obs

# first process the first calendar year in the requested hydrological year
# for simplicity assign to all data, repair for the second year later

# create column with legend label
df['plot_year'] = df.index.year
# create x-values for plotting
# TODO: gives PerformanceWarning
df['plot_x'] = df.index + pd.offsets.DateOffset(year=1900)

# overwrite assigned values for dates before month0 and day0

for year in range(df.index.year.min(), df.index.year.max()+1):
    # these belong to the previous hydrological year, so change legend label
    df.loc[
        (df.index >= pd.Timestamp(year, 1, 1)) &
        (df.index < pd.Timestamp(year, month0, day0)), 'plot_year'] = year-1
    # assign year 1900+1 in plotting index
    df.loc[
        (df.index >= pd.Timestamp(year, 1, 1)) &
        (df.index < pd.Timestamp(year, month0, day0)), 'plot_x'] += pd.offsets.DateOffset(year=1901)

# plotting
gr = df.groupby(by=df.plot_year)
fig, ax = plt.subplots()
for plot_year, group in gr:
    ax.plot(group.plot_x, group.stand_m_tov_nap, label=plot_year)

ax.legend()
ax.grid()
ax.set_xlim([pd.Timestamp(1900, month0, day0),  pd.Timestamp(1901, month0, day0)])
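The snippet above imports `matplotlib.dates` but does not use it yet. One way to drop the dummy year from the x-axis labels might look like this; `format_month_axis` is a hypothetical helper and the plotted data is a placeholder, not real observations.

```python
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd


def format_month_axis(ax, month0=11, day0=1):
    """Show month abbreviations only on a dummy-year (1900/1901) x-axis."""
    # One major tick at the start of every month
    ax.xaxis.set_major_locator(mdates.MonthLocator())
    # "%b" prints the month abbreviation only, so the dummy year stays hidden
    ax.xaxis.set_major_formatter(mdates.DateFormatter("%b"))
    ax.set_xlim(pd.Timestamp(1900, month0, day0), pd.Timestamp(1901, month0, day0))


fig, ax = plt.subplots()
# Placeholder data on the dummy axis, only to have something to draw
x = pd.date_range("1900-11-01", "1901-10-31", freq="D")
ax.plot(x, list(range(len(x))))
format_month_axis(ax, month0=11, day0=1)
```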

[Figure: plot produced by the code above]

You are looking at observed groundwater levels in a dike. The steep rise from 1 November onward is interesting, as is the small variation between the annual maximum water levels despite significant differences in rainfall.

May 03 '24 14:05 HMEUW

In SPEI I have a function that does something similar:

https://github.com/martinvonk/SPEI/blob/a422933cb6b98605e143aac846c2374af390afb2/src/spei/utils.py#L78-L92

from pandas import DataFrame, Grouper, Series
from pandas import __version__ as pd_version
from pandas import concat, to_datetime

def group_yearly_df(series: Series) -> DataFrame:
    """Group series in a DataFrame with date (in the year 2000) as index and
    year as columns.
    """
    strfstr: str = "%m-%d %H:%M:%S"
    grs = {}
    freq = "YE" if pd_version >= "2.2.0" else "Y"
    for year_timestamp, gry in series.groupby(Grouper(freq=freq)):
        gry.index = to_datetime(
            "2000-" + gry.index.strftime(strfstr), format="%Y-" + strfstr
        )
        year = getattr(year_timestamp, "year")  # type: str
        grs[year] = gry
    return concat(grs, axis=1)
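One caveat with the version check in this snippet: `pd_version >= "2.2.0"` compares strings character by character, so a hypothetical "2.10.0" would sort before "2.2.0". A small sketch of a numeric comparison, where `year_end_alias` is a made-up helper name and only the major and minor parts are parsed:

```python
def year_end_alias(pandas_version: str) -> str:
    """Return "YE" for pandas 2.2 or newer and "Y" otherwise.

    Hypothetical helper; compares (major, minor) as integers.
    """
    major, minor = (int(part) for part in pandas_version.split(".")[:2])
    return "YE" if (major, minor) >= (2, 2) else "Y"


# String comparison puts "2.10.0" before "2.2.0"; the tuple comparison does not
string_check = "2.10.0" >= "2.2.0"
alias = year_end_alias("2.10.0")
```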

May 07 '24 14:05 martinvonk

Thanks for sharing the snippet. Your code is more Pythonic than my for-loop.

May 07 '24 14:05 HMEUW
