polars icon indicating copy to clipboard operation
polars copied to clipboard

Business date/calendar logic

Open erinov1 opened this issue 2 years ago • 1 comments

Problem description

It would nice to natively support certain business date operations. In particular, I envision polars expression counterparts versions of the vectorized numpy busday_offset, is_busday, busday_count functions. The numpy versions of these functions accept weekmask and holidays arguments (or a busdaycalendar object, which just stores a weekmask and list of holidays).

Given a dataframe with a date column, you could do something like

weekmask = "1111100"
holidays = [datetime.date(2000, 1, 1)]

df.with_columns(
    [
        pl("date").busday_offset(3, weekmask=weekmask, holidays=holidays),
        pl("date").is_busday(weekmask=weekmask, holidays=holidays),
        pl.busday_count(
            pl.col("date"), 
            datetime.date(2000, 1, 2), 
            weekmask=weekmask, 
            holidays=holidays
        )
    ]
)

It would also be nice to have a bdate_range function/expression so you could get all business dates between two columns by

df.select(
    pl.bdate_range(pl.col("start_date"), pl.col("end_date"), weekmask=weekmask, holidays=holidays)
)

erinov1 avatar Dec 03 '22 16:12 erinov1

Related issue, but specific for upsampling: #5516.

zundertj avatar Dec 23 '22 11:12 zundertj

+1 to have these features.

Also to note that this answer in this stackoverflow answer does not seem to do the right thing either for np.busday_count.

import polars as pl
import numpy as np

df = pl.DataFrame(
    {
        "Day1": [
            "2022-01-02",
            "2022-01-03",
            "2022-01-04",
        ],
        "Day2": [
            "2022-01-03",
            "2022-01-04",
            "2022-01-05",
        ],
    }
).with_columns(pl.col(["Day1", "Day2"]).str.strptime(pl.Date, "%Y-%m-%d"))

print(
    (
        df.with_columns(
            pl.struct([pl.col("Day1"), pl.col("Day2")])
            .map(
                lambda x: np.busday_count(
                    x.struct["Day1"], x.struct["Day2"], weekmask="1110000"
                )
            )
            .alias("Result")
        )
    )
)
shape: (6, 5)
┌────────┬──────────┬────────────┬────────────┬─────────────┐
│ Market ┆ Service  ┆ Day1       ┆ Day2       ┆ Result      │
│ ---    ┆ ---      ┆ ---        ┆ ---        ┆ ---         │
│ str    ┆ str      ┆ date       ┆ date       ┆ list[i64]   │
╞════════╪══════════╪════════════╪════════════╪═════════════╡
│ AT     ┆ Standard ┆ 2022-01-02 ┆ 2022-01-03 ┆ [0, 1, … 0] │
│ DE     ┆ Express  ┆ 2022-01-03 ┆ 2022-01-04 ┆ [0, 1, … 0] │
│ AT     ┆ Standard ┆ 2022-01-04 ┆ 2022-01-05 ┆ [0, 1, … 0] │
│ CZ     ┆ Standard ┆ 2022-01-05 ┆ 2022-01-06 ┆ [0, 1, … 0] │
│ GB     ┆ Standard ┆ 2022-01-06 ┆ 2022-01-07 ┆ [0, 1, … 0] │
│ CZ     ┆ Standard ┆ 2022-01-07 ┆ 2022-01-08 ┆ [0, 1, … 0] │
└────────┴──────────┴────────────┴────────────┴─────────────┘

And passing the expressions directly into the numpy functions (e.g. np.busday_count) does not work either.

df.select(np.busday_count(pl.col("Day1"), pl.col("Day2"), weekmask="1110000"))

ValueError: Could not convert object to NumPy datetime

wangkev avatar Apr 01 '23 01:04 wangkev

The error you are seeing is a Numpy limitation, busday_count is not a ufunc, see the ufunc list of functions.

Also, right now we do not support ufunc's with more than one expression. The recommendation is to use pl.reduce instead, although the naive way generates a single row list[i64] dataframe (despite the docs claiming it works horizontally):

>>> df.select(pl.reduce(lambda dt1, dt2: np.busday_count(dt1, dt2, weekmask="1110000"), [pl.col('Day1'), pl.col('Day2')]))
shape: (1, 1)
┌───────────┐
│ Day1      │
│ ---       │
│ list[i64] │
╞═══════════╡
│ [0, 1, 1] │
└───────────┘

Wrapping in a Series solves this:

>>> df.select(pl.reduce(lambda dt1, dt2: pl.Series(np.busday_count(dt1, dt2, weekmask="1110000")), [pl.col('Day1'), pl.col('Day2')]))
shape: (3, 1)
┌──────┐
│ Day1 │
│ ---  │
│ i64  │
╞══════╡
│ 0    │
│ 1    │
│ 1    │
└──────┘

zundertj avatar Apr 01 '23 07:04 zundertj

@zundertj Thanks for the suggestion! That's a good workaround until there is native support for business date operations.

wangkev avatar Apr 01 '23 11:04 wangkev

I would also expect some support with aliases

In Expr.dt.offset_by we could then use 1b for 1 business day offset or 1bh for a business hour. Such as in pandas. https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases

ion-elgreco avatar Jul 17 '23 18:07 ion-elgreco

Starting with adding '1bd' and '1bh' as offset aliases sounds good! weekmask and holidays can come after

Anyone here interested in giving this one a go?

EDIT: this may require more discussion https://github.com/pola-rs/polars/issues/11568#issuecomment-1750969803

MarcoGorelli avatar Sep 10 '23 11:09 MarcoGorelli

taking this forwards in https://github.com/MarcoGorelli/polars-business , let's take requests and ideas over there

MarcoGorelli avatar Oct 21 '23 18:10 MarcoGorelli

@erinov1 just FYI, all the requests from your example are now available:

import polars as pl
import polars_business as plb
from datetime import date

weekend = ["Sat", "Sun"]
holidays = [date(2000, 1, 1)]

df = pl.DataFrame(
    {
        "start_date": [date(2000, 3, 1), date(2000, 4, 3)],
        "end_date": [date(2000, 3, 3), date(2000, 4, 19)],
    }
)

print(
    df.with_columns(
        start_plus_3bd=plb.col("start_date").bdt.offset_by(
            "3bd", weekend=weekend, holidays=holidays
        ),
        start_is_workday=plb.col("start_date").bdt.is_workday(
            weekend=weekend, holidays=holidays
        ),
        workday_count=plb.workday_count(
            "start_date",
            "end_date",
            weekend=weekend,
            holidays=holidays,
        ),
    )
)
shape: (2, 5)
┌────────────┬────────────┬────────────────┬──────────────────┬───────────────┐
│ start_date ┆ end_date   ┆ start_plus_3bd ┆ start_is_workday ┆ workday_count │
│ ---        ┆ ---        ┆ ---            ┆ ---              ┆ ---           │
│ date       ┆ date       ┆ date           ┆ bool             ┆ i32           │
╞════════════╪════════════╪════════════════╪══════════════════╪═══════════════╡
│ 2000-03-01 ┆ 2000-03-03 ┆ 2000-03-06     ┆ true             ┆ 2             │
│ 2000-04-03 ┆ 2000-04-19 ┆ 2000-04-06     ┆ true             ┆ 12            │
└────────────┴────────────┴────────────────┴──────────────────┴───────────────┘

MarcoGorelli avatar Oct 28 '23 20:10 MarcoGorelli