
utils.calibration_and_holdout_data calculates monetary_value_holdout incorrectly

Open · bryansmith-db opened this issue 4 years ago · 1 comment

When calculating the monetary_value_holdout column, the logic averages the individual line items in the holdout period. For the calibration period, monetary value is calculated by first summing the line items on a given transaction date and then averaging those per-date totals. It seems the holdout period should use the same calculation.
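To make the difference concrete, here is a minimal sketch with made-up data. The column names (customer_id, date, amount) and the values are hypothetical and not taken from lifetimes; the point is only that averaging line items and averaging per-date totals can give different numbers for the same customer.

```python
import pandas as pd

# Hypothetical toy data: one customer with two line items on the same
# day and a single line item on the following day.
transactions = pd.DataFrame(
    {
        "customer_id": [1, 1, 1],
        "date": pd.to_datetime(["2020-01-01", "2020-01-01", "2020-01-02"]),
        "amount": [10.0, 20.0, 30.0],
    }
)

# Averaging the raw line items, as the holdout logic is described as doing.
line_item_mean = transactions.groupby("customer_id")["amount"].mean()

# Summing per customer and date first, then averaging the per-date totals.
per_date_mean = (
    transactions.groupby(["customer_id", "date"])["amount"]
    .sum()
    .groupby(level="customer_id")
    .mean()
)

print(line_item_mean.loc[1])  # mean of 10, 20, 30
print(per_date_mean.loc[1])   # mean of the two daily totals, 30 and 30
```

With this data the first approach gives 20.0 and the second gives 30.0, so the two columns would not be comparable for any customer who has more than one line item on a single date.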

bryansmith-db, Apr 25 '20 20:04

To provide a bit more clarity, here is the code block where I believe the issue resides. I think we should aggregate on customer id and transaction date to get the expected result:

holdout_transactions[datetime_col] = holdout_transactions[datetime_col].map(to_period)
holdout_summary_data = (
    holdout_transactions.groupby([customer_id_col, datetime_col], sort=False)
    .agg(lambda r: 1)
    .groupby(level=customer_id_col)
    .agg(["count"])
)
holdout_summary_data.columns = ["frequency_holdout"]
if monetary_value_col:
    holdout_summary_data["monetary_value_holdout"] = holdout_transactions.groupby(customer_id_col)[
        monetary_value_col
    ].mean()
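One possible way to do that aggregation is sketched below. This is not the library's code: holdout_monetary_value is a hypothetical helper name, and the sketch assumes datetime_col has already been mapped to the desired period, as in the first line of the block above.

```python
import pandas as pd


def holdout_monetary_value(
    holdout_transactions, customer_id_col, datetime_col, monetary_value_col
):
    """Hypothetical helper: average per-date total for each customer.

    Assumes datetime_col has already been converted to the desired
    period, so that all line items from one transaction date share
    the same value in that column.
    """
    # Sum the line items for each (customer, date) pair.
    per_date_totals = holdout_transactions.groupby(
        [customer_id_col, datetime_col], sort=False
    )[monetary_value_col].sum()
    # Average those per-date totals for each customer.
    return per_date_totals.groupby(level=customer_id_col).mean()
```

The result is a Series indexed by customer id, so it could be assigned to holdout_summary_data["monetary_value_holdout"] in place of the current line-item mean, and pandas would align it on the customer index.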

bryansmith-db, Apr 27 '20 13:04