lifetimes
utils.calibration_and_holdout_data calculates monetary_value_holdout incorrectly
When calculating the monetary_value_holdout column, the logic averages the individual line items in the holdout period. For the calibration period, monetary value is calculated by first summing the line items on a given transaction date and then averaging those per-date totals. The holdout period should use the same calculation.
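To make the discrepancy concrete, here is a small worked example in plain Python. The line items are made up for illustration and the variable names are not from the library; it only shows how the two averaging strategies diverge when a customer has more than one line item on the same date.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical holdout line items for one customer: (transaction_date, amount).
line_items = [
    ("2021-01-05", 10.0),
    ("2021-01-05", 20.0),  # second line item on the same date
    ("2021-02-10", 30.0),
]

# Averaging individual line items (what the holdout logic is described as doing).
per_line_item_mean = mean(amount for _, amount in line_items)

# Summing line items per transaction date first, then averaging the totals
# (what the calibration logic is described as doing).
totals_by_date = defaultdict(float)
for date, amount in line_items:
    totals_by_date[date] += amount
per_transaction_mean = mean(totals_by_date.values())

print(per_line_item_mean)    # 20.0
print(per_transaction_mean)  # 30.0
```

With two line items on the first date, the line-item average (20.0) understates the per-transaction average (30.0).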
To provide a bit more clarity, here is the code block where I believe the issue resides. I think we should aggregate on customer ID and transaction date to get the expected result:
holdout_transactions[datetime_col] = holdout_transactions[datetime_col].map(to_period)
holdout_summary_data = (
    holdout_transactions.groupby([customer_id_col, datetime_col], sort=False)
    .agg(lambda r: 1)
    .groupby(level=customer_id_col)
    .agg(["count"])
)
holdout_summary_data.columns = ["frequency_holdout"]
if monetary_value_col:
    holdout_summary_data["monetary_value_holdout"] = holdout_transactions.groupby(customer_id_col)[
        monetary_value_col
    ].mean()
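
One possible fix, sketched below on a toy DataFrame, is to sum the monetary value per customer and transaction date before taking the per-customer mean. The column names and data are hypothetical, and this is only an illustration of the aggregation I have in mind, not a tested patch against the library:

```python
import pandas as pd

# Hypothetical column names and data, for illustration only.
customer_id_col = "customer_id"
datetime_col = "date"
monetary_value_col = "amount"

holdout_transactions = pd.DataFrame(
    {
        customer_id_col: ["a", "a", "a", "b"],
        datetime_col: pd.to_datetime(
            ["2021-01-05", "2021-01-05", "2021-02-10", "2021-01-20"]
        ),
        monetary_value_col: [10.0, 20.0, 30.0, 5.0],
    }
)

# Current behaviour: mean over individual line items per customer.
current = holdout_transactions.groupby(customer_id_col)[monetary_value_col].mean()

# Suggested behaviour: sum per (customer, date), then mean per customer.
suggested = (
    holdout_transactions
    .groupby([customer_id_col, datetime_col], sort=False)[monetary_value_col]
    .sum()
    .groupby(level=customer_id_col)
    .mean()
)

print(current["a"], suggested["a"])  # 20.0 30.0
print(current["b"], suggested["b"])  # 5.0 5.0
```

Customers with a single line item per date (like "b" here) are unaffected; only customers with several line items on one date see a different monetary_value_holdout.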