issue about the meanings of monetary value?

Open mengxi-ream opened this issue 5 years ago • 2 comments

Hello,

It seems that I have found an inconsistency in meaning when using the calibration_and_holdout_data function.

My transaction data has three fields: customer_id, finish_time, and monetary value.

Then I use calibration_and_holdout_data function

from lifetimes.utils import calibration_and_holdout_data
summary_cal_holdout = calibration_and_holdout_data(
    transaction_data, 
    'passenger_id', 
    'finish_time',
    calibration_period_end='2019-07-01',
    observation_period_end='2019-08-01',
    monetary_value_col='monetary_value'
)
summary_cal_holdout.head()

and I get: image

I found that monetary_value_cal and monetary_value_holdout have different meanings. The former is the average value per day (with freq='D'), and the latter is the average value per order.

The details are shown below: image

As we can see, monetary_value_cal for id 1 is 29.375 (calculated as sum(money) / count(distinct days)), while monetary_value_holdout for id 2 is 7.7125 (calculated as sum(money) / count(orders)).

Why are they different? I am really confused about it.

mengxi-ream avatar Sep 23 '19 09:09 mengxi-ream

Your post is a bit confusing. From what I understand, you are referring to the difference between monetary_value_cal and monetary_value_holdout?

If that's the case, they should be different, because one calculates the monetary value with respect to the calibration (training) period and the other the holdout (testing) period.

If you're pointing to another issue, please try referring to something more specific inside the calibration_and_holdout_data function.

At any rate, please avoid posting code screenshots; it is annoying for others who are trying to help you solve your problem.

psygo avatar Sep 23 '19 14:09 psygo

I mean a different issue. The difference in meaning I mentioned is not about the two periods; it is about the groupby.

In the calibration_and_holdout_data function, the following code calculates monetary_value_holdout, and it groups by customer_id_col only:

if monetary_value_col:
    holdout_summary_data["monetary_value_holdout"] = holdout_transactions.groupby(customer_id_col)[
        monetary_value_col
    ].mean()

However, the code that calculates monetary_value_cal groups by both customer_id_col and datetime_col:

calibration_summary_data = summary_data_from_transaction_data(
    calibration_transactions,
    customer_id_col,
    datetime_col,
    datetime_format=datetime_format,
    observation_period_end=calibration_period_end,
    freq=freq,
    monetary_value_col=monetary_value_col,
)

The summary_data_from_transaction_data function calls a helper, _find_first_transactions, which groups by customer_id_col and datetime_col:

period_groupby = transactions.groupby([datetime_col, customer_id_col], sort=False, as_index=False)

if monetary_value_col:
    # when we have a monetary column, make sure to sum together any values in the same period
    period_transactions = period_groupby.sum()

Going back to the data example I mentioned: monetary_value_cal is a customer's average value per day (the code groups by customer id and datetime, sums within each group, and then takes the mean), whereas monetary_value_holdout is a customer's average value per order (the code groups by customer id only and takes the mean).
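To make the arithmetic concrete, here is a minimal sketch in plain Python. The dates and amounts are made up for illustration (they are not taken from my data), and the sketch only mirrors the sum-then-mean versus mean-only logic described above, not any other processing the library may apply.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical orders for a single customer: (day, amount).
orders = [
    ("2019-06-01", 10.0),
    ("2019-06-01", 20.0),  # second order on the same day
    ("2019-06-02", 30.0),
]

# Average per order: mean over every individual order.
per_order_avg = mean(amount for _, amount in orders)

# Average per day: sum the orders within each day first,
# then take the mean of those daily totals.
daily_totals = defaultdict(float)
for day, amount in orders:
    daily_totals[day] += amount
per_day_avg = mean(daily_totals.values())

print(per_order_avg)  # 20.0
print(per_day_avg)    # 30.0
```

The two numbers only coincide when the customer never places more than one order within the same period.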

Therefore, if a customer has multiple orders in one day, monetary_value_cal and monetary_value_holdout are calculated inconsistently. You can use some simple data to see that this is true.
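For example, the two groupby patterns quoted above can be compared on a small DataFrame. This is only a sketch using pandas directly, not the library's own functions. The toy data is hypothetical, and the column names customer_id, finish_time, and monetary_value are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical toy transactions: customer 1 places two orders on the same day.
transactions = pd.DataFrame(
    {
        "customer_id": [1, 1, 1, 2, 2],
        "finish_time": pd.to_datetime(
            ["2019-07-02", "2019-07-02", "2019-07-05", "2019-07-03", "2019-07-04"]
        ),
        "monetary_value": [10.0, 20.0, 30.0, 5.0, 15.0],
    }
)

# Holdout-style pattern: group by customer only, so the mean is taken over orders.
per_order = transactions.groupby("customer_id")["monetary_value"].mean()

# Calibration-style pattern: sum within each (day, customer) period first,
# then average those period totals per customer.
period_totals = transactions.groupby(
    ["finish_time", "customer_id"], sort=False, as_index=False
)["monetary_value"].sum()
per_day = period_totals.groupby("customer_id")["monetary_value"].mean()

print(per_order)
print(per_day)
```

In this toy data, customer 2 never orders twice on the same day, so both patterns give the same value. For customer 1 they differ.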

mengxi-ream avatar Sep 24 '19 03:09 mengxi-ream