lifetimes
CLV lower than monetary_value
Hi,
I'm using customer_lifetime_value to calculate CLV and I get an RMSE of ~0.4 (which I think is good). The problem is that I found users that have a CLV lower than their monetary_value. I have 2 questions:
- Is CLV total revenue or average revenue? (My guess is total, as it fits real-life results better.)
- If CLV is total revenue, how is it possible that I get CLV < monetary_value?
*** I set monetary_value to be average revenue (total_revenue / days_in_data).
Thanks!
In this library, (prior) monetary_value is the customer's average revenue per purchase: total previous revenue divided by the number of purchases the customer has made.
And CLV is the total amount the customer is going to spend after a set date. CLV is not the sum of the previous monetary value and the future, predicted one. There is no mandatory condition in reality that would make someone necessarily spend more money in a store than they already have.
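To make the distinction concrete, here is a minimal sketch in plain Python. It does not call the library; the purchase amounts and the expected number of future purchases are made-up values, and multiplying the two is only a crude stand-in for what a fitted model would predict.

```python
# Hypothetical purchase history for one customer (made-up amounts).
past_purchases = [120.0, 80.0, 100.0]

# monetary_value as described above: total past revenue / number of purchases.
monetary_value = sum(past_purchases) / len(past_purchases)

# Hypothetical model output: expected number of purchases in the forecast window.
expected_future_purchases = 0.5

# Crude stand-in for an undiscounted CLV: expected purchases times average spend.
naive_clv = expected_future_purchases * monetary_value

print(f"monetary_value = {monetary_value}")
print(f"naive CLV      = {naive_clv}")
```

A customer who is expected to make fewer than one purchase in the forecast window ends up with a predicted future total below their average spend per purchase, so CLV < monetary_value is not a contradiction.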
Thanks @psygo
So I got everything wrong ;)
A couple of followup questions:
- If I use data for 2 months and predict 2 months with discount=0, should I expect total revenue to be the same?
- What are the best suggested ways to test the results?
- What parameters can I change to improve results? (If any)
Thanks!
@refaelos, with respect to your questions:
- Nope. I don't know what you're thinking about the discount variable, but it's simply some kind of inflation or interest compensator, nothing much. Training your model on the first 2 months will make it recognize monetary_value. Predicting (testing) it on the rest of the data will make it calculate CLV for the same clients for the next, say, 2 months. monetary_value and CLV can yield totally different numbers; their sum can be anywhere within [monetary_value, +infinity], depending on what the model learns.
- There are tons of graphs that help in the visual validation, which are also demonstrated in the main tutorial of the library. As you mentioned, RMSE is also a good metric, which was unfortunately not implemented.
- The simplest parameters that come to my mind are: the penalizer_coef of the model (don't go too high, keep it under 1 for example); the date on which the separation between training and testing occurs.
When calculating the monetary_value using summary_data_from_transaction_data, it seems like the first transaction is left out.
https://github.com/CamDavidsonPilon/lifetimes/blob/master/lifetimes/utils.py#L296
If that's the case, why is that so?
Never mind. I just realised that the model assumes that the value of the first transaction is 0.
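For anyone else who hits this, here is a rough sketch of the behaviour as I understand it from this thread. It is plain Python, not the library's actual code: the helper name repeat_monetary_value is hypothetical, the amounts are made up, and returning 0.0 for customers with no repeat purchases is an assumption of the sketch.

```python
def repeat_monetary_value(amounts):
    """Average spend over repeat purchases only; the first purchase is skipped."""
    repeat_amounts = amounts[1:]
    if not repeat_amounts:
        # Assumption of this sketch: no repeat purchases means a value of 0.
        return 0.0
    return sum(repeat_amounts) / len(repeat_amounts)


# Hypothetical customer with a large first order and two smaller repeat orders.
purchases = [200.0, 50.0, 70.0]

average_over_all = sum(purchases) / len(purchases)
average_over_repeats = repeat_monetary_value(purchases)

print(f"average over all purchases    = {average_over_all:.2f}")
print(f"average over repeat purchases = {average_over_repeats:.2f}")
```

If your own monetary_value is computed over every transaction (or per day, as in the original question), it will generally not match what summary_data_from_transaction_data produces, which is worth checking before comparing it with CLV.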