lifetimes
PyData talk
This PyData talk might be of interest:
Implementing and Training Predictive Customer Lifetime Value Models in Python by Jean-Rene Gauthier, Ben Van Dyke. https://www.youtube.com/watch?v=gx6oHqpRgpY&list=PLGVZCDnMOq0rxoq9Nx0B4tqtr891vaCn7&index=45
and the accompanying notebook: https://github.com/datascienceinc/pydata-seattle-2017/blob/master/lifetime-value/pareto-nbd.ipynb
- they use a couple of lifetimes routines to do some data preparation on the CDNOW dataset
- but the main topic of the talk/notebook is an implementation of Pareto-NBD as a hierarchical model using the pymc3 library
- I've seen some other MCMC implementations of Pareto-NBD (https://github.com/mplatzer/BTYDplus/blob/master/R/pareto-nbd-mcmc.R, which follows from http://ieeexplore.ieee.org/document/4344404/); but at least to me, the pymc3 model interface makes the above implementation particularly concise
Just curious if anyone has tried this approach. We're actually talking about future steps and wonder if it's worth the investment in time to learn and build out this approach on our cluster.
Good stuff. I was trying to find a pymc3 implementation without success. :)
Just skimming over this and I'm really confused about the different definitions of recency. I asked on their GitHub as well, but maybe you have an idea on this here too, since this seems ambiguous in the lifetimes docs as well.
In the linked notebook, they define
- recency: time of most recent purchase
However, they use lifetimes.utils.summary_data_from_transaction_data() for the RFM data prep and I find
- in the code
customers['recency'] = (customers['max'] - customers['min'])
- in the docs: recency represents the age of the customer when they made their most recent purchases. This is equal to the duration between a customer’s first purchase and their latest purchase.
I get further confused by the doc's explanation of the recency/frequency graph where they seem to be using the other definition ("Your coldest customers are those that are in the top-right corner: they bought a lot quickly, and we haven’t seen them in weeks.")
Maybe someone can shed some light on these two definitions? Am I missing something obvious?
@ReaBx, no, you are mostly right, it is a confusing concept and I get confused sometimes as well. In lifetimes, the definition is:
customers['recency'] = (customers['max'] - customers['min'])
In words:
recency represents the age of the customer when they made their most recent purchases. This is equal to the duration between a customer’s first purchase and their latest purchase.
In the linked notebook, the author uses the definition:
recency: time of most recent purchase
That definition is too ambiguous. But ultimately the author uses lifetimes utils to calculate it, so their summary statistics are the same.
I get further confused by the doc's explanation of the recency/frequency graph where they seem to be using the other definition ("Your coldest customers are those that are in the top-right corner: they bought a lot quickly, and we haven’t seen them in weeks.")
Fair - this graphic does cause a lot of confusion. In generating this graphic, we need to set a max time, which in this case is ~40 time periods. Another way to see this is "all customers first bought from me 40 weeks ago". Thus if a customer has a recency of 5, it means they bought at time 0 and 5 and never since. Thus they are likely dead (or "cold").
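To make the two readings concrete, here is a minimal sketch in plain Python (not the lifetimes code itself) that computes frequency, recency and T the way the snippet above defines them, plus the "time since last purchase" that the plot explanation relies on. The toy transactions, the observation end date and the function name `summarize` are made up for illustration; collapsing same-day purchases into one period is my understanding of what the lifetimes utility does and should be treated as an assumption.

```python
from datetime import date

DAYS_PER_WEEK = 7  # report everything in weeks, like the ~40-week plot

# Hypothetical toy transaction log: (customer_id, purchase_date).
transactions = [
    ("a", date(2020, 1, 1)),
    ("a", date(2020, 2, 5)),
    ("b", date(2020, 1, 15)),
    ("c", date(2020, 1, 1)),
    ("c", date(2020, 9, 30)),
]
observation_end = date(2020, 10, 7)  # assumed end of the observation window


def summarize(transactions, observation_end):
    """Return {customer_id: (frequency, recency, T, time_since_last)} in weeks."""
    purchase_days = {}
    for customer_id, day in transactions:
        # A set collapses several purchases on the same day into one period.
        purchase_days.setdefault(customer_id, set()).add(day)

    summary = {}
    for customer_id, days in purchase_days.items():
        first, last = min(days), max(days)
        frequency = len(days) - 1                      # repeat purchases only
        recency = (last - first).days / DAYS_PER_WEEK  # age at last purchase
        T = (observation_end - first).days / DAYS_PER_WEEK  # age at window end
        summary[customer_id] = (frequency, recency, T, T - recency)
    return summary


summary = summarize(transactions, observation_end)
for customer_id, row in sorted(summary.items()):
    print(customer_id, row)
```

Customer "a" comes out with recency 5 and T 40, which is the example above: they bought at week 0 and week 5, and 35 weeks have passed since. Customer "c" has recency 39 out of T 40, so only 1 week has passed since their last purchase.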
I installed pymc3 using pip install pymc3, and pip reported that it was installed successfully.
However, when I run the program in a Jupyter notebook (Python 3.6), I still get an error:
ModuleNotFoundError: No module named 'pymc3'
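A common cause of this (an assumption here, nothing in this thread confirms it) is that pip installed the package into a different Python environment than the one the Jupyter kernel runs. A small check like the following, run in a notebook cell, shows which interpreter the kernel uses and whether it can see the module. The helper name `describe_module` is made up for illustration.

```python
import importlib.util
import sys


def describe_module(name):
    """Report whether top-level module `name` is importable from this interpreter."""
    spec = importlib.util.find_spec(name)  # None if the module cannot be found
    return {
        "interpreter": sys.executable,
        "module": name,
        "found": spec is not None,
        "location": getattr(spec, "origin", None),
    }


print(describe_module("pymc3"))
# If "found" is False, installing with this exact interpreter usually helps:
#   <interpreter path printed above> -m pip install pymc3
```

If the interpreter path printed in the notebook differs from the one that `pip` belongs to on the command line, that mismatch is the likely culprit.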
@CamDavidsonPilon, I was going through the paper [1] by Fader et al. (2005), which defines RFM as
“RFM” characteristics: recency (time of most recent purchase), frequency (number of past purchases), and monetary value (average purchase amount per transaction).
Whereas your code, as pointed out by @ReaBx, takes recency as the age of the customer when they made their most recent purchase, and frequency as the number of repeat purchases:
customers['frequency'] = customers['count'] - 1
customers['recency'] = (customers['max'] - customers['min']) / freq_multiplier
I am confused as to how the difference between these two definitions of recency and frequency will affect the CLV modelling.
[1] Fader, Peter S., Bruce G. S. Hardie, and Ka Lok Lee. "RFM and CLV: Using iso-value curves for customer base analysis." Journal of Marketing Research 42.4 (2005): 415-430. http://brucehardie.com/papers/rfm_clv_2005-02-16.pdf
Does anyone have a local copy of the notebook?
This has been taken down, does anyone have a copy of the notebook?
It's here.