SQL statement to transform transactional data into RFM data
I wanted to bring to your attention a potential discrepancy in the documentation regarding the calculation of Monetary Value using SQL statements in the lifetimes package.
According to the documentation, Monetary Value is defined as "the average value of a given customer’s purchases, equal to the sum of all a customer’s purchases divided by the total number of purchases. Note that the denominator here is different than the frequency described above." However, it appears that the SQL Query provided in the documentation performs the calculation in the opposite manner.
Additionally, my understanding of the lifetimes package is that predicting customers' future behaviour is best achieved by considering their past repeat transactions, which implies excluding their first transaction. Unfortunately, this assumption is not addressed by the query in the documentation when calculating monetary value.
To contribute to the community and address this issue, I am planning to create a Pull Request and upload a notebook that demonstrates how using the query and the lifetimes get transactional data method yields different values for Monetary Value.
I believe clarifying this aspect in the documentation will greatly benefit users seeking to leverage the lifetimes package for their analysis
Hey @Sensei-akin,
Development on lifetimes ceased some time ago. Here's the link to the successor library:
https://github.com/pymc-labs/pymc-marketing
Would this be the correct query?
SELECT
customer_id,
COUNT(DISTINCT DATE(transaction_at)) - 1 as frequency,
datediff('day', MIN(transaction_at), MAX(transaction_at)) as recency,
CASE -- MONETARY VALUE CALCULATION
WHEN COUNT(DISTINCT transaction_at) = 1 THEN 0 -- 0 if only one order
ELSE
SUM(
CASE WHEN first_transaction = transaction_at THEN 0 -- daily average of all but first order
ELSE salesamount
END
) / (COUNT(DISTINCT transaction_at) - 1)
END as monetary_value
datediff('day', CURRENT_DATE, MIN(transaction_at)) as T
FROM orders
GROUP BY customer_id
If you want to create a PR in pymc-marketing for this, that'd be great. If you're familiar with dbt, I've also been considering building a companion dbt package for this sort of thing.