lifetimes icon indicating copy to clipboard operation
lifetimes copied to clipboard

SQL statement to transform transactional data into RFM data

Open Sensei-akin opened this issue 2 years ago • 1 comments

I wanted to bring to your attention a potential discrepancy in the documentation regarding the calculation of Monetary Value using SQL statements in the lifetimes package.

According to the documentation, Monetary Value is defined as "the average value of a given customer’s purchases, equal to the sum of all a customer’s purchases divided by the total number of purchases. Note that the denominator here is different than the frequency described above." However, it appears that the SQL Query provided in the documentation performs the calculation in the opposite manner.

Additionally, my understanding of the lifetimes package is that predicting customers' future behaviour is best achieved by considering their past repeat transactions, which implies excluding their first transaction. Unfortunately, this assumption is not addressed by the query in the documentation when calculating monetary value.

To contribute to the community and address this issue, I am planning to create a Pull Request and upload a notebook that demonstrates how using the query and the lifetimes get transactional data method yields different values for Monetary Value.

I believe clarifying this aspect in the documentation will greatly benefit users seeking to leverage the lifetimes package for their analysis

Sensei-akin avatar Sep 18 '23 08:09 Sensei-akin

Hey @Sensei-akin,

Development on lifetimes ceased some time ago. Here's the link to the successor library:

https://github.com/pymc-labs/pymc-marketing

Would this be the correct query?

SELECT
  customer_id,
  COUNT(DISTINCT DATE(transaction_at)) - 1 as frequency,
  datediff('day', MIN(transaction_at), MAX(transaction_at)) as recency,
  CASE                                              -- MONETARY VALUE CALCULATION
      WHEN COUNT(DISTINCT transaction_at) = 1 THEN 0    -- 0 if only one order
      ELSE
        SUM(
          CASE WHEN first_transaction = transaction_at THEN 0  -- daily average of all but first order
          ELSE salesamount
          END
          ) / (COUNT(DISTINCT transaction_at) - 1)
      END as monetary_value  
  datediff('day', CURRENT_DATE, MIN(transaction_at)) as T
FROM orders
GROUP BY customer_id

If you want to create a PR in pymc-marketing for this, that'd be great. If you're familiar with dbt, I've also been considering building a companion dbt package for this sort of thing.

ColtAllen avatar Sep 18 '23 14:09 ColtAllen