BGBB model - Data structures to fit models

Open pschil opened this issue 4 years ago • 2 comments

While implementing the BGBB model, it became clear that the transaction history alone does not suffice as input for all models. Because the BGBB model is for a discrete-time setting, it requires additional information on the transaction opportunities, which may differ for each customer.

Providing this functionality through the existing clv.data object, which represents the full transaction history, would blur the lines of responsibility and violate the Single Responsibility Principle. It would require all kinds of internal case distinctions in clv.data (e.g. does it have transaction opportunities or not?) that hamper maintenance. It would also make the user interface considerably more complicated, even though this functionality is only used by a single model.

Rather, one class should do one thing only and do it well. Therefore, to simplify usage and encapsulate distinct functionality in separate objects, I suggest separating the transaction opportunity functionality from the transaction history:

clv.transactions This is the full transaction history of each customer, to which static and dynamic covariate data can be added. This is what clv.data currently is.

clv.transaction.opportunities A separate data structure that contains the transaction opportunities (TOs) for every customer, potentially with a duration in case TOs stretch over a period (e.g. a TO is a week). In combination with clv.transactions, this can be used to fit a discrete-time model.

Usage

Fitting a continuous-time model remains the same, while fitting a discrete-time model would require an additional input.

clv.trans <- clv.transactions(cdnow, "ymd", "w", 37)
pnbd(clv.trans)

clv.TO <- clv.transaction.opportunities(table)
bgbb(clv.trans, clv.TO)
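
To make the idea of a separate opportunities structure more concrete, here is a minimal base R sketch of how a table of transaction opportunities could be combined with the transactions into per-customer counts for a discrete-time model. Everything in it is an assumption for illustration: the column names (Id, Date, used), the toy data, and the simplification that a transaction matches an opportunity when the dates are identical. It is not the proposed clv.transaction.opportunities implementation.

```r
# Hypothetical layout: one row per customer and transaction opportunity (TO).
opportunities <- data.frame(
  Id   = c(1, 1, 1, 1, 2, 2, 2),
  Date = as.Date(c("2005-01-01", "2006-01-01", "2007-01-01", "2008-01-01",
                   "2006-01-01", "2007-01-01", "2008-01-01"))
)

# Toy transactions; each one is assumed to fall exactly on a TO date.
transactions <- data.frame(
  Id   = c(1, 1, 2),
  Date = as.Date(c("2006-01-01", "2007-01-01", "2008-01-01"))
)

# Flag the opportunities in which the customer actually transacted.
opportunities$used <- paste(opportunities$Id, opportunities$Date) %in%
  paste(transactions$Id, transactions$Date)

# Per customer: n = number of TOs, x = number of TOs with a transaction,
# t.x = position of the last TO with a transaction (0 if there is none).
to_summary <- do.call(rbind, lapply(split(opportunities, opportunities$Id), function(d) {
  d <- d[order(d$Date), ]
  used.idx <- which(d$used)
  data.frame(
    Id  = d$Id[1],
    x   = length(used.idx),
    t.x = if (length(used.idx) > 0) max(used.idx) else 0L,
    n   = nrow(d)
  )
}))

print(to_summary)
```

Because each customer carries its own rows, the number of opportunities n can differ per customer without any special cases in the transaction data itself.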

Another common use case is that end users do not have the full transaction history because it can be huge. Instead, users are given a summary of all transactions pulled from some DB (last transaction, number of transactions, mean spending, etc.). To support this use case, I suggest adding two further data structures:

clv.transaction.summary Contains the minimal information per customer needed to create the model cbs. Notably, this differs from the cbs in that the values given to create it do not already imply a time unit: the recency is not given as a number (e.g. 34) but is rather calculated from dates, which allows for different time units. Static covariates can be added, but not dynamic ones.

clv.cbs Provides an additional way to fit a model directly from the cbs, for example to reproduce results from papers such as the one for the BGBB, or for expert users familiar with the models. Static covariates can be added, but not dynamic ones. This could replace the way the cbs is currently stored internally (i.e. as a simple data.table). Note that each cbs is specific to one model only (i.e. in its required columns).

Usage

trans.summary <- data.table(Id=1, last.trans="2007-08-21", first.trans="2005-03-01", n.trans=8, mean.spending=41)
clv.summary <- clv.transaction.summary(trans.summary, "ymd", "weeks")
pnbd(clv.summary)
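
The point that the summary stores dates rather than a pre-computed recency can be illustrated with a small base R sketch. The helper name summary_recency and its arguments are hypothetical and not part of CLVTools; it only shows that the same pair of dates yields different recency values depending on the time unit chosen when the object is created.

```r
# Hypothetical helper (not a CLVTools function): derive recency from raw dates
# so that the time unit is chosen at construction time, not baked into the data.
summary_recency <- function(first.trans, last.trans, time.unit = c("days", "weeks")) {
  time.unit <- match.arg(time.unit)
  first <- as.Date(first.trans)
  last  <- as.Date(last.trans)
  # The last transaction cannot precede the first one.
  stopifnot(all(last >= first))
  days.per.unit <- c(days = 1, weeks = 7)
  # Subtracting Date objects gives a difference in days.
  as.numeric(last - first) / days.per.unit[[time.unit]]
}

recency.days  <- summary_recency("2005-03-01", "2007-08-21", "days")
recency.weeks <- summary_recency("2005-03-01", "2007-08-21", "weeks")
print(c(days = recency.days, weeks = recency.weeks))
```

A cbs that already contains a numeric recency cannot be converted this way without knowing which unit was used to create it, which is the distinction drawn between clv.transaction.summary and clv.cbs.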

cbs.pnbd <- clv.pnbd.cbs(data.table(Id=1, recency=1, frequency=8, mean.spending=41))
pnbd(cbs.pnbd)
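
Since a cbs is specific to one model, a constructor could check the required columns up front. The following base R sketch shows one way this might look; make_cbs, the class names, and the column list are all assumptions for illustration (the columns are simply copied from the usage example above, not from the actual model requirements).

```r
# Hypothetical constructor, not the CLVTools API: wraps a cbs table in an
# S3 object after checking that the model-specific columns are present.
make_cbs <- function(cbs, model, required.cols) {
  missing.cols <- setdiff(required.cols, colnames(cbs))
  if (length(missing.cols) > 0) {
    stop("cbs for model '", model, "' lacks column(s): ",
         paste(missing.cols, collapse = ", "))
  }
  structure(list(cbs = cbs, model = model),
            class = c(paste0("cbs.", model), "cbs"))
}

# Assumed column set, taken from the usage example in this issue.
pnbd.cols <- c("Id", "recency", "frequency", "mean.spending")

ok <- make_cbs(data.frame(Id = 1, recency = 1, frequency = 8, mean.spending = 41),
               "pnbd", pnbd.cols)

# An incomplete table is rejected with a message naming the missing columns.
bad <- tryCatch(make_cbs(data.frame(Id = 1, recency = 1), "pnbd", pnbd.cols),
                error = function(e) conditionMessage(e))
print(class(ok))
print(bad)
```

Carrying the model name in the class vector would let a fitting function dispatch on it or refuse a cbs built for a different model.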

@bachmannpatrick @mmeierer @niels89 critique and comments?

pschil, Apr 27 '20 20:04

As discussed with Patrick, it might be more desirable to have entirely distinct classes for continuous- and discrete-time data. The reasons are to make users aware of the differences, and that the plots and summary statistics to be produced are inherently different.

pschil, Apr 29 '20 20:04

I see the reasons for having two different classes and agree with your line of argumentation.

mmeierer, May 01 '20 21:05