
Documentation flow incorrect?

Open SSMK-wq opened this issue 3 years ago • 1 comments

I am new to the lifetimes package and have read tons of tutorials online about it.

However, in all the tutorials and in the documentation, I see the model being fit on the full dataset (summary data generated from transaction data).

Doesn't the data have to be split into calibration and holdout sets first, before we can fit the BG/NBD model (or any other model)?

Is there a reason that the official docs and online tutorials everywhere fit the model on the full dataset first, and only later split it into calibration and holdout datasets?

Can someone help me understand? I see someone touched on the same topic in another discussion [here](https://github.com/CamDavidsonPilon/lifetimes/issues/334), where they were also just following the documentation.

You can find the online tutorials below:

https://towardsdatascience.com/buy-til-you-die-predict-customer-lifetime-value-in-python-9701bfd4ddc0

https://towardsdatascience.com/modeling-customer-lifetime-value-with-lifetimes-71171a35f654

https://medium.com/@ugursavci/customer-lifetime-value-prediction-in-python-89e4a50df12e

SSMK-wq avatar Nov 03 '22 07:11 SSMK-wq

Hey @SSMK-wq,

Use of `calibration_and_holdout_data` is optional; it's a useful way to test models on external data if your use case calls for it. For example, perhaps your company had a new product launch: you could fit a model to customer data pre-launch, then see if/how behavior has changed post-launch.
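To illustrate the idea (not the library's exact API), here is a minimal pandas-only sketch of the split: fit on transactions up to a chosen calibration cutoff, then evaluate predictions against the later holdout period. The toy transaction log and the cutoff date are made up for illustration; `lifetimes.utils.calibration_and_holdout_data` performs this split for you and also builds the RFM summary in one call.

```python
import pandas as pd

# Toy transaction log; in practice this comes from your own data.
transactions = pd.DataFrame({
    "customer_id": [1, 1, 1, 2, 2, 3],
    "date": pd.to_datetime([
        "2022-01-05", "2022-03-10", "2022-08-01",
        "2022-02-14", "2022-09-20", "2022-04-02",
    ]),
})

# Hypothetical cutoff: everything up to here is used for fitting.
calibration_end = pd.Timestamp("2022-06-30")

# Fit on the calibration period, validate against the holdout period.
calibration = transactions[transactions["date"] <= calibration_end]
holdout = transactions[transactions["date"] > calibration_end]

print(len(calibration), len(holdout))  # 4 2
```

Whether you need this split at all depends on whether you want out-of-sample validation; fitting on the full dataset is fine when the goal is simply to produce forward-looking predictions.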

`calibration_and_holdout_data` allows for a supervised-learning approach to model validation, but these models aren't entirely supervised. A Bayesian posterior predictive check would be more appropriate, as mentioned here, but until fairly recently Bayesian p-values could be cumbersome to calculate in Python.
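For readers unfamiliar with posterior predictive checks, here is a generic NumPy sketch of the idea, with entirely made-up data: draw replicated datasets from posterior samples of a rate parameter, then compute a Bayesian p-value as the fraction of replicated test statistics at least as extreme as the observed one. The Poisson model and the posterior samples here are placeholder assumptions, not the BG/NBD posterior.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy "observed" purchase counts for 200 customers.
observed = rng.poisson(lam=2.0, size=200)
t_obs = observed.mean()

# Hypothetical posterior samples of the purchase rate
# (in a real workflow these would come from MCMC).
posterior_lam = rng.normal(loc=2.0, scale=0.1, size=1000)

# Replicate the dataset once per posterior draw and record
# the same test statistic (the mean count).
t_rep = np.array([
    rng.poisson(lam=lam, size=observed.size).mean()
    for lam in posterior_lam
])

# Bayesian p-value: values near 0 or 1 flag model misfit;
# values near 0.5 indicate the model reproduces the statistic well.
bayes_p = (t_rep >= t_obs).mean()
print(bayes_p)
```

A well-calibrated model yields a p-value away from the extremes; the choice of test statistic determines which aspect of the data the check is sensitive to.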

ColtAllen avatar Nov 03 '22 22:11 ColtAllen