Documentation flow incorrect?
I am new to the lifetimes package and have read many online tutorials about it.
However, in all the tutorials and in the official documentation, I see the model being fit on the full dataset (summary data generated from transaction data).
Shouldn't the data be split into calibration and holdout sets first, before fitting the BG/NBD model (or any other model)?
Is there a reason that the official docs and online tutorials everywhere fit the model on the full dataset first, and only later split it into calibration and holdout datasets?
Can someone help me understand? Someone touched on the same topic in another discussion [here](https://github.com/CamDavidsonPilon/lifetimes/issues/334), where they were also just following the documentation.
You can find some of the online tutorials below:
https://towardsdatascience.com/buy-til-you-die-predict-customer-lifetime-value-in-python-9701bfd4ddc0
https://towardsdatascience.com/modeling-customer-lifetime-value-with-lifetimes-71171a35f654
https://medium.com/@ugursavci/customer-lifetime-value-prediction-in-python-89e4a50df12e
Hey @SSMK-wq,
Use of `calibration_and_holdout_data` is optional; it's a useful way to test models on out-of-sample data if your use case calls for it. For example, perhaps your company had a new product launch: you could fit a model to customer data pre-launch, then see if/how behavior has changed post-launch.
`calibration_and_holdout_data` allows for a supervised-learning approach to model validation, but these models aren't entirely supervised. A Bayesian posterior predictive check would be more appropriate, as mentioned here, but until fairly recently Bayesian p-values could be cumbersome to calculate in Python.