
Questions about the Gamma distribution parameters

Open shaddyab opened this issue 5 years ago • 6 comments

I need help linking the parameters of the various models to the appropriate distribution parameters, as I was unable to find all the answers in the documentation.

  1. r and alpha from the BetaGeoFitter and ModifiedBetaGeoFitter functions

    1. r is the shape parameter of the Gamma distribution
    2. alpha is the rate (1/scale) parameter of the Gamma distribution

    Is that correct?
  2. p, q and v from the GammaGammaFitter function

    1. p is the shape parameter of the Gamma distribution. The model assumes that this parameter is the same for all customers.
    2. v is the rate parameter of the Gamma distribution. This parameter varies across customers and has a prior that is also gamma distributed with parameters (q, gamma)
    3. Gamma(p, v) defines the distribution of a customer’s observed average transaction value
    4. The expected value of this distribution is p/v (shape parameter / rate parameter)
    5. q is the shape parameter of the 2nd Gamma distribution

    Is that correct?
  3. Why is the 'gamma' parameter of the 2nd Gamma distribution (its rate parameter) not provided as an output? I am asking because, in “The Gamma-Gamma Model of Monetary Value” paper by Peter Fader, the population mean in Eq. 3 is

    gamma * p / (q - 1)

    where gamma is the rate parameter of the 2nd Gamma distribution. However, in the fit methods of GammaGammaFitter the population mean is

    v * p / (q - 1)

    where v is the rate parameter of the 1st distribution. Which equation is the correct one?

  4. I noticed that when calculating the probability that a customer with history (frequency, recency, T) is currently alive, using conditional_probability_alive for the ModifiedBetaGeoFitter, there is always a large spike at p=0.5 in the histogram of the estimated probabilities. Why is that?

  5. For the conditional_expected_number_of_purchases_up_to_time and expected_number_of_purchases_up_to_time methods, does the input t represent the time from the first transaction, or a forecasted time (i.e., current time + t)?

Thank you!

shaddyab avatar Sep 20 '19 19:09 shaddyab

Hi,

Not an expert but from my limited experience:

  2. p, q and v from the GammaGammaFitter function

    1. p is the shape parameter of the Gamma distribution. The model assumes that this parameter is the same for all customers.
    2. v is the rate parameter of the Gamma distribution. This parameter varies across customers and has a prior that is also gamma distributed with parameters (q, gamma)
    3. Gamma(p, v) defines the distribution of a customer’s observed average transaction value
    4. The expected value of this distribution is p/v (shape parameter / rate parameter)
    5. q is the shape parameter of the 2nd Gamma distribution

    Is that correct?

The symbols used are a little confusing - I think that 'v' actually represents 'γ' from the second gamma distribution, not 'ν' from the first gamma distribution.

  3. Why is the 'gamma' parameter of the 2nd Gamma distribution (the rate parameter) not provided as an output?

It is - see above.

  4. I noticed that when calculating the probability that a customer with history (frequency, recency, T) is currently alive, using conditional_probability_alive for the ModifiedBetaGeoFitter, there is always a large spike at p=0.5 in the histogram of the estimated probabilities. Why is that?

From my dataset, the distribution of P(Alive) for customers with frequency=0 is very different from that for customers with frequency>0. Could the spike be related to one-time buyers?

  5. For the conditional_expected_number_of_purchases_up_to_time and expected_number_of_purchases_up_to_time methods, does the input t represent the time from the first transaction, or a forecasted time (i.e., current time + t)?

I believe that t represents the forecast period, i.e. for the period t following the end of your observation period, your customers are expected to make so many purchases.

Hope that helps,

Duncan

dmanhattan avatar Sep 26 '19 10:09 dmanhattan

Thank you for taking the time to respond. Your answer regarding P(Alive) makes sense and I was able to reproduce it.

The Gamma-Gamma model should have 4 output parameters (2 parameters for each Gamma distribution). The GammaGammaFitter function outputs only 3 parameters (p, q, and v), which is why I am asking about the 4th output. It looks like we agree that the p and q values are the shape parameters of two different Gamma distributions. What I am still not clear on is which Gamma distribution the v output parameter from the GammaGammaFitter function corresponds to. Is it

  1. the rate parameter corresponding to the p shape parameter (1st Gamma distribution), or
  2. the rate parameter corresponding to the q shape parameter (2nd Gamma distribution, which defines the distribution of a customer’s observed average transaction value)?

If the expected value is p/v, then this v is the rate parameter corresponding to the p shape parameter and not q (see page 2, (i) of “The Gamma-Gamma Model of Monetary Value” paper by Peter Fader). However, in Eq. 3 of the same paper, the mean is defined as gamma * p / (q - 1), while in the fit methods of GammaGammaFitter the population mean is implemented as v * p / (q - 1). In this case, v is the rate parameter corresponding to the q shape parameter and not p. Am I missing something?
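One way to reconcile the two expressions is sketched below. It assumes both gamma distributions are parameterised by shape and rate (as the p/ν mean implies), that ν ~ gamma(q, γ), and that q > 1. Here Z denotes a transaction value, a symbol introduced only for this sketch.

```latex
\begin{align*}
E[Z \mid p, \nu] &= \frac{p}{\nu}, \qquad \nu \sim \mathrm{gamma}(q, \gamma) \\
E\!\left[\frac{1}{\nu}\right]
  &= \int_0^\infty \frac{1}{\nu}\,
     \frac{\gamma^{q}\,\nu^{q-1}\,e^{-\gamma\nu}}{\Gamma(q)}\,d\nu
   = \frac{\gamma\,\Gamma(q-1)}{\Gamma(q)}
   = \frac{\gamma}{q-1} \\
E[Z] &= p\,E\!\left[\frac{1}{\nu}\right] = \frac{\gamma\,p}{q-1}
\end{align*}
```

Under this reading the customer-level rate ν is integrated out, so only p, q, and the second-level rate remain as population parameters. That would be consistent with three fitted outputs rather than four, with v playing the role of γ.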

shaddyab avatar Sep 26 '19 15:09 shaddyab

As stated in 2) ii of the paper:

ν ~ gamma(q, γ)

which as I interpret* it implies that

zi ~ gamma(p, gamma(q, γ))

I.e., the output of the second gamma distribution is the scale parameter for the first gamma distribution.
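The nested structure can be checked numerically. This is a minimal sketch, not the lifetimes implementation: the parameter values are hypothetical rather than fitted, and ν is treated as a rate (NumPy's `scale` is 1/rate), in line with the p/ν mean quoted earlier in the thread.

```python
import numpy as np

# Hypothetical parameter values, chosen only for illustration (not fitted).
p, q, gamma_rate = 6.0, 6.0, 15.0

rng = np.random.default_rng(0)
n = 200_000

# Second level: nu ~ gamma(shape=q, rate=gamma_rate); NumPy takes scale = 1/rate.
nu = rng.gamma(shape=q, scale=1.0 / gamma_rate, size=n)

# First level: z ~ gamma(shape=p, rate=nu), one draw per sampled nu.
z = rng.gamma(shape=p, scale=1.0 / nu)

simulated_mean = z.mean()
closed_form_mean = gamma_rate * p / (q - 1)  # population mean from Eq. 3

print(f"simulated: {simulated_mean:.3f}, closed form: {closed_form_mean:.3f}")
```

If the simulated mean lands close to gamma * p / (q - 1), that supports reading the fitter's v as γ, the rate of the second gamma distribution.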

If it's useful, I've posted some code here showing how I plotted the Gamma-Gamma distribution for my dataset (copying the example here).

Cheers,

Duncan

*Disclaimer: not a statistician

dmanhattan avatar Sep 26 '19 23:09 dmanhattan

@dmanhattan Thank you for taking the time to share your script. I will review it and let you know if I have any further questions.

shaddyab avatar Sep 27 '19 14:09 shaddyab

@shaddyab, I am referring to the 1st question in your 1st comment:

"r and alpha from the BetaGeoFitter and ModifiedBetaGeoFitter functions

r is the shape parameter of the Gamma distribution; alpha is the rate or 1/scale parameter of the Gamma distribution. Is that correct?"

I have the same doubt. Were you able to clarify this? I did not find a response to this question in the thread. Thank you.

utkarshsingh710 avatar Jun 24 '20 18:06 utkarshsingh710

This is right. You can plot scipy.stats.gamma.rvs(r, scale=1/alpha, size=1000) as your distribution (heterogeneity) of lambda, and similarly scipy.stats.beta.rvs(a, b, size=1000) for the heterogeneity of p.
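A minimal sketch of that suggestion follows. The values of r, alpha, a, and b are hypothetical placeholders; in practice they would come from a fitted model. The sample means are compared with the closed-form means r/alpha and a/(a+b) as a sanity check on the shape/rate reading.

```python
import numpy as np
from scipy import stats

# Hypothetical parameter values for illustration; substitute your fitted ones.
r, alpha = 0.24, 4.41   # gamma shape and rate (heterogeneity in lambda)
a, b = 0.79, 2.43       # beta parameters (heterogeneity in p)

n = 100_000

# SciPy's gamma takes a scale argument, so a rate of alpha becomes scale=1/alpha.
lam = stats.gamma.rvs(r, scale=1.0 / alpha, size=n, random_state=0)
p_drop = stats.beta.rvs(a, b, size=n, random_state=1)

# Compare sample means with the closed-form means.
print(f"lambda: sample {lam.mean():.4f} vs r/alpha {r / alpha:.4f}")
print(f"p:      sample {p_drop.mean():.4f} vs a/(a+b) {a / (a + b):.4f}")
```

Passing `lam` and `p_drop` to a histogram function such as matplotlib's `plt.hist` would give the plots described above.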

jeffreymei avatar Aug 09 '20 00:08 jeffreymei