pymc-marketing icon indicating copy to clipboard operation
pymc-marketing copied to clipboard

Add `BetaGeoBetaBinom` Distribution Block

Open ColtAllen opened this issue 1 year ago • 12 comments

This PR resurrects https://github.com/pymc-labs/pymc-marketing/pull/188 to partially address https://github.com/pymc-labs/pymc-marketing/issues/176.

The Beta-Geometic/Beta Binomial model is a CLV model for discrete, non-contractual use cases - a good example would be sporting events. This PR adds a distribution block to be used for posterior predictive checks and the logp in the full model which will be added in a separate PR.

For now this is a draft PR because I can't get the logp to return the correct values; for reference please see (5) on p4 of the research paper:

https://www.brucehardie.com/papers/020/fader_et_al_mksc_10.pdf

I could use some help with the pytensor functions this requires.


:books: Documentation preview :books:: https://pymc-marketing--431.org.readthedocs.build/en/431/

ColtAllen avatar Nov 11 '23 08:11 ColtAllen

I'll try to check this week @ColtAllen

ricardoV94 avatar Nov 11 '23 12:11 ricardoV94

I pushed a version of the logp that works with Scan. I think we can make it work without a Scan, by evaluating on the maximum T-tx range. I just don't know if this results in much wasted computation in general use cases (for instance if there's one datapoint with a very large range, but most other datapoints have smaller ranges)

ricardoV94 avatar Nov 14 '23 18:11 ricardoV94

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 92.08%. Comparing base (9aa8bc0) to head (60a1f55).

:exclamation: Current head 60a1f55 differs from pull request most recent head c376695

Please upload reports for the commit c376695 to get more accurate results.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #431      +/-   ##
==========================================
- Coverage   92.25%   92.08%   -0.17%     
==========================================
  Files          24       24              
  Lines        2414     2490      +76     
==========================================
+ Hits         2227     2293      +66     
- Misses        187      197      +10     

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov[bot] avatar Nov 14 '23 18:11 codecov[bot]

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

I added the likelihood expression to the docstrings, along with a dev notebook. Not sure if the LaTeX rendered properly in the docstrings though, because the docs build is failing due to a pytensor issue:

ImportError: cannot import name 'vectorize' from 'pytensor.graph' 
(/home/docs/checkouts/readthedocs.org/user_builds/pymc-marketing/envs/431/lib/python3.10/site-packages/pytensor/graph/__init__.py)

ColtAllen avatar Dec 11 '23 18:12 ColtAllen

I pushed a version of the logp that works with Scan. I think we can make it work without a Scan, by evaluating on the maximum T-tx range. I just don't know if this results in much wasted computation in general use cases (for instance if there's one datapoint with a very large range, but most other datapoints have smaller ranges)

In general I don't see very large values of T being a common use case with this model. Is Scan the most performant approach? If so I'm fine leaving it as-is.

ColtAllen avatar Dec 11 '23 19:12 ColtAllen

I added the likelihood expression to the docstrings, along with a dev notebook. Not sure if the LaTeX rendered properly in the docstrings though, because the docs build is failing due to a pytensor issue:

ImportError: cannot import name 'vectorize' from 'pytensor.graph' 
(/home/docs/checkouts/readthedocs.org/user_builds/pymc-marketing/envs/431/lib/python3.10/site-packages/pytensor/graph/__init__.py)

It's now called vectorize_graph

ricardoV94 avatar Dec 16 '23 16:12 ricardoV94

Is Scan the most performant approach? If so I'm fine leaving it as-is.

Scan is the simplest approach

ricardoV94 avatar Dec 16 '23 16:12 ricardoV94

Tests are still failing with vectorize_graph. Do the library dependencies require updating?

ColtAllen avatar Dec 16 '23 17:12 ColtAllen

Yes you need to bump the oldest supported pymc version to the most recent

ricardoV94 avatar Dec 16 '23 17:12 ricardoV94

In here and in the ci wofkflow file: https://github.com/pymc-labs/pymc-marketing/blob/main/pyproject.toml#L20

ricardoV94 avatar Dec 16 '23 17:12 ricardoV94

I had to increase rtol because we're comparing against the data simulation function in lifetimes, and it lacks a random seed. If the sim_data function looks good and I'm indexing properly in this test, this should be good to go.

ColtAllen avatar Feb 24 '24 18:02 ColtAllen

@ColtAllen this looks good! I wanna merge this one and iterate (I can help with that). I have just two requests :D :

  • Remove commented code
  • Fix pre-commit (I think Ruff is complaining)

Otherwise LGTM.

First point is done, but Ruff is complaining about a LaTeX expression in the docstring that is too long. I'm not sure if there's any way to shorten it that doesn't prevent the docstring from rendering properly.

ColtAllen avatar May 26 '24 23:05 ColtAllen

First point is done, but Ruff is complaining about a LaTeX expression in the docstring that is too long. I'm not sure if there's any way to shorten it that doesn't prevent the docstring from rendering properly.

Doesn't seem to be possible to turn off and on at the moment based on this ruff issue. Maybe you can define the string outside of the class then use f-string. Could reduce 8 something characters on a line. Could turn off for the whole file if there is a docstring specific option

williambdean avatar May 27 '24 00:05 williambdean

I think we can simply do it as in https://github.com/pymc-labs/pymc-marketing/blob/main/pymc_marketing/clv/distributions.py#L407

juanitorduz avatar May 27 '24 08:05 juanitorduz

Thanks @ColtAllen ! Let's now work on the model 🎉 !

juanitorduz avatar May 27 '24 18:05 juanitorduz