pymc-marketing
pymc-marketing copied to clipboard
Add `BetaGeoBetaBinom` Distribution Block
This PR resurrects https://github.com/pymc-labs/pymc-marketing/pull/188 to partially address https://github.com/pymc-labs/pymc-marketing/issues/176.
The Beta-Geometic/Beta Binomial model is a CLV model for discrete, non-contractual use cases - a good example would be sporting events. This PR adds a distribution block to be used for posterior predictive checks and the logp in the full model which will be added in a separate PR.
For now this is a draft PR because I can't get the logp to return the correct values; for reference please see (5) on p4 of the research paper:
https://www.brucehardie.com/papers/020/fader_et_al_mksc_10.pdf
I could use some help with the pytensor functions this requires.
:books: Documentation preview :books:: https://pymc-marketing--431.org.readthedocs.build/en/431/
I'll try to check this week @ColtAllen
I pushed a version of the logp that works with Scan. I think we can make it work without a Scan, by evaluating on the maximum T-tx range. I just don't know if this results in much wasted computation in general use cases (for instance if there's one datapoint with a very large range, but most other datapoints have smaller ranges)
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 92.08%. Comparing base (
9aa8bc0) to head (60a1f55).
:exclamation: Current head 60a1f55 differs from pull request most recent head c376695
Please upload reports for the commit c376695 to get more accurate results.
Additional details and impacted files
@@ Coverage Diff @@
## main #431 +/- ##
==========================================
- Coverage 92.25% 92.08% -0.17%
==========================================
Files 24 24
Lines 2414 2490 +76
==========================================
+ Hits 2227 2293 +66
- Misses 187 197 +10
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
Check out this pull request on ![]()
See visual diffs & provide feedback on Jupyter Notebooks.
Powered by ReviewNB
I added the likelihood expression to the docstrings, along with a dev notebook. Not sure if the LaTeX rendered properly in the docstrings though, because the docs build is failing due to a pytensor issue:
ImportError: cannot import name 'vectorize' from 'pytensor.graph'
(/home/docs/checkouts/readthedocs.org/user_builds/pymc-marketing/envs/431/lib/python3.10/site-packages/pytensor/graph/__init__.py)
I pushed a version of the logp that works with Scan. I think we can make it work without a Scan, by evaluating on the maximum T-tx range. I just don't know if this results in much wasted computation in general use cases (for instance if there's one datapoint with a very large range, but most other datapoints have smaller ranges)
In general I don't see very large values of T being a common use case with this model. Is Scan the most performant approach? If so I'm fine leaving it as-is.
I added the likelihood expression to the docstrings, along with a dev notebook. Not sure if the LaTeX rendered properly in the docstrings though, because the docs build is failing due to a
pytensorissue:ImportError: cannot import name 'vectorize' from 'pytensor.graph' (/home/docs/checkouts/readthedocs.org/user_builds/pymc-marketing/envs/431/lib/python3.10/site-packages/pytensor/graph/__init__.py)
It's now called vectorize_graph
Is Scan the most performant approach? If so I'm fine leaving it as-is.
Scan is the simplest approach
Tests are still failing with vectorize_graph. Do the library dependencies require updating?
Yes you need to bump the oldest supported pymc version to the most recent
In here and in the ci wofkflow file: https://github.com/pymc-labs/pymc-marketing/blob/main/pyproject.toml#L20
I had to increase rtol because we're comparing against the data simulation function in lifetimes, and it lacks a random seed. If the sim_data function looks good and I'm indexing properly in this test, this should be good to go.
@ColtAllen this looks good! I wanna merge this one and iterate (I can help with that). I have just two requests :D :
- Remove commented code
- Fix pre-commit (I think Ruff is complaining)
Otherwise LGTM.
First point is done, but Ruff is complaining about a LaTeX expression in the docstring that is too long. I'm not sure if there's any way to shorten it that doesn't prevent the docstring from rendering properly.
First point is done, but Ruff is complaining about a LaTeX expression in the docstring that is too long. I'm not sure if there's any way to shorten it that doesn't prevent the docstring from rendering properly.
Doesn't seem to be possible to turn off and on at the moment based on this ruff issue. Maybe you can define the string outside of the class then use f-string. Could reduce 8 something characters on a line. Could turn off for the whole file if there is a docstring specific option
I think we can simply do it as in https://github.com/pymc-labs/pymc-marketing/blob/main/pymc_marketing/clv/distributions.py#L407
Thanks @ColtAllen ! Let's now work on the model 🎉 !