pymc
pymc copied to clipboard
Refactor pretty-printing and fix latex underscores
What is this PR about?
printing.py
had developed a decent amount of duplicated code. I wanted to work on #6508 but found it difficult to parse through the logic. Through the course of understanding how the submodule works, I ended up refactoring it somewhat.
This PR fixes the LaTeX underscore bug as discussed in that issue and also seeks to clean up pretty-printing. In particular,
- Addresses #6508 by escaping \text commands to print underscores
- Unifies
str_for_model_var
andstr_for_potential_or_deterministic
- Deprecates the unused
include_params
argument- The codebase seems to not use this argument at all, and removing it simplifies the logic of some functions in
printing.py
as well as in tests. But I could re-incorporate it if preferred.
- The codebase seems to not use this argument at all, and removing it simplifies the logic of some functions in
- Parses the value of the
formatting
arg a bit more strictly - Changes the printing of constants to include shape information, e.g.
<constant (1,2)>
- Updates tests to reflect changes
- Adds a handful of other tests for printing functionality: model names, names with edge case characters (underscore, $, ~), and misc tests for the warning and errors that may be thrown by pretty printing
Checklist
- [x] Explain important implementation details π
- [x] Make sure that the pre-commit linting/style checks pass.
- [x] Link relevant issues (preferably in nice commit messages)
- [x] Are the changes covered by tests and docstrings?
- [x] Fill out the short summary sections π
Major / Breaking Changes
- Passing
include_params
to the str_repr function now raises a futurewarning. - The
formatting
argument is now expected to be exactly either "plain" or "latex".
New features
- Constant shape information as discussed. (Sorry, issue creep. It's a one-liner now. But I can also leave this for a separate issue if preferred.)
- Tildes (~) in user-specified strings are converted to dashes (-). Practically this is because
str_for_model
uses ~ and\sim
to align the model str, and it's annoying to typeset a plain tilde character in LaTeX.
Bugfixes
- #6508
- Also fixed a bug (edge case) where AttributeError was thrown when a Potential received some Variable that didn't have the
.owner
attribute. (I'm not sure whether this would ever happen in the wild, but model-building allows for this case, so I went ahead and included it.)
Documentation
- ...
Maintenance
- ~~While working on this I noticed that the
_repr_latex_()
methods assigned to Distributions and Models are essentially the same as theirstr_repr
functions, just with latex enforced. Nowhere in the pymc codebase uses these methods. Candidate for removal?~~ I've realized that this method is necessary for latex to automagically appear when supported by ipython. - ~~The inputs to Potentials and Deterministics have been printed as a function-like expression, e.g.
Deterministic(f(arg1, ...))
. In the updated submodule, we maintain this behavior, but it would be easy to drop thef()
wrapper for this case if wanted.~~ On reflection it seems pretty sensible to keep the f(). - ~~I also noticed that we don't have any tests for printing the model name.~~ Added some tests involving this. This includes the case for nested models (one pm.Model() created within the context of another). I've never done this myself in pymc so could use a sanity check on if we need/want to be testing this.
Forgot to mention: As a qualitative test, I also checked that the "monolithic" model in test_printing.py renders correctly in both MathJax and KaTeX.
Codecov Report
Merging #6533 (8370d97) into main (c2ce47f) will decrease coverage by
9.51%
. The diff coverage is87.23%
.
:exclamation: Current head 8370d97 differs from pull request most recent head f5af848. Consider uploading reports for the commit f5af848 to get more accurate results
Additional details and impacted files
@@ Coverage Diff @@
## main #6533 +/- ##
==========================================
- Coverage 94.74% 85.24% -9.51%
==========================================
Files 147 147
Lines 27861 27855 -6
==========================================
- Hits 26397 23745 -2652
- Misses 1464 4110 +2646
Impacted Files | Coverage Ξ | |
---|---|---|
pymc/printing.py | 87.82% <86.36%> (-0.08%) |
:arrow_down: |
pymc/distributions/distribution.py | 96.31% <100.00%> (ΓΈ) |
|
pymc/model.py | 89.90% <100.00%> (-0.24%) |
:arrow_down: |
pymc/tests/logprob/test_utils.py | 0.00% <0.00%> (-100.00%) |
:arrow_down: |
pymc/tests/logprob/test_cumsum.py | 0.00% <0.00%> (-100.00%) |
:arrow_down: |
pymc/tests/logprob/test_mixture.py | 0.00% <0.00%> (-100.00%) |
:arrow_down: |
pymc/tests/logprob/test_abstract.py | 0.00% <0.00%> (-100.00%) |
:arrow_down: |
pymc/tests/logprob/test_rewriting.py | 0.00% <0.00%> (-100.00%) |
:arrow_down: |
pymc/tests/logprob/test_joint_logprob.py | 0.00% <0.00%> (-100.00%) |
:arrow_down: |
pymc/tests/logprob/test_composite_logprob.py | 0.00% <0.00%> (-100.00%) |
:arrow_down: |
... and 18 more |
I pushed some changes. Mainly, they clarify the exceptions that are thrown, add a number of printing tests, and fix an edge-case bug or two. I've also updated the original post to reflect the latest. I won't be changing this PR any more till review. (though there are a few questions or comments that I will add to the code.)
Hi @covertg, sorry for the delay in writing back and thanks for your PR :) I'm currently travelling, but I intend to review your PR within the next few days.
@larryshamalama of course, thank you for the review! I understand this PR ballooned a good deal and also recognize that I don't have the full big-picture vision that y'all have. My bad for missing your previous work on the dispatch-based approach.
So thanks for the comments. Did my best to follow up on the specific ones that y'all have left. Moving forward -- I'd be very glad to help work on this in whatever way would best serve pymc in the long run. It'd be fun to work on that together! If I may ask, what led you to decide against the work in #6447?
Also a big-picture question for us. Do we want model graphing to remain separate from model printing? This came to me when considering the question (in comments above) of, do we need to pretty-print pymc variables without a model on the context stack. If the focus is on pretty-printing a full model, it seems like graph-visualizing and model-pretty-printing could possibly be unified. But perhaps that is its own big can of worms.
Also a big-picture question for us. Do we want model _graph_ing to remain separate from model _print_ing? This came to me when considering the question (in comments above) of, do we need to pretty-print pymc variables without a model on the context stack. If the focus is on pretty-printing a full model, it seems like graph-visualizing and model-pretty-printing could possibly be unified. But perhaps that is its own big can of worms.
The big value of the pretty-printing is on printing the whole Model, not individual variables, so I would say that we don't need try to hard to pretty-print individual variables.
@covertg Any interest in picking up on this again? I did a small fix on https://github.com/pymc-devs/pymc/pull/6942 but the underscore thing is still broken.