pymc icon indicating copy to clipboard operation
pymc copied to clipboard

Refactor pretty-printing and fix latex underscores

Open covertg opened this issue 2 years ago β€’ 7 comments

What is this PR about? printing.py had developed a decent amount of duplicated code. I wanted to work on #6508 but found it difficult to parse through the logic. Through the course of understanding how the submodule works, I ended up refactoring it somewhat.

This PR fixes the LaTeX underscore bug as discussed in that issue and also seeks to clean up pretty-printing. In particular,

  • Addresses #6508 by escaping \text commands to print underscores
  • Unifies str_for_model_var and str_for_potential_or_deterministic
  • Deprecates the unused include_params argument
    • The codebase seems to not use this argument at all, and removing it simplifies the logic of some functions in printing.py as well as in tests. But I could re-incorporate it if preferred.
  • Parses the value of the formatting arg a bit more strictly
  • Changes the printing of constants to include shape information, e.g. <constant (1,2)>
  • Updates tests to reflect changes
  • Adds a handful of other tests for printing functionality: model names, names with edge case characters (underscore, $, ~), and misc tests for the warning and errors that may be thrown by pretty printing

Checklist

Major / Breaking Changes

  • Passing include_params to the str_repr function now raises a futurewarning.
  • The formatting argument is now expected to be exactly either "plain" or "latex".

New features

  • Constant shape information as discussed. (Sorry, issue creep. It's a one-liner now. But I can also leave this for a separate issue if preferred.)
  • Tildes (~) in user-specified strings are converted to dashes (-). Practically this is because str_for_model uses ~ and \sim to align the model str, and it's annoying to typeset a plain tilde character in LaTeX.

Bugfixes

  • #6508
  • Also fixed a bug (edge case) where AttributeError was thrown when a Potential received some Variable that didn't have the .owner attribute. (I'm not sure whether this would ever happen in the wild, but model-building allows for this case, so I went ahead and included it.)

Documentation

  • ...

Maintenance

  • ~~While working on this I noticed that the _repr_latex_() methods assigned to Distributions and Models are essentially the same as their str_repr functions, just with latex enforced. Nowhere in the pymc codebase uses these methods. Candidate for removal?~~ I've realized that this method is necessary for latex to automagically appear when supported by ipython.
  • ~~The inputs to Potentials and Deterministics have been printed as a function-like expression, e.g. Deterministic(f(arg1, ...)). In the updated submodule, we maintain this behavior, but it would be easy to drop the f() wrapper for this case if wanted.~~ On reflection it seems pretty sensible to keep the f().
  • ~~I also noticed that we don't have any tests for printing the model name.~~ Added some tests involving this. This includes the case for nested models (one pm.Model() created within the context of another). I've never done this myself in pymc so could use a sanity check on if we need/want to be testing this.

covertg avatar Feb 18 '23 19:02 covertg

Forgot to mention: As a qualitative test, I also checked that the "monolithic" model in test_printing.py renders correctly in both MathJax and KaTeX.

covertg avatar Feb 18 '23 19:02 covertg

Codecov Report

Merging #6533 (8370d97) into main (c2ce47f) will decrease coverage by 9.51%. The diff coverage is 87.23%.

:exclamation: Current head 8370d97 differs from pull request most recent head f5af848. Consider uploading reports for the commit f5af848 to get more accurate results

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #6533      +/-   ##
==========================================
- Coverage   94.74%   85.24%   -9.51%     
==========================================
  Files         147      147              
  Lines       27861    27855       -6     
==========================================
- Hits        26397    23745    -2652     
- Misses       1464     4110    +2646     
Impacted Files Coverage Ξ”
pymc/printing.py 87.82% <86.36%> (-0.08%) :arrow_down:
pymc/distributions/distribution.py 96.31% <100.00%> (ΓΈ)
pymc/model.py 89.90% <100.00%> (-0.24%) :arrow_down:
pymc/tests/logprob/test_utils.py 0.00% <0.00%> (-100.00%) :arrow_down:
pymc/tests/logprob/test_cumsum.py 0.00% <0.00%> (-100.00%) :arrow_down:
pymc/tests/logprob/test_mixture.py 0.00% <0.00%> (-100.00%) :arrow_down:
pymc/tests/logprob/test_abstract.py 0.00% <0.00%> (-100.00%) :arrow_down:
pymc/tests/logprob/test_rewriting.py 0.00% <0.00%> (-100.00%) :arrow_down:
pymc/tests/logprob/test_joint_logprob.py 0.00% <0.00%> (-100.00%) :arrow_down:
pymc/tests/logprob/test_composite_logprob.py 0.00% <0.00%> (-100.00%) :arrow_down:
... and 18 more

codecov[bot] avatar Feb 18 '23 19:02 codecov[bot]

I pushed some changes. Mainly, they clarify the exceptions that are thrown, add a number of printing tests, and fix an edge-case bug or two. I've also updated the original post to reflect the latest. I won't be changing this PR any more till review. (though there are a few questions or comments that I will add to the code.)

covertg avatar Feb 20 '23 21:02 covertg

Hi @covertg, sorry for the delay in writing back and thanks for your PR :) I'm currently travelling, but I intend to review your PR within the next few days.

larryshamalama avatar Feb 21 '23 01:02 larryshamalama

@larryshamalama of course, thank you for the review! I understand this PR ballooned a good deal and also recognize that I don't have the full big-picture vision that y'all have. My bad for missing your previous work on the dispatch-based approach.

So thanks for the comments. Did my best to follow up on the specific ones that y'all have left. Moving forward -- I'd be very glad to help work on this in whatever way would best serve pymc in the long run. It'd be fun to work on that together! If I may ask, what led you to decide against the work in #6447?

Also a big-picture question for us. Do we want model graphing to remain separate from model printing? This came to me when considering the question (in comments above) of, do we need to pretty-print pymc variables without a model on the context stack. If the focus is on pretty-printing a full model, it seems like graph-visualizing and model-pretty-printing could possibly be unified. But perhaps that is its own big can of worms.

covertg avatar Mar 07 '23 18:03 covertg

Also a big-picture question for us. Do we want model _graph_ing to remain separate from model _print_ing? This came to me when considering the question (in comments above) of, do we need to pretty-print pymc variables without a model on the context stack. If the focus is on pretty-printing a full model, it seems like graph-visualizing and model-pretty-printing could possibly be unified. But perhaps that is its own big can of worms.

The big value of the pretty-printing is on printing the whole Model, not individual variables, so I would say that we don't need try to hard to pretty-print individual variables.

ricardoV94 avatar Mar 30 '23 10:03 ricardoV94

@covertg Any interest in picking up on this again? I did a small fix on https://github.com/pymc-devs/pymc/pull/6942 but the underscore thing is still broken.

ricardoV94 avatar Mar 21 '24 10:03 ricardoV94