REF: Use `Styler` implementation for `DataFrame.to_latex`

Open attack68 opened this issue 3 years ago • 1 comments

After a year of patching things up in #41649, and @jreback's merge of #47864, I can finally propose this for review.

Objective

  • Maintain DataFrame.to_latex with its existing arguments, adding no arguments
  • Process the rendering via Styler.to_latex
  • Eliminate the need for LatexFormatter (code removal not part of this PR) and dual pandas code systems.
  • Redocument and direct users to the Styler implementation for forward development

Outcome

  • All arguments in DataFrame.to_latex were replicable, with the exception of col_space, which has no impact on the LaTeX render (and which I personally don't like anyway). col_space is deprecated, with a warning and a test.
  • All original tests pass with minor changes to latex formatting and no significant changes to latex render.
  • Some default formatting of floats changes based on pandas Styler options and DataFrame options crossover, which should be addressed later.
  • The performance of Styler is marginally better, although for the table sizes that one would like to render in LaTeX the difference is negligible anyway.

No whats_new: awaiting feedback.

New docs

The key new section of the docs:

[Screenshot 2022-08-04 at 22 19 37]

attack68 avatar Aug 04 '22 20:08 attack68

@ivanovmg @rhshadrach you both had input to the underlying issue so I think your opinion on the output here is very welcome.

attack68 avatar Aug 08 '22 20:08 attack68

I have concerns on merging this prior to 2.0 though.

Can you share those? I'm ambivalent between 1.5.0 and 2.0, but here are some good reasons for 1.5.0.

  • It allows gathering feedback and issues on these transition methods ahead of 2.0. to_html is planned for the same transition and, I imagine, is a much more popular method, so this feedback might be useful.
  • This implementation doesn't change the current arguments for DataFrame.to_latex, so it provides minimal breaking change and a smoother transition from 1.4 to 1.5 to 2.0.

Arguments for including in 2.0

  • Breaking issues do not matter as much.
  • The keyword arguments to DataFrame.to_latex can be changed (or rather simplified), making this function much simpler in code. (Importantly, they do not have to conform to the current arguments.)

attack68 avatar Aug 10 '22 07:08 attack68

This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.

github-actions[bot] avatar Sep 22 '22 00:09 github-actions[bot]

Apologies @attack68 on not getting back to you here.

I have concerns on merging this prior to 2.0 though.

Can you share those? I'm ambivalent between 1.5.0 and 2.0, but here are some good reasons for 1.5.0.

The concerns are the breaking changes this introduces, highlighted in my comments above. I think it would be okay to have the added requirement of jinja2 in 2.0 (https://github.com/pandas-dev/pandas/pull/47970#discussion_r941803622). https://github.com/pandas-dev/pandas/pull/47970#discussion_r943942827 is still outstanding.

rhshadrach avatar Sep 24 '22 14:09 rhshadrach

Apologies @attack68 on not getting back to you here.

I have concerns on merging this prior to 2.0 though.

Can you share those? I'm ambivalent between 1.5.0 and 2.0, but here are some good reasons for 1.5.0.

The concerns are the breaking changes this introduces, highlighted in my comments above. I think it would be okay to have the added requirement of jinja2 in 2.0 (#47970 (comment)). #47970 (comment) is still outstanding.

@rhshadrach thanks for reviewing but I think I will close this for now. I'm not sure I will have the time to push this through within the next couple of months. It has also been very difficult to gather any form of consensus for transitioning the DataFrame.to_xxx methods to use the styler implementation in this pandas version, whereas I think 2.0 offers the chance to be more flexible.

attack68 avatar Sep 25 '22 20:09 attack68

@attack68 thanks for the attempt. It is a shame there are great styles in HTML and not in LaTeX: [image] [image]

mdengler avatar Nov 14 '22 13:11 mdengler

@mdengler Styler.to_latex can reproduce that image (although maybe not the dashed borders very easily). Take a look at the example on the docs page: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.io.formats.style.Styler.to_latex.html

attack68 avatar Nov 14 '22 13:11 attack68

@mroeschke I have revived and updated this for 2.0.

attack68 avatar Nov 18 '22 22:11 attack68

(As someone not very familiar with to_latex & Styler), I just want to clarify the following

  1. What will be the API-breaking implications of using the Styler implementation in DataFrame.to_latex? From what I can tell so far, they are:
  • jinja2 needing to be installed
  • _repr_latex_ renders differently(?)

IMO I think the above should be noted in the whatsnew

  2. It sounds like people should generally be using Styler.to_latex over DataFrame.to_latex. If the Styler implementation contains all the functionality of the DataFrame implementation, are there any blocking factors to just deprecating DataFrame.to_latex outright and removing it in 3.0?
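
As a rough illustration of the jinja2 point above, here is a minimal sketch, using only the standard library, of checking for the dependency before going down the Styler path. The helper name `jinja2_available` is hypothetical and not part of pandas.

```python
import importlib.util


def jinja2_available() -> bool:
    """Return True if jinja2 can be found, without actually importing it.

    Hypothetical helper for illustration only; pandas performs its own
    optional-dependency check when the Styler is used.
    """
    return importlib.util.find_spec("jinja2") is not None


if not jinja2_available():
    print("jinja2 is not installed; Styler-based rendering will not work.")
```

A check like this only reports whether the package can be located; it says nothing about whether the installed version is one that pandas accepts.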

mroeschke avatar Nov 21 '22 22:11 mroeschke

I am also quite unfamiliar with the styler.

2. If the Styler implementation contains all the functionality of the DataFrame implementation, are there any blocking factors to just deprecating DataFrame.to_latex outright and removing it in 3.0?

I'd like to understand if there are arguments to DataFrame.to_latex that would be difficult / verbose to replicate using Styler.to_latex.

There are other methods like DataFrame.to_excel that also have Styler.to_excel. In my opinion these DataFrame methods give visibility and convenience to users, while the Styler allows for a more configurable and verbose use case. That feels like the right API to me.

rhshadrach avatar Nov 22 '22 01:11 rhshadrach

For @rhshadrach here is a list of the DataFrame.to_latex arguments and their corresponding treatment with Styler:

  • columns: parsed to Styler.hide to select which columns to render. Styler version also has more features.
  • col_space: removed; whitespace in LaTeX serves no purpose because the LaTeX renderer discards it.
  • header: parsed to Styler.relabel_index or Styler.hide. Styler version also has more features.
  • index, index_names: parsed to Styler.hide to select what to show or not.
  • na_rep, formatters, float_format, escape, decimal: parsed to Styler.format args which again has more customisability and options. This was tricky to code because the signatures are different in the way they are treated in each version, so we essentially need a parser and to re-input. A human can readily do this parsing and the input to Styler is not more verbose, perhaps less so in some cases.
  • encoding and buf: passed directly to Styler.to_latex
  • sparsify, multicolumn, multicolumn_format, multirow: parsed to the Styler.to_latex options sparse_columns, sparse_index, multicol_align and multirow_align, which again offer more options.
  • caption, label, position: all passed through to their Styler.to_latex equivalents.
  • longtable: parsed to environment in Styler.to_latex with more features.
  • bold_rows: encodes a style to the index values and then renders Styler with that specific style.
  • Note the additional kwargs in Styler.to_latex are: position_float, convert_css, hrules (which DataFrame.to_latex renders by default) and clines (with options).

Styler.to_latex is also compatible with Styler.concat for combining and printing multiple dataframes.

attack68 avatar Nov 22 '22 06:11 attack68

For @mroeschke, I think four things need documenting.

  • [x] that jinja2 is a requirement for this
  • [x] that _repr_latex_ does behave differently in Jupyter notebooks
  • [x] that the pandas.options have styler variants that will no longer apply for DataFrame.to_latex.
  • [x] recommending Styler.to_latex as the option (although I think this is documented in a few places already)

I think @rhshadrach is right that DataFrame.to_latex and .to_html serve as convenience methods. There is some good info on this in #48080, regarding why the DataFrame versions are out of date.

attack68 avatar Nov 22 '22 09:11 attack68

Okay, thanks for the info @attack68. Agreed, those points should be made clear in the whatsnew in conjunction with making this change.

that the pandas.options have styler variants that will no longer apply for dataframe.to_latex.

Can those options be removed outright or are they used in Styler.to_latex?

mroeschke avatar Nov 22 '22 22:11 mroeschke

Also as a reminder. Could you remove any potential FutureWarnings that are filtered out in the test suite due to the original deprecation?

mroeschke avatar Jan 07 '23 01:01 mroeschke

Just noting that during yesterday's dev call we decided that this PR isn't necessarily a blocker for releasing 2.0, i.e. this FutureWarning will persist until 3.0 if this is not ready.

That being said this PR seems close

mroeschke avatar Jan 12 '23 23:01 mroeschke

Just noting that during yesterday's dev call we decided that this PR isn't necessarily a blocker for releasing 2.0, i.e. this FutureWarning will persist until 3.0 if this is not ready.

That being said this PR seems close

Yes, if I can just get this to green I don't think there is more to do.

Post-PR clean-up, such as removal of redundant code and checking the filters, can be done after 2.0. Whilst not a blocker, I still think it would be good to get in if possible.

attack68 avatar Jan 13 '23 07:01 attack68

@ivanovmg @rhshadrach @mroeschke this is greenish now. I believe the http doc build failure is unrelated. Please take another look and consider whether all your comments are addressed.

attack68 avatar Jan 18 '23 06:01 attack68

Thanks for the great work here @attack68!

So to clarify the follow ups?

  1. Removing any remaining, related warning filtering
  2. Removing the display options that are no longer relevant?

mroeschke avatar Jan 19 '23 20:01 mroeschke

Yes, and remove the redundant LatexFormatter code.

attack68 avatar Jan 19 '23 20:01 attack68