Remove or suppress surrounding quotes from text in <td> elements of _repr_html_() output

Open SummittDweller opened this issue 2 years ago • 4 comments

Problem description

I first posted this issue on Stack Overflow, as suggested. It reads...

I have a Python Flask app that recently switched from using Pandas to Polars for some dataframe handling. The pertinent code is shown here:

import polars as pl

# `results` comes from elsewhere in the app and is not shown here.
data = {
    'Text': ['Virginia Woolf, Mrs. Dalloway', 'College website corpus', 'Presidential inaugural speeches',
             'Federalist Papers', 'British Novels', 'YOUR TEXT'],
    'To Be Frequency': [28.3, 16.7, 31.8, 39.8, 31.4, results[1]],
}
df = pl.from_dict(data)

# textresult = (df.sort_values(by=['To Be Frequency'], ascending=False)).style   # old Pandas code

# See https://pola-rs.github.io/polars/py-polars/html/reference/config.html for complete list of Polars.Config settings
pl.Config.set_tbl_hide_column_data_types(True)
pl.Config.set_tbl_hide_dataframe_shape(True)
pl.Config.set_fmt_str_lengths(40)
pl.Config.set_tbl_width_chars(200)

textresult = df.sort('To Be Frequency')._repr_html_()  # convert the result to HTML because a simple string won't do

The _repr_html_() function appears to surround my dataframe strings with double quotes, emitted as escaped &quot; entities, so that I get HTML output in my Python Flask templates like the attached screen capture.

Screenshot 2023-01-27 at 14 10 47

I would prefer to see the same but without the quotes surrounding my "Text" names.

I've applied a quick fix to my code by adding htmlresult = textresult.replace("&quot;", "") before htmlresult is rendered by my template. However, that strips every escaped quote, including any inside the text itself, so it would be nice to be more selective, with an option to suppress the creation of the additional quotes in the _repr_html_() function.
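Until such an option exists, a more selective workaround might look like the sketch below. It assumes each string cell is emitted as <td>&quot;...&quot;</td> with no attributes on the tag and with HTML-escaped contents; the helper name strip_cell_quotes is made up for illustration and is not part of Polars.

```python
import re

# Assumption: each string cell is rendered as <td>&quot;...&quot;</td>, with
# no attributes on the <td> tag and with the cell text HTML-escaped (so the
# text contains no literal "<").
_QUOTED_CELL = re.compile(r"<td>&quot;([^<]*)&quot;</td>")


def strip_cell_quotes(html: str) -> str:
    """Remove only the &quot; pair that wraps a whole <td> value.

    Hypothetical helper, not part of Polars.
    """
    return _QUOTED_CELL.sub(r"<td>\1</td>", html)


# Hand-written sample rather than real Polars output.
sample = (
    "<th>Text with &quot;quotes&quot;</th>"
    "<td>&quot;She said &quot;hi&quot;&quot;</td>"
    "<td>420</td>"
)
print(strip_cell_quotes(sample))
```

Unlike a blanket replace, this leaves quotes in headers and inside the cell text alone; only the outermost pair in each <td> is dropped.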

I've already forked and cloned the Polars codebase where I added the following test to my local py-polars/tests/unit/test_df.py:

def test_to_html_quoted() -> None:
    # check it does not panic/error, and returns a table with NO quotes around the <td> text
    data = {
        'Text with NO "quotes" please': [' Quotes? '],
        'Just a Number': [420],
    }
    df = pl.from_dict(data)
    html = df._repr_html_()
    assert '&quot; Quotes? ' not in html

I also tested several simple modifications to py-polars/polars/_html.py, around lines 101 through 172, but I don't know HTMLFormatter or Polars well enough yet to come up with an elegant solution that works.

SummittDweller, Jan 27 '23 20:01

This is caused by HTMLFormatter.write_body, which calls PySeries::get_fmt.

Coming from Pandas, I also find the quotes in Jupyter very noisy (and inconsistent with the plain text display). It'd be nice to have a Config option to omit them.

Related: #10646, #10648

gusutabopb, Dec 28 '23 05:12

Hi! I would like to contribute to this one. The quotes personally bug me a bit and after reviewing the references by @gusutabopb I think I have enough understanding to resolve it.

My plan is basically:

  1. add a boolean configuration called POLARS_FMT_STR_QUOTES, alongside the existing POLARS_FMT_STR_LEN configuration, defaulting to false
  2. edit HTMLFormatter.write_body to read the config and pass it on to PySeries::get_fmt
  3. edit PySeries::get_fmt to honor it

Did I miss anything?

pedroangelini, Apr 30 '24 20:04

I know that POLARS_VERBOSE goes by "0" or "1" not "true" or "false" so that's a precedent to follow.

deanm0000, Apr 30 '24 21:04

> I know that POLARS_VERBOSE goes by "0" or "1" not "true" or "false" so that's a precedent to follow.

Yep, that was exactly where I was working now!

I'm basing it off of set_tbl_column_data_type_inline in config.py. The Python API accepts a bool and eventually turns it into 0/1 when it sets the environment variable.
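As a rough, standalone sketch of that bool-to-0/1 pattern (not the actual Polars implementation): the variable name POLARS_FMT_STR_QUOTES is only the one proposed in the plan above, and both function names are made up for illustration.

```python
import os
from typing import Optional

# Name proposed in this thread; assumed here, not an existing Polars setting.
ENV_VAR = "POLARS_FMT_STR_QUOTES"


def set_fmt_str_quotes(active: Optional[bool] = True) -> None:
    """Hypothetical setter: store a bool as "1"/"0"; None unsets the variable."""
    if active is None:
        os.environ.pop(ENV_VAR, None)
    else:
        os.environ[ENV_VAR] = str(int(active))


def fmt_str_quotes_enabled(default: bool = False) -> bool:
    """Hypothetical reader: only the string "1" counts as enabled."""
    value = os.environ.get(ENV_VAR)
    if value is None:
        return default
    return value == "1"


set_fmt_str_quotes(False)
print(os.environ[ENV_VAR], fmt_str_quotes_enabled())
```

Keeping the environment variable as "0"/"1" follows the POLARS_VERBOSE precedent mentioned above, while callers of the Python API still pass a plain bool.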

pedroangelini, Apr 30 '24 21:04