Printing glimpse

Open robertdj opened this issue 2 years ago • 5 comments

Problem description

Hi

I'm really excited about the glimpse method for Python -- thanks! However, on my machines the output prints "verbatim", that is, with \n instead of newlines. Here is the example from the docs:

>>> import polars as pl
>>> from datetime import date
>>> df = pl.DataFrame(
...     {
...         "a": [1.0, 2.8, 3.0],
...         "b": [4, 5, None],
...         "c": [True, False, True],
...         "d": [None, "b", "c"],
...         "e": ["usd", "eur", None],
...         "f": [date(2020, 1, 1), date(2021, 1, 2), date(2022, 1, 1)],
...     }
... )
>>> df
shape: (3, 6)
┌─────┬──────┬───────┬──────┬──────┬────────────┐
│ a   ┆ b    ┆ c     ┆ d    ┆ e    ┆ f          │
│ --- ┆ ---  ┆ ---   ┆ ---  ┆ ---  ┆ ---        │
│ f64 ┆ i64  ┆ bool  ┆ str  ┆ str  ┆ date       │
╞═════╪══════╪═══════╪══════╪══════╪════════════╡
│ 1.0 ┆ 4    ┆ true  ┆ null ┆ usd  ┆ 2020-01-01 │
├╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2.8 ┆ 5    ┆ false ┆ b    ┆ eur  ┆ 2021-01-02 │
├╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 3.0 ┆ null ┆ true  ┆ c    ┆ null ┆ 2022-01-01 │
└─────┴──────┴───────┴──────┴──────┴────────────┘
>>> df.glimpse()
'Rows: 3\nColumns: 6\n$ a <Float64> 1.0, 2.8, 3.0                                                                             \n$ b   <Int64> 4, 5, None                                                                                \n$ c <Boolean> True, False, True                                                                         \n$ d    <Utf8> None, b, c                                                                                \n$ e    <Utf8> usd, eur, None                                                                            \n$ f    <Date> 2020-01-01, 2021-01-02, 2022-01-01                                                        \n'

Wrapping the method in a print gives the desired formatting.

>>> print(df.glimpse())
Rows: 3
Columns: 6
$ a <Float64> 1.0, 2.8, 3.0                                                                             
$ b   <Int64> 4, 5, None                                                                                
$ c <Boolean> True, False, True                                                                         
$ d    <Utf8> None, b, c                                                                                
$ e    <Utf8> usd, eur, None                                                                            
$ f    <Date> 2020-01-01, 2021-01-02, 2022-01-01                                                        
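
For context, the interactive prompt echoes the repr() of a returned value, while print() writes its str(). A minimal sketch of that difference, using a plain string as a stand-in for the glimpse output (the `summary` name is only for illustration, and no polars is needed):

```python
# A plain string standing in for what glimpse() returns (illustrative only).
summary = "Rows: 3\nColumns: 6\n"

# The REPL echoes repr(value): each newline becomes the two characters "\" and "n".
echoed = repr(summary)

# print() writes str(value): newlines are real line breaks.
printed = str(summary)

print(echoed)
print(printed, end="")
```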

I don't know if this is by design, but since I'm using glimpse only for exploring dataframes it would be nice to avoid the print :-)

robertdj avatar Dec 19 '22 07:12 robertdj

I'd say the same thing about describe_optimized_plan

braaannigan avatar Dec 20 '22 14:12 braaannigan

Hello, thank you for the report. I contributed glimpse in #5622 .

The first version had print embedded, but it was decided to allow extra flexibility by returning just a string. We should be able to come up with a better way to make both ways of interacting with glimpse (and similar methods) work well.

zundertj avatar Dec 20 '22 22:12 zundertj

@zundertj I agree. Maybe we could add an extra argument that controls the return type? Or we could make a custom string type that is always pretty-printed in Jupyter?
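
The custom string type idea could look roughly like the sketch below. It assumes IPython's `_repr_pretty_(self, p, cycle)` display hook, which IPython and Jupyter consult when showing a value; the `PrettyString` name is hypothetical and not part of polars. In the plain Python prompt this type would still echo the quoted repr.

```python
class PrettyString(str):
    """Hypothetical str subclass: behaves like a normal string, but
    IPython/Jupyter display its contents instead of the quoted repr."""

    def _repr_pretty_(self, p, cycle):
        # IPython's pretty printer calls this hook when displaying a value;
        # p.text() emits the given text verbatim, so newlines stay newlines.
        p.text(str(self))


out = PrettyString("Rows: 2\nColumns: 1\n")
```

Because it subclasses str, the result can still be written to a file, compared, or sliced like any other string.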

ritchie46 avatar Dec 21 '22 15:12 ritchie46

The options I could come up with so far:

  1. Add a return_as_string parameter that defaults to False (otherwise it is annoying for interactive use). If False, print; if True, return the string.
  2. Same as 1., but with the default as True. I think this is less attractive: glimpse is probably intended more for interactive use than for non-interactive use, as the opening post here requests.
  3. Return a custom type that prints nicely. It would need to set __repr__ to print:
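class CustomString:
    def __init__(self, contents: str):
        self.contents = contents

    def __str__(self):
        return self.contents

    def __repr__(self):
        # Print as a side effect so the prompt shows real line breaks,
        # then return "" so the echoed repr itself is just an empty line.
        print(self.contents.strip("\n"), end=None)
        return ""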

In the Python terminal:

>>> pl.DataFrame({"a": [1,2]}).glimpse()
Rows: 2
Columns: 1
$ a <Int64> 1, 2                                                                                        

>>> out = pl.DataFrame({"a": [1,2]}).glimpse()
>>> out
Rows: 2
Columns: 1
$ a <Int64> 1, 2                                                                                        

>>> str(out)
'Rows: 2\nColumns: 1\n$ a <Int64> 1, 2   

It is not nice though: __repr__ should be used to provide an unambiguous representation of the object, so it feels like there should be a better way to do this. I haven't found one so far, which is odd, as I think more libraries struggle with this.

Suggestions are welcome.

zundertj avatar Dec 23 '22 09:12 zundertj

I think the first option with return_as_string sounds like a good solution.

robertdj avatar Dec 23 '22 13:12 robertdj
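
For reference, option 1 could be sketched as follows. This is only an illustration of the print-by-default behaviour under discussion, not the polars implementation: `glimpse_text` and the dict-of-lists input are hypothetical stand-ins for a DataFrame and its summary formatting.

```python
from __future__ import annotations


def glimpse_text(columns: dict[str, list]) -> str:
    """Hypothetical stand-in that builds a glimpse-like summary string."""
    n_rows = len(next(iter(columns.values()), []))
    lines = [f"Rows: {n_rows}", f"Columns: {len(columns)}"]
    for name, values in columns.items():
        lines.append(f"$ {name} " + ", ".join(str(v) for v in values))
    return "\n".join(lines) + "\n"


def glimpse(columns: dict[str, list], return_as_string: bool = False) -> str | None:
    """Print the summary by default; return it when return_as_string=True."""
    output = glimpse_text(columns)
    if return_as_string:
        return output
    print(output, end="")
    return None


glimpse({"a": [1, 2]})                                # interactive use: prints
text = glimpse({"a": [1, 2]}, return_as_string=True)  # programmatic use: returns
```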