polars
Printing glimpse
Problem description
Hi,
I'm really excited about the glimpse method for Python -- thanks! However, on my machines the output prints "verbatim", that is, with \n instead of newlines.
Here is the example from the docs:
>>> import polars as pl
>>> from datetime import date
>>> df = pl.DataFrame(
... {
... "a": [1.0, 2.8, 3.0],
... "b": [4, 5, None],
... "c": [True, False, True],
... "d": [None, "b", "c"],
... "e": ["usd", "eur", None],
... "f": [date(2020, 1, 1), date(2021, 1, 2), date(2022, 1, 1)],
... }
... )
>>> df
shape: (3, 6)
┌─────┬──────┬───────┬──────┬──────┬────────────┐
│ a ┆ b ┆ c ┆ d ┆ e ┆ f │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ f64 ┆ i64 ┆ bool ┆ str ┆ str ┆ date │
╞═════╪══════╪═══════╪══════╪══════╪════════════╡
│ 1.0 ┆ 4 ┆ true ┆ null ┆ usd ┆ 2020-01-01 │
├╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2.8 ┆ 5 ┆ false ┆ b ┆ eur ┆ 2021-01-02 │
├╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 3.0 ┆ null ┆ true ┆ c ┆ null ┆ 2022-01-01 │
└─────┴──────┴───────┴──────┴──────┴────────────┘
>>> df.glimpse()
'Rows: 3\nColumns: 6\n$ a <Float64> 1.0, 2.8, 3.0 \n$ b <Int64> 4, 5, None \n$ c <Boolean> True, False, True \n$ d <Utf8> None, b, c \n$ e <Utf8> usd, eur, None \n$ f <Date> 2020-01-01, 2021-01-02, 2022-01-01 \n'
Wrapping the call in print gives the desired formatting:
>>> print(df.glimpse())
Rows: 3
Columns: 6
$ a <Float64> 1.0, 2.8, 3.0
$ b <Int64> 4, 5, None
$ c <Boolean> True, False, True
$ d <Utf8> None, b, c
$ e <Utf8> usd, eur, None
$ f <Date> 2020-01-01, 2021-01-02, 2022-01-01
I don't know if this is by design, but since I'm using glimpse only for exploring dataframes, it would be nice to avoid the print :-)
I'd say the same thing about describe_optimized_plan.
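For context, the difference between the two outputs above is standard interactive-prompt behaviour rather than anything specific to glimpse: the prompt echoes repr() of a returned value, while print() writes str(). A minimal sketch with a plain string (the shortened text here is only a stand-in for the real glimpse output):

```python
# A plain string standing in for what glimpse() returns (shortened for illustration).
text = "Rows: 3\nColumns: 6\n"

# The interactive prompt echoes repr(value), which adds quotes and escapes newlines.
echoed = repr(text)
print(echoed)  # 'Rows: 3\nColumns: 6\n'

# print() writes str(value), so the newlines are rendered as real line breaks.
print(text)
```

This is why any method that returns a str will look "verbatim" at the prompt unless it is printed.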
Hello, thank you for the report. I contributed glimpse in #5622.
The first version had print embedded, but it was decided to allow extra flexibility by returning just a string. We should be able to come up with a better way to make both ways of interacting with glimpse and similar methods work well.
@zundertj I agree. Maybe we could add an extra argument that controls the return type? Or we could make a custom string type that is always pretty-printed in Jupyter?
The options I could come up with so far:
1. Add a return_as_string parameter, defaulting to False (otherwise annoying for interactive use). If False, print; if True, return as a string.
2. Same as 1., but with the default as True. I think this is less attractive: glimpse is probably intended more for interactive use than non-interactive use, as the opening post here requests.
3. Return a custom type that prints nicely. It would need to set __repr__ to print:
class CustomString:
    def __init__(self, contents: str):
        self.contents = contents

    def __str__(self):
        return self.contents

    def __repr__(self):
        # Side effect: print the contents, then return an empty repr.
        print(self.contents.strip("\n"), end=None)
        return ""
In the Python terminal:
>>> pl.DataFrame({"a": [1,2]}).glimpse()
Rows: 2
Columns: 1
$ a <Int64> 1, 2
>>> out = pl.DataFrame({"a": [1,2]}).glimpse()
>>> out
Rows: 2
Columns: 1
$ a <Int64> 1, 2
>>> str(out)
'Rows: 2\nColumns: 1\n$ a <Int64> 1, 2
It is not nice though: __repr__ should be used to provide an unambiguous representation of the object, and it feels like there should be a better way to do this. I haven't found one so far, which is odd, as I think more libraries struggle with this.
Suggestions are welcome.
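One possible variation, offered only as a sketch and not something proposed in this thread: subclass str and have __repr__ return the raw text, so there is no print() side effect inside __repr__. The name PrettyString is hypothetical and not part of polars.

```python
class PrettyString(str):
    """Hypothetical str subclass whose repr is the raw text (not part of polars)."""

    def __repr__(self) -> str:
        # Return the text itself instead of the quoted, escaped form.
        # Nothing is printed here, so calling repr() has no side effects.
        return str(self).rstrip("\n")


out = PrettyString("Rows: 2\nColumns: 1\n")

# At an interactive prompt, evaluating `out` would now show two plain lines,
# while the value still behaves like an ordinary string everywhere else.
print(repr(out))
print(out.upper())
```

This still bends the convention that __repr__ should be unambiguous, so the concern raised above applies here too; it only removes the printing from inside __repr__.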
I think the first option with return_as_string sounds like a good solution.
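A rough sketch of how that first option could behave, assuming a return_as_string flag that defaults to False. The function glimpse_sketch and its simplified output format are hypothetical stand-ins written over a plain dict; this is not the polars implementation.

```python
from __future__ import annotations


def glimpse_sketch(columns: dict[str, list], return_as_string: bool = False) -> str | None:
    """Hypothetical stand-in for glimpse(); not the polars implementation."""
    n_rows = max((len(values) for values in columns.values()), default=0)
    lines = [f"Rows: {n_rows}", f"Columns: {len(columns)}"]
    for name, values in columns.items():
        lines.append(f"$ {name} " + ", ".join(str(v) for v in values))
    output = "\n".join(lines)

    if return_as_string:
        # Non-interactive use: hand the text back to the caller.
        return output
    # Interactive default: print and return None, so the prompt echoes nothing extra.
    print(output)
    return None


glimpse_sketch({"a": [1, 2]})                                # prints the summary
text = glimpse_sketch({"a": [1, 2]}, return_as_string=True)  # returns it instead
```

Returning None in the default case matters: if the function both printed and returned the string, the prompt would show the output twice.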