
Change default of `Config.set_fmt_str_lengths` to be more reasonable (plus a small display "bug")

Open • JulianCologne opened this issue 2 months ago • 4 comments

Description

In my experience, the default of 14(?) characters is far too low to be usable in almost all real-life use cases.

I propose changing this to a much larger value, such as 50 (see the pandas example below).

Polars

  • only showing 14(?) characters
  • the second line is also "incorrect"! The text is complete, but it still shows the truncation ellipsis (…) ⚠️ ⚠️
import polars as pl

pl.DataFrame(
    {
        "txt": [
            "123456789_123",
            "123456789_1234",
            "123456789_12345",
        ],
    },
)

[screenshot of the Polars output]

Pandas

  • much more reasonable 50 characters by default
  • typical pandas stuff: once the limit of 50 is reached, it truncates to 46 chars 🤣 🤣
import pandas as pd

pd.DataFrame(
    {
        "txt": [
            "123456789_123456789_123456789_123456789_123456789",
            "123456789_123456789_123456789_123456789_123456789_",
        ]
    }
)

[screenshot of the pandas output]

JulianCologne, Apr 23 '24 07:04

I find 50 a bit much; 30 seems like a good middle ground.

lyngc, Apr 23 '24 11:04

You can already set this yourself via the `POLARS_FMT_STR_LEN` environment variable, to whatever value you want.

@stinodego This feels enough like a "not planned" that I'm closing it, but in case I'm overstepping, I'm tagging you so you can reopen.

deanm0000, Apr 26 '24 14:04

@deanm0000 Julian has had other defaults changed before, e.g. #14515.

In this case, the default seems to be 15 for Series and 32 for DataFrame?

Wouldn't it be reasonable for both to use the same value?

https://github.com/pola-rs/polars/blob/9b0503a6559ffdb952b27a22b2c2eebb58ea9553/crates/polars-core/src/fmt.rs#L107

https://github.com/pola-rs/polars/blob/9b0503a6559ffdb952b27a22b2c2eebb58ea9553/crates/polars-core/src/fmt.rs#L505

import polars as pl

pl.Series(
    "txt", [
        "Play it, Sam. Play 'As Time Goes By'.",
        "This is the beginning of a beautiful friendship.",
    ]
)

# Series: 'txt' [str]
# [
# 	"Play it, Sam. …
# 	"This is the be…
# ]
pl.DataFrame({
    "txt": [
        "Play it, Sam. Play 'As Time Goes By'.",
        "This is the beginning of a beautiful friendship.",
    ]
})

# shape: (2, 1)
# ┌───────────────────────────────────┐
# │ txt                               │
# │ ---                               │
# │ str                               │
# ╞═══════════════════════════════════╡
# │ Play it, Sam. Play 'As Time Goes… │
# │ This is the beginning of a beaut… │
# └───────────────────────────────────┘
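The two outputs above cut the same strings at different lengths because Series and DataFrame formatting use separate defaults. To illustrate the single-value suggestion, here is a hypothetical Python sketch (the real code is Rust, in the fmt.rs file linked above) in which both formatters read their limit from one helper, so they cannot drift apart. `DEFAULT_FMT_STR_LEN`, `fmt_str_len`, `series_str_limit` and `dataframe_str_limit` are made-up names, and 30 is only the value suggested earlier in this thread; checking `POLARS_FMT_STR_LEN` first mirrors the override mentioned above.

```python
import os

# Hypothetical single default shared by Series and DataFrame formatting.
# 30 is only the value suggested earlier in this thread, not a Polars default.
DEFAULT_FMT_STR_LEN = 30


def fmt_str_len() -> int:
    """Return the string display limit: POLARS_FMT_STR_LEN if set, else the shared default."""
    raw = os.environ.get("POLARS_FMT_STR_LEN")
    return int(raw) if raw is not None else DEFAULT_FMT_STR_LEN


def series_str_limit() -> int:
    # Hypothetical Series formatter hook: no separate per-type default.
    return fmt_str_len()


def dataframe_str_limit() -> int:
    # Hypothetical DataFrame formatter hook: same source of truth as Series.
    return fmt_str_len()


if __name__ == "__main__":
    os.environ.pop("POLARS_FMT_STR_LEN", None)
    print(series_str_limit(), dataframe_str_limit())  # 30 30
    os.environ["POLARS_FMT_STR_LEN"] = "50"
    print(series_str_limit(), dataframe_str_limit())  # 50 50
```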

The trailing `…` also appears to have an off-by-one bug:

import polars as pl

with pl.Config(fmt_str_lengths=4):
    print(pl.Series(["AB"]))
    # shape: (1,)
    # Series: '' [str]
    # [
    #     "AB"
    # ]

    print(pl.Series(["ABC"]))
    # shape: (1,)
    # Series: '' [str]
    # [
    #     "ABC…
    # ]
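A minimal pure-Python sketch of the rule these examples seem to expect: a value gets the trailing `…` only when it is strictly longer than the limit, so with a limit of 4, `"ABC"` is shown in full. `truncate` and `fmt_series_value` are hypothetical helpers, not Polars functions (the real formatting lives in Rust, in fmt.rs); whether the surrounding quotes should count toward the limit is a design choice, and this sketch assumes they do not.

```python
ELLIPSIS = "…"


def truncate(s: str, max_len: int) -> str:
    """Cut `s` to `max_len` characters, adding an ellipsis only if something was removed."""
    if len(s) <= max_len:
        return s  # fits within the limit: shown unchanged, no ellipsis
    return s[:max_len] + ELLIPSIS


def fmt_series_value(s: str, max_len: int) -> str:
    """Quote a Series value; assumption: the quotes do not count toward `max_len`."""
    if len(s) <= max_len:
        return f'"{s}"'  # complete value: opening and closing quote
    # Truncated value: opening quote only, matching the existing Series repr style.
    return f'"{truncate(s, max_len)}'


print(fmt_series_value("AB", 4))     # "AB"
print(fmt_series_value("ABC", 4))    # "ABC"
print(fmt_series_value("ABCDE", 4))  # "ABCD…
```

Under this rule, the 14-character `"123456789_1234"` from the original report would also be shown without an ellipsis at a limit of 14.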

cmdlineluser, Apr 26 '24 15:04

I'll reopen this, as I think there's a point here. I've also run into this annoyance, and the same goes for the number of list items displayed. That is also tunable, but a bit inconvenient.

At the very least, we can address the inconsistencies between the HTML formatting and the repr, as well as the off-by-one issue.

stinodego, Apr 26 '24 16:04