great-tables icon indicating copy to clipboard operation
great-tables copied to clipboard

epic: Handle nested data in polars columns

Open machow opened this issue 7 months ago • 1 comments

Polars supports nested data---such as lists and structs---in columns of data.

Here's an example from the polars guide

import polars as pl

url = "https://theunitedstates.io/congress-legislators/legislators-historical.csv"

dtypes = {
    "first_name": pl.Categorical,
    "gender": pl.Categorical,
    "type": pl.Categorical,
    "state": pl.Categorical,
    "party": pl.Categorical,
}

dataset = pl.read_csv(url, dtypes=dtypes).with_columns(
    pl.col("birthday").str.to_date(strict=False)
)

q = (
    dataset.lazy()
    .group_by("first_name")
    .agg(
        pl.count(),
        pl.col("gender"),
        pl.first("last_name"),
    )
    .sort("count", descending=True)
    .limit(5)
)

df = q.collect()
print(df)
┌────────────┬───────┬───────────────────┬───────────┐
│ first_name ┆ count ┆ gender            ┆ last_name │
│ ---        ┆ ---   ┆ ---               ┆ ---       │
│ cat        ┆ u32   ┆ list[cat]         ┆ str       │
╞════════════╪═══════╪═══════════════════╪═══════════╡
│ John       ┆ 1256  ┆ ["M", "M", … "M"] ┆ Walker    │
│ William    ┆ 1022  ┆ ["M", "M", … "M"] ┆ Few       │
│ James      ┆ 714   ┆ ["M", "M", … "M"] ┆ Armstrong │
│ Thomas     ┆ 454   ┆ ["M", "M", … "M"] ┆ Tucker    │
│ Charles    ┆ 439   ┆ ["M", "M", … "M"] ┆ Carroll   │
└────────────┴───────┴───────────────────┴───────────┘

Note that each entry in the gender column is a list of strings. However, I don't think Great Tables is set up to handle this situation.

Current Behavior

from great_tables import GT

GT(df).render("html")
ComputeError: cannot cast List type (inner: 'Categorical(Some(global))', to: 'Utf8')

machow avatar Jan 16 '24 17:01 machow