great-tables
epic: Handle nested data in polars columns
Polars supports nested data, such as lists and structs, within columns.
Here's an example from the polars guide:
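```python
import polars as pl

url = "https://theunitedstates.io/congress-legislators/legislators-historical.csv"

# Read low-cardinality string columns as categoricals.
dtypes = {
    "first_name": pl.Categorical,
    "gender": pl.Categorical,
    "type": pl.Categorical,
    "state": pl.Categorical,
    "party": pl.Categorical,
}

# `schema_overrides` replaces the older `dtypes=` keyword in recent polars.
dataset = pl.read_csv(url, schema_overrides=dtypes).with_columns(
    pl.col("birthday").str.to_date(strict=False)
)

q = (
    dataset.lazy()
    .group_by("first_name")
    .agg(
        pl.len().alias("count"),  # rows per first name (replaces deprecated pl.count())
        pl.col("gender"),  # a bare column in .agg() is collected into a list[cat]
        pl.first("last_name"),
    )
    .sort("count", descending=True)
    .limit(5)
)

df = q.collect()
print(df)
```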
```
┌────────────┬───────┬───────────────────┬───────────┐
│ first_name ┆ count ┆ gender            ┆ last_name │
│ ---        ┆ ---   ┆ ---               ┆ ---       │
│ cat        ┆ u32   ┆ list[cat]         ┆ str       │
╞════════════╪═══════╪═══════════════════╪═══════════╡
│ John       ┆ 1256  ┆ ["M", "M", … "M"] ┆ Walker    │
│ William    ┆ 1022  ┆ ["M", "M", … "M"] ┆ Few       │
│ James      ┆ 714   ┆ ["M", "M", … "M"] ┆ Armstrong │
│ Thomas     ┆ 454   ┆ ["M", "M", … "M"] ┆ Tucker    │
│ Charles    ┆ 439   ┆ ["M", "M", … "M"] ┆ Carroll   │
└────────────┴───────┴───────────────────┴───────────┘
```
Note that each entry in the `gender` column is a list of categorical values (`list[cat]`). However, I don't think Great Tables is currently set up to handle this case.
Current Behavior
```python
from great_tables import GT

GT(df).render("html")
```

```
ComputeError: cannot cast List type (inner: 'Categorical(Some(global))', to: 'Utf8')
```
Note that List columns can have a schema like `List[List[int]]`, so we need an approach that can handle lists of lists (of lists), and so on. Structs appear to be straightforward to coerce, although a struct can itself contain a list.
Is there a good polars approach for casting List[List[int]] -> String?