great-tables
feat: draft allow data_color to take a palette function
This is a very rough draft of what data_color might look like, if the palette argument accepted an arbitrary function, mapping values -> hex colors.
Example:
from great_tables import GT, exibble
small_exibble = exibble[["date", "currency"]]
GT(small_exibble).data_color("currency", palette=lambda vals: ["#FFFFFF" if x > 0 else "#000000" for x in vals])
Note two important features of the current implementation:
- the function takes a list of values, and returns a list of strings
- the function must return hex strings (e.g. #FFFFFF)
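As an illustration of that contract, here is a minimal sketch of a reusable palette function built outside great_tables. The helper name step_palette and its breaks/colors arguments are hypothetical and not part of this PR; the only thing taken from the draft is the shape of the callable (list of values in, list of hex strings out). Missing values are not handled.

```python
from bisect import bisect_right
from typing import Callable, List, Sequence


def step_palette(
    breaks: Sequence[float], colors: Sequence[str]
) -> Callable[[Sequence[float]], List[str]]:
    """Build a palette function that bins values at `breaks` (hypothetical helper)."""
    if len(colors) != len(breaks) + 1:
        raise ValueError("need exactly one more color than there are breaks")

    def palette(vals: Sequence[float]) -> List[str]:
        # One hex string out per value in, in the same order.
        return [colors[bisect_right(breaks, x)] for x in vals]

    return palette


# Values below 0 -> black, 0 up to (not including) 100 -> grey, 100 and above -> white.
low_mid_high = step_palette([0, 100], ["#000000", "#888888", "#FFFFFF"])
print(low_mid_high([-5, 50, 1000]))
```

Under the draft in this PR, such a function would presumably be passed as palette=low_mid_high in the data_color call above.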
I wonder if there's a nice way to work in polars expressions? The challenge is that there's no "this" expression in polars. We could add a surrogate column, like _this_, which would allow people to use pl.when(pl.col("_this_") ...).then(...). However, it doesn't seem ideal.
An alternative might be for users to call GT.tab_style and pass a polars expression to style.fill(), etc.
Edit: I wonder if a nice move could be...
- Whenever palette is a polars expression, then
  - Select the columns and rows specified to data_color (currently, only columns is supported) as a polars DataFrame
  - Run the expression on the DataFrame. The result must be...
    - a DataFrame of the same dimensions.
    - each value is a hex string or null
  - Use the result as the color values
(In a sense, this means that polars.selectors.all() is equivalent to a "this" construct)
Big questions
- Does this play well with existing color palette tools?
matplotlib.cm.coolwarm() and friends return an array of shape N_obs x 4 (RGBA values). We don't necessarily need to support it, but I'm curious what else is out there!
I think we ought to support at least hex colors (in all their variations; I think we have regex functions to check for the different representations) and rgba. I'm seeing a lot more of the latter on GitHub, mostly in palette repos. We could then normalize everything to hex (I don't believe any of this is lossy).
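A rough sketch of what that normalization could look like, using only the standard library. The function name normalize_color is hypothetical, and the sketch assumes RGB(A) channels arrive as floats in [0, 1] (the form matplotlib colormaps return); it does not use the regex helpers mentioned above. One caveat: hex-to-hex normalization is exact, but float channels are rounded to 8 bits, so that direction is only lossless up to rounding.

```python
import re
from typing import Sequence, Union

# Accepts #RGB, #RGBA, #RRGGBB and #RRGGBBAA.
_HEX_RE = re.compile(r"^#(?:[0-9a-fA-F]{3,4}|[0-9a-fA-F]{6}|[0-9a-fA-F]{8})$")


def normalize_color(color: Union[str, Sequence[float]]) -> str:
    """Normalize a hex string or an RGB(A) sequence to #RRGGBB or #RRGGBBAA."""
    if isinstance(color, str):
        if not _HEX_RE.match(color):
            raise ValueError(f"not a hex color: {color!r}")
        digits = color[1:]
        if len(digits) in (3, 4):
            # Expand shorthand: #abc -> #aabbcc, #abcd -> #aabbccdd
            digits = "".join(ch * 2 for ch in digits)
        return "#" + digits.upper()

    if len(color) not in (3, 4):
        raise ValueError("expected 3 (RGB) or 4 (RGBA) channels")
    channels = []
    for c in color:
        # Assumption: each channel is a float in [0, 1].
        if not 0.0 <= c <= 1.0:
            raise ValueError(f"channel out of range: {c!r}")
        channels.append(round(c * 255))
    return "#" + "".join(f"{c:02X}" for c in channels)


print(normalize_color("#abc"))
print(normalize_color((1.0, 0.0, 0.0, 1.0)))
```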
Also, I love the surrogate _this_ column idea.
OTOH, thinking about the requirement for hex colors in the return of the callable, it would be interesting to perform the validation by default but also offer an option to turn it off. You can do some pretty sophisticated things with color in HTML, like define gradients and even use animation, and these would definitely fail the proposed validation check. Just more food for thought.
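To make the opt-out idea concrete, here is a small sketch. The wrapper apply_palette and its validate flag are hypothetical names for illustration, not something in the draft; whether a non-hex value such as a CSS gradient would actually render depends on how the result is written into the cell style, which this sketch does not cover.

```python
import re
from typing import Callable, Iterable, List

_HEX_RE = re.compile(r"^#(?:[0-9a-fA-F]{6}|[0-9a-fA-F]{8})$")


def apply_palette(
    palette: Callable[[list], Iterable[str]],
    values: Iterable,
    validate: bool = True,
) -> List[str]:
    """Call a palette function and optionally check that it returned hex strings."""
    values = list(values)
    colors = list(palette(values))
    # The length check always runs: every value needs exactly one color.
    if len(colors) != len(values):
        raise ValueError(f"expected {len(values)} colors, got {len(colors)}")
    if validate:
        bad = [c for c in colors if not (isinstance(c, str) and _HEX_RE.match(c))]
        if bad:
            raise ValueError(f"palette returned non-hex values: {bad[:3]}")
    return colors


def gradient(vals: list) -> List[str]:
    # A CSS gradient is not a hex color, so it only passes with validate=False.
    return ["linear-gradient(90deg, #FFFFFF, #000000)" for _ in vals]


print(apply_palette(gradient, [1, 2], validate=False))
```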
Re: what else is out there
colorcet (also seems to be available via matplotlib; claims to provide better colormaps for continuous data): https://colorcet.holoviz.org/user_guide/index.html
I think it would be really great to be able to do things like this:
https://matplotlib.org/stable/users/explain/colors/colormapnorms.html
One could do that by passing a callable that handles both normalization (mapping values to [0,1]) and the color map (numeric -> hex). Maybe we want to allow that, but in general it is nice if these things are separate, so you can outsource one or the other to pre-existing collections of color maps and normalization tools!
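The separation can be sketched without any plotting library. Everything below (linear_norm, log_norm, two_color_map, make_palette) is a hypothetical stand-in for what matplotlib or mizani would provide; the point is only that the normalization and the color map are independent pieces composed into a single values -> hex callable. Out-of-range values are clipped, and missing values are not handled.

```python
import math
from typing import Callable, List, Sequence

Norm = Callable[[float], float]    # data value -> [0, 1]
ColorMap = Callable[[float], str]  # [0, 1] -> hex string


def linear_norm(vmin: float, vmax: float) -> Norm:
    def norm(x: float) -> float:
        return min(1.0, max(0.0, (x - vmin) / (vmax - vmin)))
    return norm


def log_norm(vmin: float, vmax: float) -> Norm:
    lo, hi = math.log(vmin), math.log(vmax)
    def norm(x: float) -> float:
        return min(1.0, max(0.0, (math.log(x) - lo) / (hi - lo)))
    return norm


def two_color_map(start: str, end: str) -> ColorMap:
    # Assumes both endpoints are given as #RRGGBB.
    a = [int(start[i:i + 2], 16) for i in (1, 3, 5)]
    b = [int(end[i:i + 2], 16) for i in (1, 3, 5)]
    def cmap(t: float) -> str:
        rgb = [round(ca + (cb - ca) * t) for ca, cb in zip(a, b)]
        return "#" + "".join(f"{c:02X}" for c in rgb)
    return cmap


def make_palette(norm: Norm, cmap: ColorMap) -> Callable[[Sequence[float]], List[str]]:
    """Compose a normalizer and a color map into a values -> hex palette function."""
    def palette(vals: Sequence[float]) -> List[str]:
        return [cmap(norm(x)) for x in vals]
    return palette


black_to_white = two_color_map("#000000", "#FFFFFF")
linear = make_palette(linear_norm(1, 1000), black_to_white)
logarithmic = make_palette(log_norm(1, 1000), black_to_white)
print(linear([1, 10, 1000]))
print(logarithmic([1, 10, 1000]))
```

Swapping the normalizer changes how the middle of the range is colored while the color map stays untouched, which is the main argument for keeping the two pieces separate.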
If you are interested in going that route, it seems like the current domain argument overlaps with what matplotlib normalization is trying to do.
Edit: I guess this is similar to transforms in mizani, which I see you are using already.
https://mizani.readthedocs.io/en/stable/transforms.html
This is how I would use matplotlib normalizers + a cmap to get hex values:
from typing import Callable

import matplotlib.colors as mcolors
import pandas as pd


def _color_series(
    values: pd.Series,
    colormap: mcolors.Colormap,
    norm: mcolors.Normalize | Callable | None = None,
    na_color: str | None = None,
) -> pd.Series:
    """
    Color a Series of numeric values using a Matplotlib colormap and normalization.

    Parameters:
        values : pd.Series
            The Series to color.
        colormap : mcolors.Colormap
            The Matplotlib colormap to use.
        norm : mcolors.Normalize | Callable | None
            A matplotlib normalization object (e.g., Normalize, LogNorm) or a function that
            returns normalized values within the range [0, 1]. If None, defaults to linear
            normalization between the minimum and maximum values.
        na_color : str, optional
            The color to use for NaN values.

    Returns:
        pd.Series
            A new Series with hex color strings.
    """
    # Normalize the data
    if norm is None:  # Defaults to linear normalization between min and max
        norm = mcolors.Normalize(values.min(), values.max())
    normalized = norm(values.values)
    colors = colormap(normalized)
    # Convert RGBA colors to hex. A colormap maps NaN inputs to its "bad" color rather
    # than to NaN, so missing values are detected on the input Series instead.
    hex_colors = [
        na_color if is_na else mcolors.to_hex(color)
        for color, is_na in zip(colors, values.isna())
    ]
    return pd.Series(hex_colors, index=values.index)
I have also attached some example use cases to show off where they can be useful.
EDIT: You can add alpha like this, by setting the last (index -1) column of the RGBA array and passing keep_alpha=True to to_hex.
normalized = norm(values.values)
colors = colormap(normalized)
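colors[:, -1] = alpha  # RGBA format, so the last column (index -1) is alpha
# Convert RGBA colors to hex; missing values are detected on the input Series,
# because the colormap returns its "bad" color (not NaN) for NaN inputs
hex_colors = [
    na_color if is_na else mcolors.to_hex(color, keep_alpha=True)
    for color, is_na in zip(colors, values.isna())
]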