Add ability to specify static column names to DataFrame.pivot
Problem description
Hi, thanks for the awesome lib!
I am trying to port my Spark codebase to Polars in one of my projects. In Spark, I can call the pivot function with a static list of values that become the output columns. For example:
from pyspark.sql import functions as fns

(
    df.groupby("col_key_1", "col_key_2")
    .pivot(
        pivot_col="Name of the column to pivot.",
        values=["List of values that will be translated to columns in the output DataFrame."],
    )
    .agg(fns.first("value_col_name"))
)
This lets Spark include pivot_col in the logical plan. Because the user can provide values as a static list of strings, Spark knows the output columns up front and can plan the pivot lazily without inspecting the data.
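To make the idea concrete, here is a minimal plain-Python sketch of a pivot driven by a static value list. It uses neither the Spark nor the Polars API; the function names `static_pivot_schema` and `static_pivot_first` and the sample column names are hypothetical and exist only for illustration. The point is that the output column names depend only on the arguments, not on the data.

```python
def static_pivot_schema(group_keys, values):
    """Output column names, derived from the arguments alone (no data needed)."""
    return list(group_keys) + list(values)


def static_pivot_first(rows, group_keys, pivot_col, value_col, values):
    """Pivot `rows` (a list of dicts), keeping the first value seen per cell.

    Pivot values that are not in `values` are ignored; cells with no
    matching row stay None.
    """
    wanted = set(values)
    seen = set()  # (group key, pivot value) pairs that are already filled
    out = {}      # group key tuple -> output row, in first-seen order
    for row in rows:
        key = tuple(row[k] for k in group_keys)
        if key not in out:
            # Every output row starts with all static columns set to None.
            out[key] = {**{k: row[k] for k in group_keys},
                        **{v: None for v in values}}
        name = row[pivot_col]
        if name in wanted and (key, name) not in seen:
            out[key][name] = row[value_col]
            seen.add((key, name))
    return list(out.values())


rows = [
    {"id": 1, "name": "a", "value": 10},
    {"id": 1, "name": "b", "value": 20},
    {"id": 1, "name": "a", "value": 99},  # ignored: "a" is already filled for id=1
    {"id": 2, "name": "c", "value": 30},  # ignored: "c" is not in the static list
    {"id": 2, "name": "b", "value": 40},
]
schema = static_pivot_schema(["id"], ["a", "b"])
pivoted = static_pivot_first(rows, ["id"], "name", "value", ["a", "b"])
```

Here `schema` can be computed before any row is read, which is the property a lazy engine needs to build its plan.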
In Polars, I need to have the names of the new columns and the values for these columns in the same DataFrame. In my case this is not very efficient, because I have to join the two tables before using pivot. It would be great if Polars could accept a list of strings with column names in the pivot method. Two goals could be achieved this way:
- Users with use-cases similar to mine (when data is stored in a key-value table and the names of these keys are in a different table) would be able to avoid unnecessary joins.
- Polars would be able to support pivot in LazyFrame because it would know the names of the new columns before plan execution.
Thank you!