
Add ability to specify static column names to DataFrame.pivot

Open ilya-kozyrev opened this issue 2 years ago

Problem description

Hi, thanks for the awesome library!

I am trying to port my Spark codebase to Polars in one of my projects. In Spark, I can call pivot with a static list of values that become the output columns. For example:

from pyspark.sql import functions as fns

# pivot_col: column to pivot; values: static list of output column names (placeholders).
(df.groupBy("col_key_1", "col_key_2")
    .pivot(pivot_col="pivot_col_name", values=["value_1", "value_2", "value_3"])
    .agg(fns.first("value_col_name")))

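Passing the values lets Spark know, while it builds the logical plan, which output columns the pivot on pivot_col will produce. Without them, Spark first has to run an extra job over the data to compute the distinct values of pivot_col. Providing values as a static list of strings therefore supports lazy planning without scanning the data.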
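To illustrate why a static list helps, here is a minimal pure-Python sketch (not Spark or Polars code; `pivot_schema` and `pivot_rows` are hypothetical helpers) of a pivot whose output columns come only from the caller's list. The schema can be computed before any row is read, and in this sketch pivot keys that are not in the list are simply ignored.

```python
from typing import Any, Iterable, Mapping


def pivot_schema(index: list[str], values: list[str]) -> list[str]:
    # The output columns come only from the static inputs, never from the data.
    return [*index, *values]


def pivot_rows(
    rows: Iterable[Mapping[str, Any]],
    index: list[str],
    pivot_col: str,
    value_col: str,
    values: list[str],
) -> list[dict[str, Any]]:
    wanted = set(values)
    groups: dict[tuple, dict[str, Any]] = {}
    seen: set[tuple] = set()  # (group key, pivot value) cells already filled
    for row in rows:
        key = tuple(row[c] for c in index)
        # Every group starts with all static columns present, set to None.
        out = groups.setdefault(key, {**dict(zip(index, key)), **dict.fromkeys(values)})
        name = row[pivot_col]
        # Pivot keys outside the static list are ignored; the first value per cell wins.
        if name in wanted and (key, name) not in seen:
            out[name] = row[value_col]
            seen.add((key, name))
    return list(groups.values())


rows = [
    {"k1": 1, "key": "x", "value": 10},
    {"k1": 1, "key": "y", "value": 20},
    {"k1": 2, "key": "x", "value": 30},
    {"k1": 2, "key": "w", "value": 99},
]
print(pivot_schema(["k1"], ["x", "y", "z"]))  # ['k1', 'x', 'y', 'z']
print(pivot_rows(rows, ["k1"], "key", "value", ["x", "y", "z"]))
# [{'k1': 1, 'x': 10, 'y': 20, 'z': None}, {'k1': 2, 'x': 30, 'y': None, 'z': None}]
```

The point of the split is that `pivot_schema` needs no data at all, which is exactly what a lazy query planner needs to know up front.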

In Polars, the values that become the new column names and the values that fill those columns have to be in the same DataFrame. In my case this is inefficient, because I have to join two tables before calling pivot. It would be great if the pivot method could accept a list of strings with the output column names. This would achieve two goals:

  1. Users with use cases similar to mine (data stored in a key-value table, with the names of the keys in a different table) could avoid unnecessary joins.
  2. Polars could support pivot on LazyFrame, because the names of the new columns would be known before the plan is executed.

Thank you!

ilya-kozyrev, Jan 19 '23 15:01