Pivot table

Open risenW opened this issue 4 years ago • 7 comments

Pivot table support for DataFrame

risenW avatar Aug 13 '20 12:08 risenW

Any updates on this? Seems like it would be a compelling feature to make this the default pandas go-to in the javascript ecosystem.

nite avatar Jul 16 '21 08:07 nite

We will work on this; it's just not on the roadmap atm, unless someone decides to pick it up. Would you be interested?

risenW avatar Jul 16 '21 14:07 risenW

Point me to the relevant code & I'll take a look!

tbh this is currently a bit of a blocker to me even doing a spike with danfo - but if we do it, then I'd like to know how hard it'd be to implement a pivot.

Here's a gist I wrote to do a pivot on an array of melted objects with lodash: doubt it's this easy! https://gist.github.com/nite/6ffda3d61278dccfb2152f8565492009

nite avatar Jul 18 '21 22:07 nite

@nite I don't think it will be that hard to implement. To implement pivot_table fully, I think we first need a way to access and display a multi-index table.

However, we can get started without that. With my limited knowledge of pivot tables, I think we can build the main functionality of pivot_table first and leave out the more complicated features included in pandas.

The main functionality of pivot_table from the pandas API, pivot_table(data, values=None, index=None, columns=None, aggfunc='mean'), can be implemented as follows:

  • If index is given, it will be a list of column names. We need to group the DataFrame by each of the columns in index. Hence we can have an object containing each column and its groupby DataFrame, e.g. {col1: df.groupby(['col1']), col2: df.groupby(['col2'])}
  • If values is not given, then df.groupby([col]) for each column in index simply groups the whole DataFrame by col. If values is given, then we group only the DataFrame columns in values by col from index, e.g. {col1: df.groupby(['col1']).col(values), . . . .}
  • If columns is given, that means we want to group the DataFrame by more than one column, e.g. {col1: df.groupby(['col1', ...columns]), . . . .}. But I think instead of doing this all at once like groupby(['col1', ...columns]), we will need to loop through the columns like this:
for (const column of columns) {
  pivotTableGraph['col1'][column] = df.groupby(['col1', column])
}
  • If aggfunc is given as a single function name rather than a mapping, then the operation will look like groupby(['col']).mean(), assuming aggfunc is mean. But if aggfunc is given as a mapping like {col1: 'mean', col2: 'sum'}, then we will use groupby(['col']).agg(aggfunc)

At the end of this operation, we will have one large nested object containing all the results; this object can be thought of as a graph.
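To make the steps above concrete, here is a minimal sketch in plain JavaScript that works on an array of row objects rather than on a danfo.js DataFrame. Everything in it is an assumption for illustration: pivotTable, AGGREGATORS, and the sample sales data are hypothetical names, and it handles only a single index column, a single columns column, and one string aggfunc. It groups the values by (index, column) pairs and then aggregates each group, producing the kind of nested result object described above.

```javascript
// Hypothetical helper, not part of danfo.js: pivots an array of row objects.
const AGGREGATORS = {
  sum: (xs) => xs.reduce((a, b) => a + b, 0),
  mean: (xs) => xs.reduce((a, b) => a + b, 0) / xs.length,
  count: (xs) => xs.length,
};

function pivotTable(rows, { index, columns, values, aggfunc = "mean" }) {
  const agg = AGGREGATORS[aggfunc];
  if (!agg) throw new Error(`Unsupported aggfunc: ${aggfunc}`);

  // Step 1: collect the raw values for every (index key, column key) pair.
  const groups = {};
  for (const row of rows) {
    const r = row[index];
    const c = row[columns];
    groups[r] = groups[r] || {};
    groups[r][c] = groups[r][c] || [];
    groups[r][c].push(row[values]);
  }

  // Step 2: reduce every group to a single aggregated number.
  const result = {};
  for (const [r, byColumn] of Object.entries(groups)) {
    result[r] = {};
    for (const [c, xs] of Object.entries(byColumn)) {
      result[r][c] = agg(xs);
    }
  }
  return result;
}

// Made-up sample data, in "melted" (long) form.
const sales = [
  { region: "north", product: "apple", amount: 10 },
  { region: "north", product: "apple", amount: 30 },
  { region: "north", product: "pear", amount: 5 },
  { region: "south", product: "apple", amount: 7 },
];

const pivoted = pivotTable(sales, {
  index: "region",
  columns: "product",
  values: "amount",
  aggfunc: "mean",
});
console.log(pivoted);
```

Combinations that never occur in the data (south/pear here) are simply absent from the result; a real implementation would need to decide on a fill value for them.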

To have a concrete view of the above implementation steps, you can check out pivot_table examples here: https://www.analyticsvidhya.com/blog/2020/03/pivot-table-pandas-python/ and compare them with the above implementation details.

@nite I think this is all we need to implement the main functionality of pivot_table

Cc: @risenW

steveoni avatar Jul 24 '21 17:07 steveoni

Also to add @nite You would implement this in the DataFrame class here. Your output is going to be a DataFrame, so something of this signature:

  /**
   * Some doc here
   * @return DataFrame
   */
  pivot() {
    const data = this.values; // get the inner array representing the DataFrame
    // your pivot code to manipulate the data
    // ...
    // ...
    // return a new DataFrame with the pivoted values
    // (pivoted_data and indx are placeholders for what your pivot code produces)
    const df = new DataFrame(pivoted_data, { columns: this.column_names, index: indx });
    return df;
  }
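As a rough sketch of the last step in that signature, the snippet below turns a nested pivot result of the form { rowKey: { colKey: value } } into the three pieces the constructor call above expects: a 2D data array, a columns list, and an index list. The helper name toTableParts and the NaN fill value are assumptions for illustration, not danfo.js API, and the input object is made-up sample data.

```javascript
// Hypothetical helper: flattens { rowKey: { colKey: value } } into table parts.
function toTableParts(nested, fillValue = NaN) {
  const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

  // Row labels come from the outer keys.
  const index = Object.keys(nested);

  // Column labels are the union of all inner keys, in first-seen order.
  const columns = [...new Set(index.flatMap((r) => Object.keys(nested[r])))];

  // Build one array per row, filling missing cells with fillValue.
  const data = index.map((r) =>
    columns.map((c) => (has(nested[r], c) ? nested[r][c] : fillValue))
  );

  return { data, columns, index };
}

const parts = toTableParts({
  north: { apple: 20, pear: 5 },
  south: { apple: 7 },
});
console.log(parts.index);
console.log(parts.columns);
console.log(parts.data);
```

The resulting data, columns, and index could then be handed to a call shaped like new DataFrame(data, { columns, index }), following the signature above.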

risenW avatar Jul 25 '21 08:07 risenW

Hi everyone, are there any updates on adding pivot functionality to a dataframe? I looked at the documentation and could not find anything on this matter. It would be fantastic to have pivoting in danfo.js.

rorcde avatar Jan 27 '23 12:01 rorcde