[FEATURE] Pivot dataframe
Is your feature request related to a problem? Please describe. Pivoting is a common task in data processing. It is used when we want to calculate statistics of specific attributes.
From Pandas' documentation:

Describe the solution you'd like I've been looking to libraries capable of pivoting dataframes. Pandas, in my view, has a very complete implementation of the method.
Describe alternatives you've considered
This is a need I had at work. Since I could only find one library – AlaSQL – and with only a simple implementation, that does not pivot over multiple columns or allows pivoting without explicit index, I've written my own version of the function. To derive the algorithm, I replicated the behavior seen in the pivot_table function of Pandas.
Additional context I have already written the function, and it works for a few test cases I listed:
✓ with one value, one index, and one column (130ms)
✓ without index, aggregates values only by columns (61ms)
✓ without columns, aggregates values only by columns (69ms)
✓ with multiple values, one index, and one column (67ms)
✓ with one value, one index, and multiple columns (97ms)
✓ with one value, multiple index, and one column (84ms)
✓ with one value, multiple index, and multiple columns (177ms)
My function is not integrated into dataframe-js, but it works on dataframes generated with dataframe-js. Would you be interested in bringing this feature in? If so, I may need some help on reviewing it to ensure it follows the conventions of the project – of source code and unit tests. I could open a WIP pull request and we discuss the necessary changes on the go.
function pivotTable(dataframe, values = [], index = [], columns = []) {
if (values.length > 1) {
throw 'Not implemented';
}
const value = values[0];
const hasSelectedIndex = index && index.length;
const hasSelectedColumns = columns && columns.length;
let agg = new Map();
const aggKeys = index.concat(columns).filter((x) => x);
const groupedDF = dataframe['groupBy'](...aggKeys);
let rowKey = value;
let colKey = value;
groupedDF.aggregate(group => {
const row = group.getRow(0);
if (hasSelectedIndex) {
rowKey = row.select(...index).toArray().join('_');
}
const baseAttrs = Object.fromEntries(index.map((i) => [i, row.get(i)]));
agg.set(rowKey, agg.get(rowKey) || baseAttrs);
if (hasSelectedColumns) {
colKey = row.select(...columns).toArray().join('_');
}
agg.get(rowKey)[colKey] = group.stat.sum(value);
})
return new DataFrame(Array.from(agg.values()));
}
Hi @Irio, Thank you for your sugesstion. Sorry for the delay, I wasn't active these days.
Have you look the GroupedDataFrame .pivot method (https://github.com/Gmousse/dataframe-js/blob/develop/src/group.js#L156) ?
It should works similarly.
Hi @Gmousse,
Yes, I have look at them, but they don't seem to cover all the cases I needed and Pandas supports. There's always the chance they are too specific, so in this case, let me know and we can close this issue.
My major pain-point was doing operations with multiple attributes, either in index, columns, or the attribute to aggregate on.
Here are all the cases I implemented using dataframe-js and are supported by Pandas:
https://colab.research.google.com/drive/1tBXUtOzTZiTli2JR1VYFODYf7wxO6PNF
And here, my attempt to reproduce them using [email protected]. The only case that can be fully reproduced, at least with my understanding of the docs, is the first.
https://gist.github.com/Irio/c23af8d00768d48acecb87a41d62905e