vaex
[BUG-REPORT] Large Groupby Agg runs out of memory
Description First, thank you for this wonderful library. It handles many pandas operations well under memory constraints (except perhaps cumsum(), which I am eagerly waiting for). I have an Arrow file of ~8 GB, which I load into a vaex DataFrame of shape (27_416_244, 32). Available system RAM: ~8 GB. I run a groupby aggregation like this:
# summary_df is a multi-index pandas DataFrame with 76k rows and 20 columns
index_names = list(summary_df.index.names)
strfmt = '%Y-%m-%d'
vdf['_Period'] = vdf['Date'].dt.strftime(strfmt)
gd_column_ops_map = {
    'PnL % Capital': 'sum', 'PnL': 'sum', '% High': 'mean',
    '% Close': 'mean', '% Low': 'mean', 'Charges': 'sum', 'Sell Val': 'sum', 'Buy Val': 'sum',
    'Qty': 'sum', 'Cash Flow': 'sum'
}
grpby_cols = index_names + ['_Period']
# [Kernel CRASHES on the next line, after the groupby step, perhaps during agg]
grp_trades_vdf = vdf.groupby(grpby_cols, progress=True).agg(gd_column_ops_map)
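For a sense of scale, here is a rough back-of-the-envelope estimate of the raw data size for the shape quoted above. It assumes every column is a fixed-width 8-byte type (float64, int64 or datetime64), which is an assumption and not something measured on the actual file; string columns would change the figure. It only covers the column data itself, not whatever working memory the groupby needs.

```python
# Rough size estimate for a DataFrame of shape (27_416_244, 32).
# ASSUMPTION: every column stores 8 bytes per value (float64/int64/datetime64).
ROWS = 27_416_244
COLS = 32
BYTES_PER_VALUE = 8  # assumed, not measured


def estimated_bytes(rows, cols, bytes_per_value=BYTES_PER_VALUE):
    """Return the raw size in bytes of rows x cols fixed-width values."""
    return rows * cols * bytes_per_value


total = estimated_bytes(ROWS, COLS)
print(f"{total / 1e9:.2f} GB")  # 7.02 GB
```

Under that assumption the column data alone is close to the ~8 GB of available RAM, so any sizeable intermediate result leaves very little headroom.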
Software information
- Vaex version
{'vaex': '4.17.0',
'vaex-core': '4.17.1',
'vaex-viz': '0.5.4',
'vaex-hdf5': '0.14.1',
'vaex-server': '0.9.0',
'vaex-astro': '0.9.3',
'vaex-jupyter': '0.8.2',
'vaex-ml': '0.18.3'}
python: 3.10
- Vaex was installed via: pip
- OS: Ubuntu 22