
[BUG-REPORT] Large Groupby Agg runs out of memory

Open meta-ks opened this issue 8 months ago • 0 comments

Description

First, thank you for this wonderful library. It handles many pandas operations well under memory constraints (except perhaps cumsum(), which I am eagerly waiting for). I have an Arrow file of ~8 GB, which I load into a vaex DataFrame of shape (27_416_244, 32). Available system RAM is ~8 GB. I run a group-by aggregation like this:

# summary_df is a MultiIndex pandas DataFrame with 76k rows and 20 columns
index_names = list(summary_df.index.names)
strfmt = '%Y-%m-%d'
vdf['_Period'] = vdf['Date'].dt.strftime(strfmt)

gd_column_ops_map = {
    'PnL % Capital':'sum', 'PnL':'sum', '% High':'mean',
    '% Close':'mean', '% Low':'mean', 'Charges':'sum', 'Sell Val':'sum', 'Buy Val':'sum',
    'Qty':'sum', 'Cash Flow':'sum'
}
grpby_cols = index_names + ['_Period']

# The kernel crashes on the next line, after the group-by runs, perhaps during agg
grp_trades_vdf = vdf.groupby(grpby_cols, progress=True).agg(gd_column_ops_map)
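
For context, the aggregations in the map above are only sums and means, and both can be computed in row chunks and merged afterwards: a sum merges by addition, and a mean can be recovered from a running sum and a row count. The block below is a minimal, library-agnostic sketch of that idea in plain Python. It is not vaex's API and not how vaex implements groupby. The function names, the `Symbol` key column in the usage note, and the assumption that no values are missing are all hypothetical and for illustration only.

```python
from collections import defaultdict


def _new_state():
    # Per-group state: number of rows seen and running sums per column.
    return {"count": 0, "sums": defaultdict(float)}


def partial_aggregate(rows, key_cols, value_cols):
    """Aggregate one chunk of rows (dicts) into per-group partial state."""
    acc = defaultdict(_new_state)
    for row in rows:
        key = tuple(row[c] for c in key_cols)
        state = acc[key]
        state["count"] += 1
        for col in value_cols:
            state["sums"][col] += row[col]
    return acc


def merge(total, part):
    """Fold one chunk's partial state into the running total (in place)."""
    for key, state in part.items():
        target = total[key]
        target["count"] += state["count"]
        for col, value in state["sums"].items():
            target["sums"][col] += value
    return total


def chunked_groupby(chunks, key_cols, sum_cols, mean_cols):
    """Group-by over an iterable of chunks, holding one chunk at a time.

    Assumes no missing values, so one row count per group serves every
    mean column.
    """
    value_cols = list(dict.fromkeys(list(sum_cols) + list(mean_cols)))
    total = defaultdict(_new_state)
    for chunk in chunks:
        merge(total, partial_aggregate(chunk, key_cols, value_cols))

    result = {}
    for key, state in total.items():
        row = {col: state["sums"][col] for col in sum_cols}
        for col in mean_cols:
            row[col] = state["sums"][col] / state["count"]
        result[key] = row
    return result
```

A call might look like `chunked_groupby(chunks, ["Symbol", "_Period"], ["PnL"], ["% High"])`, where `chunks` yields lists of row dicts. This bounds memory by the chunk size plus the per-group state; it does not help if the number of distinct groups itself is what exhausts memory.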

Software information

  • Vaex version
{'vaex': '4.17.0',
'vaex-core': '4.17.1',
'vaex-viz': '0.5.4',
'vaex-hdf5': '0.14.1',
'vaex-server': '0.9.0',
'vaex-astro': '0.9.3',
'vaex-jupyter': '0.8.2',
'vaex-ml': '0.18.3'}
python: 3.10
  • Vaex was installed via: pip
  • OS: Ubuntu 22

meta-ks • Nov 01 '23 04:11