Allow QueryBuilder Aggregates to be Applied to Whole Columns

Open DrNickClarke opened this issue 5 months ago • 0 comments

A simple example would be to get the max value in a column without reading all the data.

Missing data (NaNs) should be ignored.

The current workaround is to create a synthetic column with a fixed value, group by that column, and then apply the aggregator.

This works well but the syntax is not clear enough.

An example of the workaround:

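import numpy as np
import pandas as pd
import arcticdb as adb

# `lib` is assumed to be an existing ArcticDB library handle
np.random.seed(13)
qb_whole_col_df = pd.DataFrame(data={'val': np.random.uniform(0., 100., 25)})
qb_whole_col_sym = 'qb_whole_col_sym'
lib.write(qb_whole_col_sym, qb_whole_col_df)
# Add a constant column 'zero', group by it, then take the max of 'val'
q_wc = adb.QueryBuilder()
q_wc = q_wc.apply('zero', q_wc['val']*0).groupby('zero').agg({'val': 'max'})
lib.read(qb_whole_col_sym, query_builder=q_wc).data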
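
For comparison, here is a pandas-only sketch of the same idea, which may help when checking what the workaround should return. It does not use ArcticDB at all; the helper name `whole_column_max_via_groupby` and the sample values are made up for illustration. It assumes the desired result is the same as pandas' `Series.max()`, which skips NaNs by default.

```python
import numpy as np
import pandas as pd


def whole_column_max_via_groupby(df: pd.DataFrame, col: str) -> float:
    """Hypothetical helper: max of one column via a constant group key."""
    # Add a synthetic column with a fixed value so every row is in one group.
    keyed = df.assign(zero=0)
    # Group by the constant key and aggregate; pandas' max skips NaNs.
    grouped = keyed.groupby('zero').agg({col: 'max'})
    # The result has a single row (the single group); pull out its value.
    return float(grouped[col].iloc[0])


# Made-up sample data, including a missing value.
sample_df = pd.DataFrame({'val': [3.0, np.nan, 7.5, 1.0]})
workaround_max = whole_column_max_via_groupby(sample_df, 'val')
direct_max = float(sample_df['val'].max())
```

Both `workaround_max` and `direct_max` should be the same number here, with the NaN ignored, which is the behaviour requested for a whole-column aggregate.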

In future we will make this possible with cleaner syntax.

DrNickClarke, Sep 03 '24 16:09