ArcticDB
Allow QueryBuilder Aggregates to be Applied to Whole Columns
A simple example is getting the maximum value of a column without reading all of the data.
Missing data (NaNs) should be ignored.
The current workaround is to create a synthetic column with a fixed value, group by that column, and apply the aggregator. Because every row has the same key, the whole column falls into a single group.
This works well, but the syntax does not make the intent clear.
An example of the workaround:
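import numpy as np
import pandas as pd
import arcticdb as adb

# `lib` is assumed to be an existing ArcticDB library handle
np.random.seed(13)
qb_whole_col_df = pd.DataFrame(data={'val': np.random.uniform(0., 100., 25)})
qb_whole_col_sym = 'qb_whole_col_sym'
lib.write(qb_whole_col_sym, qb_whole_col_df)
# Add a constant column 'zero', group by it so all rows share one group, then take the max
q_wc = adb.QueryBuilder()
q_wc = q_wc.apply('zero', q_wc['val']*0).groupby('zero').agg({'val': 'max'})
lib.read(qb_whole_col_sym, query_builder=q_wc).data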
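To make the idea behind the workaround concrete, here is a minimal pure-Python sketch of grouping by a constant key and taking a NaN-ignoring maximum. It is an illustration only and says nothing about how ArcticDB implements aggregation; the helper name `grouped_max` and the sample values are hypothetical, and the key is set to a literal 0 rather than computed as `val*0`.

```python
import math
from collections import defaultdict


def grouped_max(rows, key_col, val_col):
    """Return {group key: max of val_col}, skipping NaN values."""
    groups = defaultdict(list)
    for row in rows:
        value = row[val_col]
        if not math.isnan(value):  # missing data is ignored
            groups[row[key_col]].append(value)
    return {key: max(values) for key, values in groups.items()}


# Hypothetical sample column containing one missing value
rows = [{"val": v} for v in (12.5, float("nan"), 87.0, 3.25)]

# Synthetic constant column: every row gets the same key
rows_with_key = [{**row, "zero": 0} for row in rows]

# One group only, so the per-group max is the whole-column max
result = grouped_max(rows_with_key, "zero", "val")
print(result)
```

Since there is exactly one group, the grouped aggregate and the whole-column aggregate coincide, which is why the constant-column trick works for any aggregator.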
In the future, we will make this possible with cleaner syntax.