Unify `gather` and `get` APIs

Open jqmp opened this issue 6 years ago • 0 comments

Entity values in Bionic can be accessed in two ways: from another entity definition, or directly from a flow with `Flow.get`. Currently these two APIs offer different ways to transform and aggregate the values.

When accessing entity values from the definition of another entity, there's a powerful `@gather` decorator that joins many values of many entities into a single dataframe. When using `Flow.get`, one can only aggregate the values of a single entity. On the other hand, `get` provides other options for accessing serialized values, and those aren't available from within entity definitions.
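To make the gap concrete, here is a toy model in plain Python. It is not Bionic's implementation or API: the `cases` data, the helper names `get_values` and `gather_rows`, and the numbers are all made up for illustration. It only contrasts aggregating one entity's values with joining several entities into rows.

```python
# Toy model only: plain Python, not Bionic's implementation or API.
# Each "case" stands for one combination of entity values in a flow.
# The entity names come from the examples in this issue; the values are invented.
cases = [
    {"hyperparams": "a", "model_type": "linear", "performance": 0.71},
    {"hyperparams": "b", "model_type": "linear", "performance": 0.78},
    {"hyperparams": "a", "model_type": "tree", "performance": 0.83},
]


def get_values(cases, name):
    """Aggregate every value of a single entity (roughly the Flow.get side)."""
    return [case[name] for case in cases]


def gather_rows(cases, over, also):
    """Join several entities into one row per case (roughly the @gather side)."""
    columns = list(over) + [also]
    return [{col: case[col] for col in columns} for case in cases]


# One entity at a time: the performance values lose their context.
single = get_values(cases, "performance")

# Several entities joined: each performance stays next to its inputs.
table = gather_rows(cases, over=["hyperparams", "model_type"], also="performance")
best = max(table, key=lambda row: row["performance"])
```

With only the single-entity view, there's no way to tell which `hyperparams` and `model_type` produced the best `performance`; the joined rows keep that association.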

These discrepancies make certain constructions impossible, and they also make the system harder to learn. In particular, new users often find it difficult to understand the `@gather` decorator. I believe this is partly because there's no way to access gathered values directly from a flow, which makes it hard to experiment.

We should move towards a unified API where the same functionality is available in both circumstances. Perhaps something like these:

@builder
@bn.arg('gather_df', ['hyperparams', 'model_type'], also='performance', collection='frame')
def best_performance(gather_df):
    ...

@builder
@bn.arg('big_dataframe_filename', 'big_dataframe', mode='filename')
def big_dataframe_filesize(big_dataframe_filename):
    return os.stat(big_dataframe_filename).st_size

flow.get(['hyperparams', 'model_type'], also='performance', collection='frame')

jqmp avatar Oct 30 '19 19:10 jqmp