On demand feature views (ODFVs) should support Python dicts

Open adchia opened this issue 2 years ago • 5 comments

In some test benchmarks, using regular Python dicts as inputs when executing the transformations is much faster (up to ~10x) than using pandas for the online flow. This tends to be the more latency-sensitive flow (offline flows seem to be ~40% slower if using vectorized operations).

Something that looks like:

@on_demand_feature_view(
    sources=[driver_hourly_stats_view, val_to_add_request],
    schema=[
        Field(name="conv_rate_plus_val1", dtype=Float64),
        Field(name="conv_rate_plus_val2", dtype=Float64),
    ],
    mode="python"
)
def transformed_conv_rate(driver_hourly_stats: Dict[str, Any], vals_to_add: Dict[str, Any]) -> Dict[str, Any]:
    features = {}
    features['conv_rate_plus_val1'] = (driver_hourly_stats['conv_rate'] + vals_to_add['val_to_add'])
    features['conv_rate_plus_val2'] = (driver_hourly_stats['conv_rate'] + vals_to_add['val_to_add_2'])
    return features

might be similar to what we want

adchia avatar Jan 31 '22 17:01 adchia
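
As a rough illustration of the proposal above, the transformation body can be exercised as a plain Python function, without the Feast decorator. This is only a sketch under stated assumptions: the input values are hypothetical, and it says nothing about how Feast would actually call the function in `mode="python"`.

```python
from typing import Any, Dict


def transformed_conv_rate(
    driver_hourly_stats: Dict[str, Any],
    vals_to_add: Dict[str, Any],
) -> Dict[str, Any]:
    """Add each request-time value to the stored conversion rate."""
    conv_rate = driver_hourly_stats["conv_rate"]
    return {
        "conv_rate_plus_val1": conv_rate + vals_to_add["val_to_add"],
        "conv_rate_plus_val2": conv_rate + vals_to_add["val_to_add_2"],
    }


# Hypothetical single-row inputs, as an online request might supply them.
features = transformed_conv_rate(
    {"conv_rate": 0.5},
    {"val_to_add": 1.0, "val_to_add_2": 2.0},
)
print(features)
```

With scalar values like these, no DataFrame has to be built per request, which is where the online-flow savings would presumably come from.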

In your example, what type would driver_hourly_stats['conv_rate'] be? Would it be a list? If so, this interface will require some additional transformation. Would it be better to pass NumPy arrays? If so, have you factored the instantiation of these into the ~10x speed-up?

judahrand avatar Feb 01 '22 21:02 judahrand
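
To make the concern about list-valued inputs concrete: if each dict value were a list with one entry per row, the transformation could no longer use `+` directly and would need an element-wise loop. A minimal sketch, assuming hypothetical list inputs of equal length (the function name and values are illustrative, not part of Feast):

```python
from typing import Dict, List


def transformed_conv_rate_lists(
    driver_hourly_stats: Dict[str, List[float]],
    vals_to_add: Dict[str, List[float]],
) -> Dict[str, List[float]]:
    """Element-wise version for inputs holding one list entry per row."""
    conv_rates = driver_hourly_stats["conv_rate"]
    return {
        # `list + list` would concatenate, so pair the rows up explicitly.
        "conv_rate_plus_val1": [
            c + v for c, v in zip(conv_rates, vals_to_add["val_to_add"])
        ],
        "conv_rate_plus_val2": [
            c + v for c, v in zip(conv_rates, vals_to_add["val_to_add_2"])
        ],
    }


# Hypothetical two-row batch.
list_features = transformed_conv_rate_lists(
    {"conv_rate": [0.5, 0.25]},
    {"val_to_add": [1.0, 2.0], "val_to_add_2": [10.0, 20.0]},
)
print(list_features)
```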

NumPy arrays should also be a lot better, yeah.

It might make sense to support both, but really there's also the factor of what the user will have access to at serving time. That seems more likely to be a standard dict. NumPy should definitely be faster, but I also worry since it's significantly more verbose.

Could also see a world where we allow either pandas or dicts, since pandas makes the transformations easier to write but is less performant.

In this specific situation, those conv_rate values will be individual doubles.

adchia avatar Feb 14 '22 16:02 adchia

I was actually thinking of a dict of 1-D NumPy arrays?

judahrand avatar Feb 14 '22 16:02 judahrand
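
For comparison, the dict-of-1-D-arrays idea might look roughly like the sketch below. It assumes NumPy is installed and that every array has one entry per row; the function name and input values are hypothetical and not taken from Feast.

```python
from typing import Dict

import numpy as np


def transformed_conv_rate_arrays(
    driver_hourly_stats: Dict[str, np.ndarray],
    vals_to_add: Dict[str, np.ndarray],
) -> Dict[str, np.ndarray]:
    """Vectorized version: `+` on 1-D arrays adds element-wise."""
    conv_rate = driver_hourly_stats["conv_rate"]
    return {
        "conv_rate_plus_val1": conv_rate + vals_to_add["val_to_add"],
        "conv_rate_plus_val2": conv_rate + vals_to_add["val_to_add_2"],
    }


# Hypothetical two-row batch; building these arrays has its own cost,
# which any benchmark against plain dicts would need to include.
array_features = transformed_conv_rate_arrays(
    {"conv_rate": np.array([0.5, 0.25])},
    {
        "val_to_add": np.array([1.0, 2.0]),
        "val_to_add_2": np.array([10.0, 20.0]),
    },
)
print(array_features)
```

The function body stays as short as the scalar version, but the caller has to construct the arrays first.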

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Sep 20 '22 19:09 stale[bot]

tagging @maksstach who has implemented a version of this on our fork

franciscojavierarceo avatar Sep 23 '23 09:09 franciscojavierarceo

I did this one in https://github.com/feast-dev/feast/pull/4045; next I'll do writes.

franciscojavierarceo avatar Mar 30 '24 10:03 franciscojavierarceo