OnDemandFeatureView.feature_transformation.infer_features does not pass UDF outputs to python_type_to_feast_value_type
Expected Behavior
OnDemandFeatureView.feature_transformation.infer_features should be able to infer features from primitive python types for all supported feast data types, for all transformation backends.
Current Behavior
All on demand feature views are currently broken for list types, as there is no way to bypass schema inference.
Details
OnDemandFeatureView.feature_transformation.infer_features can only infer features whose types appear in the type map inside python_type_to_feast_value_type, i.e.
type_map = {
    "int": ValueType.INT64,
    "str": ValueType.STRING,
    "string": ValueType.STRING,  # pandas.StringDtype
    "float": ValueType.DOUBLE,
    "bytes": ValueType.BYTES,
    "float64": ValueType.DOUBLE,
    "float32": ValueType.FLOAT,
    "int64": ValueType.INT64,
    "uint64": ValueType.INT64,
    "int32": ValueType.INT32,
    "uint32": ValueType.INT32,
    "int16": ValueType.INT32,
    "uint16": ValueType.INT32,
    "uint8": ValueType.INT32,
    "int8": ValueType.INT32,
    "bool": ValueType.BOOL,
    "boolean": ValueType.BOOL,
    "timedelta": ValueType.UNIX_TIMESTAMP,
    "timestamp": ValueType.UNIX_TIMESTAMP,
    "datetime": ValueType.UNIX_TIMESTAMP,
    "datetime64[ns]": ValueType.UNIX_TIMESTAMP,
    "datetime64[ns, tz]": ValueType.UNIX_TIMESTAMP,
    "category": ValueType.STRING,
}
This is because when a type such as ValueType.FLOAT_LIST has no mapping in the dictionary above and value is None, every isinstance(value, dtype) check fails and execution falls through to the ValueError in python_type_to_feast_value_type.
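The fall-through described above can be illustrated with a toy model. This is a minimal sketch, not Feast's actual implementation: `sketch_type_mapper`, `TYPE_MAP`, and `LIST_FALLBACKS` are hypothetical names, and plain strings stand in for `ValueType` members so the snippet runs without Feast installed.

```python
from typing import Any, Optional

# Hypothetical, trimmed-down stand-in for the dtype-name lookup table.
TYPE_MAP = {"int64": "INT64", "float64": "DOUBLE", "str": "STRING", "bool": "BOOL"}

# Hypothetical value-based fallbacks for list features.
LIST_FALLBACKS = [(float, "DOUBLE_LIST"), (int, "INT64_LIST"), (str, "STRING_LIST")]


def sketch_type_mapper(value: Optional[Any], type_name: str) -> str:
    """Toy model of a lookup-then-inspect flow; not Feast's actual code."""
    # Step 1: a known dtype name resolves without looking at the value.
    if type_name in TYPE_MAP:
        return TYPE_MAP[type_name]
    # Step 2: otherwise the value itself has to reveal the type.
    if isinstance(value, list) and value:
        for py_type, feast_name in LIST_FALLBACKS:
            if all(isinstance(item, py_type) for item in value):
                return feast_name
    # Step 3: nothing matched, e.g. value is None and type_name is "object".
    raise ValueError(
        f"Value with native type {type_name} cannot be converted into Feast value type"
    )


if __name__ == "__main__":
    # With a sample value, the list type can be recovered.
    print(sketch_type_mapper([0.5, 1.5], "object"))
    # Without one, only the dtype name "object" is available and the call fails.
    try:
        sketch_type_mapper(None, "object")
    except ValueError as err:
        print(err)
```

In this model, a list-valued column can only be resolved in step 2, which requires a non-None value; a caller that supplies just the dtype name always ends up in step 3.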
Steps to reproduce
Initialize a new repository:
feast init
Modify the sample on_demand_feature_view to return an array of floats instead of just floats, e.g.
diff --git a/true_garfish/feature_repo/example_repo.py b/true_garfish/feature_repo/example_repo.py
index 1f5b946..59d4501 100644
--- a/true_garfish/feature_repo/example_repo.py
+++ b/true_garfish/feature_repo/example_repo.py
@@ -16,7 +16,7 @@ from feast import (
 from feast.feature_logging import LoggingConfig
 from feast.infra.offline_stores.file_source import FileLoggingDestination
 from feast.on_demand_feature_view import on_demand_feature_view
-from feast.types import Float32, Float64, Int64
+from feast.types import Float32, Float64, Int64, Array
 # Define an entity for the driver. You can think of an entity as a primary key used to
 # fetch features.
@@ -72,15 +72,16 @@ input_request = RequestSource(
 @on_demand_feature_view(
     sources=[driver_stats_fv, input_request],
     schema=[
-        Field(name="conv_rate_plus_val1", dtype=Float64),
-        Field(name="conv_rate_plus_val2", dtype=Float64),
+        Field(name="conv_rate_plus_vals", dtype=Array(Float64)),
     ],
 )
 def transformed_conv_rate(inputs: pd.DataFrame) -> pd.DataFrame:
-    df = pd.DataFrame()
-    df["conv_rate_plus_val1"] = inputs["conv_rate"] + inputs["val_to_add"]
-    df["conv_rate_plus_val2"] = inputs["conv_rate"] + inputs["val_to_add_2"]
-    return df
+    result = {"conv_rate_plus_vals": []}
+    for _, row in inputs.iterrows():
+        result["conv_rate_plus_vals"].append(
+            [row["conv_rate"] + row["val_to_add"], row["conv_rate"] + row["val_to_add_2"]]
+        )
+    return pd.DataFrame(data=result)
Run feast apply, and you should get the following error:
Traceback (most recent call last):
File "~/.../.venv/bin/feast", line 8, in <module>
sys.exit(cli())
^^^^^
File "~/.../.venv/lib/python3.12/site-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "~/.../.venv/lib/python3.12/site-packages/click/core.py", line 1078, in main
rv = self.invoke(ctx)
^^^^^^^^^^^^^^^^
File "~/.../.venv/lib/python3.12/site-packages/click/core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "~/.../.venv/lib/python3.12/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "~/.../.venv/lib/python3.12/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "~/.../.venv/lib/python3.12/site-packages/click/decorators.py", line 33, in new_func
return f(get_current_context(), *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "~/.../.venv/lib/python3.12/site-packages/feast/cli.py", line 506, in apply_total_command
apply_total(repo_config, repo, skip_source_validation)
File "~/.../.venv/lib/python3.12/site-packages/feast/repo_operations.py", line 347, in apply_total
apply_total_with_repo_instance(
File "~/.../.venv/lib/python3.12/site-packages/feast/repo_operations.py", line 299, in apply_total_with_repo_instance
registry_diff, infra_diff, new_infra = store.plan(repo)
^^^^^^^^^^^^^^^^
File "~/.../.venv/lib/python3.12/site-packages/feast/feature_store.py", line 745, in plan
self._make_inferences(
File "~/.../.venv/lib/python3.12/site-packages/feast/feature_store.py", line 640, in _make_inferences
odfv.infer_features()
File "~/.../.venv/lib/python3.12/site-packages/feast/on_demand_feature_view.py", line 521, in infer_features
inferred_features = self.feature_transformation.infer_features(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "~/....venv/lib/python3.12/site-packages/feast/transformation/pandas_transformation.py", line 47, in infer_features
python_type_to_feast_value_type(f, type_name=str(dt))
File "~/.../.venv/lib/python3.12/site-packages/feast/type_map.py", line 215, in python_type_to_feast_value_type
raise ValueError(
ValueError: Value with native type object cannot be converted into Feast value type
Adding some debug statements inside python_type_to_feast_value_type shows the following locals just before the error is raised:
name='conv_rate_plus_vals'
value=None
recurse=True
type_name='object'
type(value)=<class 'NoneType'>
As mentioned above, this happens because none of the transformation backends pass values to the type mapper; in this case the pandas backend passes only the dtype name, so value is None.
Specifications
- Version: 0.39.0
- Platform: arm64
- Subsystem: MacOS
Possible Solution
- Pass the sample values generated for type inference through to the type mapper
- Update the type mapper to handle lists that are two levels deep. Primitive UDF outputs are wrapped in either an np.array or a list of length 1, so a list feature arrives as a list two levels deep, with the inner list holding the feature values.
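The two proposals above could be combined roughly as follows. This is only a sketch under stated assumptions: `infer_from_sample` and `SCALAR_NAMES` are hypothetical names, plain strings stand in for `ValueType` members, and only plain Python lists are handled (an `np.array` wrapper would need equivalent treatment).

```python
from typing import Any, List

# Hypothetical scalar names, mirroring a few entries of the type map above.
SCALAR_NAMES = {float: "DOUBLE", int: "INT64", str: "STRING", bool: "BOOL"}


def infer_from_sample(sample: List[Any]) -> str:
    """Infer a value-type name from a length-1 wrapper around one UDF output."""
    if len(sample) != 1:
        raise ValueError("expected a wrapper holding exactly one sample value")
    inner = sample[0]
    # One level deep: the wrapper holds a scalar feature value.
    # Exact type matching keeps bool from being treated as int.
    if type(inner) in SCALAR_NAMES:
        return SCALAR_NAMES[type(inner)]
    # Two levels deep: the wrapper holds a list of feature values.
    if isinstance(inner, list) and inner:
        item_types = {type(item) for item in inner}
        if len(item_types) == 1:
            item_type = item_types.pop()
            if item_type in SCALAR_NAMES:
                return SCALAR_NAMES[item_type] + "_LIST"
    raise ValueError(f"cannot infer a value type from sample {sample!r}")


if __name__ == "__main__":
    print(infer_from_sample([1.5]))         # scalar output wrapped once
    print(infer_from_sample([[0.1, 0.2]]))  # list output wrapped once
```

The key point of the sketch is that the mapper receives the sample value itself, so it can unwrap the outer length-1 container and inspect the inner list instead of relying on a dtype name such as object.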