Offline Store - BigQuery - Exceeded Resources on Large Feature Retrievals
Expected Behavior
When using BigQuery as an offline store and retrieving a significant number of features with get_historical_features, Feast should return a complete dataframe regardless of query complexity and BigQuery's resource constraints during query planning.
Current Behavior
When get_historical_features is executed against a large number of features spanning multiple feature views, Feast raises the following error:
Resources exceeded during query execution: Not enough resources for query planning - too many subqueries or query is too complex.
To work around this, we currently execute the method multiple times with smaller feature lists and join the results ourselves. Ideally, Feast would handle this as part of the abstracted offline store interface.
Steps to reproduce
- Total features used in query: 87
- Total feature views used in query: 83
- Features list size: 162
Output from result:
Error Generated due to large Batch Size: 162
Traceback (most recent call last):
  File "/tmp/tmp.rEg9YG7Uh6/ephemeral_component.py", line 341, in collect_historical
    data = hc(start_time, end_time)
  File "/tmp/tmp.rEg9YG7Uh6/ephemeral_component.py", line 197, in __call__
    full_feature_names=True
  File "/usr/local/lib/python3.7/site-packages/feast/infra/offline_stores/offline_store.py", line 43, in to_df
    features_df = self._to_df_internal()
  File "/usr/local/lib/python3.7/site-packages/feast/infra/offline_stores/bigquery.py", line 245, in _to_df_internal
    df = self._execute_query(query).to_dataframe(create_bqstorage_client=True)
  File "/usr/local/lib/python3.7/site-packages/feast/usage.py", line 280, in wrapper
Specifications
- Version: 0.17.0
- Platform: GCP
- Subsystem: ?
Possible Solution
Handle the error for the user: split the feature list into multiple queries, execute them, and join the results into a single dataframe before returning. This may still run into rate-limiting errors down the road for exceptionally long feature lists (many queries), so it may be better to avoid CTE subqueries and instead use some other v
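One way the proposed handling could be structured is to bisect the feature list whenever BigQuery rejects the plan as too complex. The sketch below is an assumption, not existing Feast code: `retrieve_with_split` and `run_query` are hypothetical names, and matching on the error message string is a stand-in for catching the specific exception type raised by the BigQuery client.

```python
def retrieve_with_split(run_query, features, min_batch=1):
    """Run run_query(features); if the backend rejects the query as too
    complex, bisect the feature list and retry each half recursively.
    Returns a list of partial results for the caller to join."""
    try:
        return [run_query(features)]
    except Exception as exc:  # in practice, catch the BigQuery client's error type
        too_complex = "Resources exceeded" in str(exc)
        if not too_complex or len(features) <= min_batch:
            raise  # unrelated failure, or can't split any further
        mid = len(features) // 2
        return (retrieve_with_split(run_query, features[:mid], min_batch)
                + retrieve_with_split(run_query, features[mid:], min_batch))
```

Because each level of recursion roughly doubles the number of queries, this approach degrades gracefully for moderately oversized feature lists but, as noted above, could itself hit rate limits for extreme ones.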