BigQueryRetrievalJob does not remove tables used for data export
Expected Behavior
Temporary tables in BigQueryRetrievalJob should be removed after the job completes or fails.
Current Behavior
When running materialization with a batch engine against BigQuery, the historical_datestamp_hash table is created to export data from the BQ temporary table. The data is then extracted to a GCS bucket, but the export table is always retained.
Steps to reproduce
Run the materialization job with BQ as the offline_store and use a batch_engine (i.e., bytewax).
Specifications
- Version: 0.34.1
Possible Solution
Add cleanup in BigQueryRetrievalJob.to_remote_storage.
One practice you can apply on your side first is to set up a default table expiration for your BigQuery dataset. Also, do you mind creating a PR for this?
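As a sketch of that dataset-expiration workaround (the helper name and dataset id are hypothetical; the client is assumed to follow the google-cloud-bigquery API and is passed in so the sketch stays testable):

```python
def set_default_table_expiration(client, dataset_id: str, hours: int = 24) -> int:
    """Set a default table expiration on a BigQuery dataset so newly
    created tables (such as Feast's export tables) auto-expire.

    `client` is expected to behave like google.cloud.bigquery.Client;
    `dataset_id` (e.g. "my_project.feast_offline") is a hypothetical example.
    """
    expiration_ms = hours * 60 * 60 * 1000  # BigQuery expects milliseconds
    dataset = client.get_dataset(dataset_id)
    dataset.default_table_expiration_ms = expiration_ms
    # Only the listed field is sent in the update request.
    client.update_dataset(dataset, ["default_table_expiration_ms"])
    return expiration_ms
```

Note that a dataset-level default only affects tables created after it is set; existing tables keep their current (lack of) expiration.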
A PR for which part? Modify the historical_datestamp_hash table properties to make it expire? Or implement try / finally in BigQueryRetrievalJob.to_remote_storage once the export job is completed?
Modifying the table properties causes a complication: the same BigQueryRetrievalJob.to_bigquery function, where the table is created, is also used to create a saved dataset via FeatureStore.create_saved_dataset.
The try/finally option is probably better for the current situation, given the risk that to_remote_storage can crash after the table is created.
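A minimal sketch of that try/finally shape (the function name and arguments are hypothetical stand-ins; the real change would live inside BigQueryRetrievalJob.to_remote_storage, and the client is assumed to follow the google-cloud-bigquery API):

```python
def export_and_cleanup(client, table_ref: str, gcs_uri: str) -> None:
    """Export a BigQuery table to GCS, then drop it even if the export fails.

    `client` is expected to behave like google.cloud.bigquery.Client:
    extract_table() returns a job whose .result() blocks and raises on error,
    and delete_table(..., not_found_ok=True) removes the table if present.
    """
    try:
        extract_job = client.extract_table(table_ref, gcs_uri)
        extract_job.result()  # propagate export errors to the caller
    finally:
        # Runs on both success and failure, so the historical_datestamp_hash
        # table is never left behind.
        client.delete_table(table_ref, not_found_ok=True)
```

Using not_found_ok=True keeps the cleanup idempotent if the crash happens before the table actually exists.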
try/finally sounds great to me as well
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.