How to add a dataframe whose rows are valid for a period of time in featuretools
I am working on a dataset with multiple tables and am using the featuretools library for feature engineering. One of the tables, which is NOT the target dataframe, has several columns, three of which are relevant here: ['rating', 'valid_from', 'valid_to']. I use valid_from as the time_index, but I am not sure how to incorporate the valid_to column. If this were the target dataframe, I could use valid_to for the cutoff times, but since it is not the target dataframe, I don't know how to set up the problem so that there is no data leakage.
I also thought of using valid_to as the time_index, but then I am not sure how to incorporate the valid_from column.
import featuretools as ft

# Create the EntitySet (skip this line if es already exists and holds "target_table")
es = ft.EntitySet(id="your_entity_set")

# Add the secondary table with 'valid_from' as the time_index (featuretools 1.x API)
es = es.add_dataframe(
    dataframe_name="secondary_table",
    dataframe=secondary_df,   # your secondary dataframe
    index="secondary_id",     # primary key of the secondary table
    time_index="valid_from",  # DFS treats each row as known from valid_from onward
)
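To see what the time_index does and does not handle, compare the two conditions on a small hypothetical ratings table. A time index on valid_from only hides rows that start after the cutoff t, while a row actually describes the state at t only if valid_from <= t < valid_to. Treating a missing valid_to as "still valid" is an assumption about your data.

```python
import pandas as pd

# Hypothetical ratings table: one row per rating version of a target
ratings = pd.DataFrame({
    "secondary_id": [1, 2, 3],
    "target_id": [10, 10, 20],
    "rating": [3, 5, 4],
    "valid_from": pd.to_datetime(["2023-01-01", "2023-06-01", "2023-03-01"]),
    "valid_to": pd.to_datetime(["2023-06-01", None, None]),  # NaT = still valid
})

def visible_by_time_index(df, t):
    """What a time index on valid_from alone gives you: rows already known at t."""
    return df["valid_from"] <= t

def valid_at(df, t):
    """Rows whose half-open interval [valid_from, valid_to) contains t."""
    not_expired = df["valid_to"].isna() | (df["valid_to"] > t)
    return visible_by_time_index(df, t) & not_expired

t = pd.Timestamp("2023-07-01")
print(ratings.loc[visible_by_time_index(ratings, t), "secondary_id"].tolist())  # → [1, 2, 3]
print(ratings.loc[valid_at(ratings, t), "secondary_id"].tolist())               # → [2, 3]
```

Row 1 is still "known" at the cutoff but expired on 2023-06-01; that is exactly the row the time index alone lets through.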
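Add the relationship with the target table as the parent and the secondary table as the child. Because one target can have several rating rows over time, this assumes the secondary table has a 'target_id' foreign key (adjust the names to your schema). A relationship cannot filter on valid_to; that has to be handled separately.
es = es.add_relationship(
    parent_dataframe_name="target_table",
    parent_column_name="target_id",         # primary key of the target table
    child_dataframe_name="secondary_table",
    child_column_name="target_id",          # foreign key in the secondary table (assumed)
)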
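Featuretools expects the parent table's key to be unique and the child to hold the repeated foreign key. A quick pandas check of which way round your tables go, using hypothetical tables: `DataFrame.merge` with `validate="many_to_one"` raises `pandas.errors.MergeError` when the right-hand (parent) key is not unique, and `indicator=True` flags child keys that have no parent.

```python
import pandas as pd
from pandas.errors import MergeError

# Hypothetical tables: one row per target, several rating rows per target
target_df = pd.DataFrame({"target_id": [10, 20]})
secondary_df = pd.DataFrame({
    "secondary_id": [1, 2, 3],
    "target_id": [10, 10, 20],
    "rating": [3, 5, 4],
})

def is_valid_child(child, parent, key):
    """True if every child key matches exactly one parent row (many-to-one)."""
    try:
        merged = child.merge(parent, on=key, how="left",
                             validate="many_to_one", indicator=True)
    except MergeError:
        return False  # the parent key is not unique
    # "left_only" would mean a child row whose key has no parent
    return bool((merged["_merge"] == "both").all())

print(is_valid_child(secondary_df, target_df, "target_id"))  # → True
print(is_valid_child(target_df, secondary_df, "target_id"))  # → False (10 repeats)
```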
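Featuretools only uses the time_index when applying cutoff times: rows whose valid_from is after an instance's cutoff are excluded automatically, but there is no notion of an end date, so rows that had already expired at the cutoff (valid_to <= cutoff) are still used. They have to be removed by hand. Because that filter depends on the cutoff, filtering the dataframe once is only correct when every instance shares the same cutoff time:
import pandas as pd

def filter_valid_rows(df, cutoff_time):
    # Keep rows that have not expired by the cutoff; a missing valid_to means still valid
    return df[df["valid_to"].isna() | (df["valid_to"] > cutoff_time)]

cutoff_time = pd.Timestamp("2023-01-01")  # hypothetical cutoff shared by all instances
es.replace_dataframe("secondary_table", filter_valid_rows(secondary_df, cutoff_time))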
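When instances have different cutoff times, a single filter is not enough. One workaround, sketched here in plain pandas rather than through featuretools, is a point-in-time join: pair each cutoff row with that target's rating rows and keep only those whose interval contains the cutoff. The frame and column names (`target_id`, `time`) are hypothetical.

```python
import pandas as pd

# Hypothetical cutoff times: one row per (instance, cutoff)
cutoff_times_df = pd.DataFrame({
    "target_id": [10, 10, 20],
    "time": pd.to_datetime(["2023-05-01", "2023-07-01", "2023-02-01"]),
})
secondary_df = pd.DataFrame({
    "secondary_id": [1, 2, 3],
    "target_id": [10, 10, 20],
    "rating": [3, 5, 4],
    "valid_from": pd.to_datetime(["2023-01-01", "2023-06-01", "2023-03-01"]),
    "valid_to": pd.to_datetime(["2023-06-01", None, None]),  # NaT = still valid
})

def point_in_time_join(cutoffs, history, key="target_id", time_col="time"):
    """Attach to each cutoff row the history rows that are valid at that cutoff."""
    pairs = cutoffs.merge(history, on=key, how="inner")
    started = pairs["valid_from"] <= pairs[time_col]
    not_expired = pairs["valid_to"].isna() | (pairs["valid_to"] > pairs[time_col])
    return pairs[started & not_expired].reset_index(drop=True)

joined = point_in_time_join(cutoff_times_df, secondary_df)
print(joined[["target_id", "time", "rating"]])
```

Target 20 drops out because its only rating starts after its cutoff. The joined rows can then be aggregated yourself (for example, a groupby on `target_id` and `time`) as a leakage-free cross-check of what DFS produces.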
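Then run DFS with the cutoff times. Also exclude valid_to from feature generation: for a row that is still valid at the cutoff, its value reveals when the rating will change, which is itself leakage.
feature_matrix, feature_defs = ft.dfs(
    entityset=es,
    target_dataframe_name="target_table",
    cutoff_time=cutoff_times_df,  # DataFrame with the instance id and cutoff time for each row
    ignore_columns={"secondary_table": ["valid_to"]},  # don't build features from the end date
    features_only=False,
)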
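If the feature you actually need is the rating in force at each cutoff, it can be computed outside DFS with `pandas.merge_asof`, which for each cutoff picks the latest row whose valid_from is on or before it; the sketch then blanks out matches whose valid_to has already passed. Table and column names are hypothetical. Featuretools carries extra columns of `cutoff_time` through to the feature matrix (this is how labels are usually supplied), so the result could be merged back onto `cutoff_times_df` on `target_id` and `time`; check this against the documentation for your featuretools version.

```python
import pandas as pd

cutoff_times_df = pd.DataFrame({
    "target_id": [10, 10, 20, 30],
    "time": pd.to_datetime(["2023-05-01", "2023-07-01", "2023-02-01", "2023-04-01"]),
})
secondary_df = pd.DataFrame({
    "secondary_id": [1, 2, 3, 4],
    "target_id": [10, 10, 20, 30],
    "rating": [3, 5, 4, 2],
    "valid_from": pd.to_datetime(["2023-01-01", "2023-06-01", "2023-03-01", "2023-01-01"]),
    "valid_to": pd.to_datetime(["2023-06-01", None, None, "2023-02-01"]),
})

def rating_at_cutoff(cutoffs, history, key="target_id", time_col="time"):
    """Rating in force at each cutoff, or NaN if none is valid then."""
    left = cutoffs.sort_values(time_col)       # merge_asof needs sorted keys
    right = history.sort_values("valid_from")
    matched = pd.merge_asof(
        left, right, left_on=time_col, right_on="valid_from",
        by=key, direction="backward",          # latest valid_from <= cutoff
    )
    # The latest started row may already have expired before the cutoff
    expired = matched["valid_to"].notna() & (matched["valid_to"] <= matched[time_col])
    matched["rating"] = matched["rating"].where(~expired)
    return matched[[key, time_col, "rating"]]

result = rating_at_cutoff(cutoff_times_df, secondary_df)
print(result)
```

Target 20 gets NaN because its rating starts later, and target 30 gets NaN because its only rating expired on 2023-02-01.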
I hope this helps; if you have any questions, feel free to ask.