featuretools
EntitySet.update_dataframe can add last time indexes that weren't there before
When `recalculate_last_time_indexes=True`, it’s possible to end up not just with recalculated last time indexes where they already exist, but also with last time indexes calculated on more dataframes than before.
For example, if you call `es.add_last_time_indexes(['products'])`, you only get last time indexes for `products` and `log`, because `products` has no parents and its only child is `log`.
But if you then take that EntitySet and update `log`’s dataframe with `es.update_dataframe('log', new_dataframe, recalculate_last_time_indexes=True)`, it will run `es.add_last_time_indexes(['log'])`, which also calculates last time indexes for all of `log`’s parents, resulting in many more dataframes with last time indexes than before.
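To make the difference concrete, here is a toy model of the propagation described above. It does not import or call featuretools, and it is not the library's implementation. It only encodes the rule as this issue describes it: the updated dataframes, their ancestors, and the children those depend on all end up with last time indexes. The `sessions` and `customers` dataframes and the relationship graph are assumptions made up for illustration.

```python
# Toy model only: plain Python, no featuretools calls.
# Maps each dataframe to its parents. 'sessions' and 'customers' are
# hypothetical names; only 'log' and 'products' come from the issue.
PARENTS = {
    "log": ["products", "sessions"],
    "sessions": ["customers"],
    "products": [],
    "customers": [],
}


def children_of(name):
    """Return the dataframes that list `name` as a parent."""
    return [child for child, parents in PARENTS.items() if name in parents]


def affected_dataframes(updated):
    """Dataframes that receive a last time index under the described rule."""
    # Step 1: the updated dataframes plus all of their ancestors.
    affected = set()
    stack = list(updated)
    while stack:
        name = stack.pop()
        if name in affected:
            continue
        affected.add(name)
        stack.extend(PARENTS[name])

    # Step 2: pull in descendants, since a parent's last time index
    # is derived from its children.
    stack = list(affected)
    while stack:
        name = stack.pop()
        for child in children_of(name):
            if child not in affected:
                affected.add(child)
                stack.append(child)
    return affected


initial = affected_dataframes(["products"])   # models add_last_time_indexes(['products'])
after_update = affected_dataframes(["log"])   # models the call made by update_dataframe
newly_added = after_update - initial

print(sorted(initial))       # ['log', 'products']
print(sorted(after_update))  # ['customers', 'log', 'products', 'sessions']
print(sorted(newly_added))   # ['customers', 'sessions']
```

In this model, `newly_added` is exactly the set of dataframes that gain a last time index only because of the update, which is the behavior in question.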
There are two questions:
- Is this behavior expected?
- If it’s not, is the correct behavior to calculate last time indexes only for dataframes that already have them? Or should `add_last_time_indexes` include `log`’s parents in the initial calculation in the first place?
We should make sure to consider an edge case where a child with a time index has a parent without a time index, and that parent in turn has a parent (the child’s grandparent) with a time index.