featuretools
DFS does not build some features when there are multiple paths to an entity
DFS currently builds features for each entity only once. This is a problem because, depending on which path is taken to reach an entity, different features may be built. For example, the following test currently fails:
```python
from featuretools.primitives import Count
from featuretools.synthesis import DeepFeatureSynthesis


# diamond_es is a pytest fixture with two paths from transactions to regions
# (via stores and via customers); feature_with_name is a test helper.
def test_makes_direct_of_agg_on_all_paths(diamond_es):
    dfs_obj = DeepFeatureSynthesis(target_entity_id='transactions',
                                   entityset=diamond_es,
                                   max_depth=3,
                                   agg_primitives=[Count],
                                   trans_primitives=[])
    features = dfs_obj.build_features()
    # These two pass
    assert feature_with_name(features, 'stores.regions.COUNT(stores)')
    assert feature_with_name(features, 'stores.regions.COUNT(customers)')
    # These two fail
    assert feature_with_name(features, 'customers.regions.COUNT(stores)')
    assert feature_with_name(features, 'customers.regions.COUNT(customers)')
```
This is because the `customers` features are built before the aggregations on `regions`. The execution looks something like this:
```
_run_dfs(transactions)
    _run_dfs(stores)
        build_agg_features
        _run_dfs(regions)
            _run_dfs(customers)
                build_agg_features
                build_direct_features
            build_agg_features
            build_direct_features
        build_direct_features
```
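To make the ordering problem concrete, here is a toy model of the recursion in plain Python. It is not featuretools' real code: `PARENTS`, `CHILDREN`, `run_dfs`, and `build_order` are hypothetical names. It assumes that each entity first recurses into its unvisited children and aggregates them, then recurses into its unvisited parents and builds direct features, and that a single global visited set plays the role of `all_features`.

```python
# Hypothetical diamond entityset: child -> list of parents (forward relationships).
PARENTS = {
    "transactions": ["stores", "customers"],
    "stores": ["regions"],
    "customers": ["regions"],
    "regions": [],
}
# Invert the mapping: entity -> list of children (backward relationships).
CHILDREN = {e: [c for c, ps in PARENTS.items() if e in ps] for e in PARENTS}


def run_dfs(entity, visited, log):
    """Visit each entity once, recording when agg/direct features are built."""
    visited.add(entity)
    # Backward phase: recurse into unvisited children, then aggregate them.
    for child in CHILDREN[entity]:
        if child not in visited:
            run_dfs(child, visited, log)
    log.append(("agg", entity))
    # Forward phase: recurse into unvisited parents, then pull in their features.
    for parent in PARENTS[entity]:
        if parent not in visited:
            run_dfs(parent, visited, log)
    log.append(("direct", entity))


def build_order(target):
    """Return the sequence of build steps for a given target entity."""
    log = []
    run_dfs(target, set(), log)
    return log
```

Running `build_order("transactions")` reproduces the trace above: `("direct", "customers")` is logged before `("agg", "regions")`, so when `customers` pulls in features from `regions`, the `COUNT` aggregations on `regions` do not exist yet. `stores` builds its direct features only after `regions` is finished, which is why the `stores.regions.*` features appear.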
This ordering is fine when there is only one path, because you don't want features like `regions.MEAN(customers.regions.COUNT(stores))`. But when there are multiple paths, the features on `customers` may be used by entities other than `regions`.
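One way to state which features are unwanted is in terms of the sequence of entities a feature walks through. The helper below, `is_round_trip`, is a hypothetical rule for illustration, not featuretools' actual check. It assumes there is at most one relationship between any two entities, so that returning to the entity visited two hops earlier means going straight back over the same relationship.

```python
def is_round_trip(path):
    """Return True if an entity path immediately doubles back over the
    relationship it just used, e.g. regions -> customers -> regions."""
    return any(path[i] == path[i + 2] for i in range(len(path) - 2))
```

`regions.MEAN(customers.regions.COUNT(stores))` walks `regions -> customers -> regions -> stores` and is flagged. The wanted feature `customers.regions.COUNT(stores)` on `transactions` walks `transactions -> customers -> regions -> stores` and is not. Under a rule like this, the only problem left is the one in this issue: the wanted feature is never offered because `customers` is processed too early.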
I also encountered this issue recently. To be more specific: suppose the `customers` entity has already been reached along the first path, and the entities that depend on `customers` could not be reached along that path because of `max_depth`. Then those entities cannot be reached along any other path either. The `customers` entity is already in the `all_features` variable of `build_features`, so the other entities never get their own `_run_dfs` call, even when the depth along the other paths is very shallow.
Is there any solution for this issue now? @kmax12
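The depth problem described in the comment above can be reproduced with a small reachability model. This is a sketch under stated assumptions, not featuretools code: `GRAPH`, `reach_visited_once`, and `reach_best_depth` are hypothetical, and each edge stands for one relationship hop that costs one unit of depth. The first function skips any entity it has already seen, which mirrors the `all_features` check. The second records the most depth remaining with which each entity has been reached and re-expands the entity when a shorter path arrives with more depth left.

```python
# Hypothetical graph: entity -> entities one relationship hop away.
# The long path to "customers" (via "a" and "b") is explored first.
GRAPH = {
    "target": ["a", "customers"],
    "a": ["b"],
    "b": ["customers"],
    "customers": ["sessions"],
    "sessions": [],
}


def reach_visited_once(max_depth):
    """Skip any entity already seen, regardless of how much depth is left."""
    seen = set()

    def visit(entity, depth_left):
        seen.add(entity)
        if depth_left == 0:
            return
        for nxt in GRAPH[entity]:
            if nxt not in seen:
                visit(nxt, depth_left - 1)

    visit("target", max_depth)
    return seen


def reach_best_depth(max_depth):
    """Re-expand an entity whenever it is reached with more depth left."""
    best = {}  # entity -> largest remaining depth seen so far

    def visit(entity, depth_left):
        if best.get(entity, -1) >= depth_left:
            return  # already expanded at least this deeply
        best[entity] = depth_left
        if depth_left == 0:
            return
        for nxt in GRAPH[entity]:
            visit(nxt, depth_left - 1)

    visit("target", max_depth)
    return set(best)
```

With `max_depth=3`, `reach_visited_once` first reaches `customers` through `a` and `b` with no depth left, marks it as seen, and so never reaches `sessions`, even though the direct hop `target -> customers` would leave enough depth. `reach_best_depth` revisits `customers` along the short path and does reach `sessions`. The recursion still terminates, because an entity is only re-expanded when its remaining depth strictly increases, and that value is bounded by `max_depth`.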