
DFS does not build some features when there are multiple paths to an entity

Open CJStadler opened this issue 5 years ago • 1 comments

DFS currently builds features for each entity only once. This is problematic because, depending on which path is taken to an entity, different features may be built. For example, the following test currently fails:

def test_makes_direct_of_agg_on_all_paths(diamond_es):
    dfs_obj = DeepFeatureSynthesis(target_entity_id='transactions',
                                   entityset=diamond_es,
                                   max_depth=3,
                                   agg_primitives=[Count],
                                   trans_primitives=[])

    features = dfs_obj.build_features()
    # These two pass
    assert feature_with_name(features, 'stores.regions.COUNT(stores)')
    assert feature_with_name(features, 'stores.regions.COUNT(customers)')
    # These two fail
    assert feature_with_name(features, 'customers.regions.COUNT(stores)')
    assert feature_with_name(features, 'customers.regions.COUNT(customers)')

This is because the features on customers are built before the aggregations on regions. The execution looks something like this:

_run_dfs(transactions)
    _run_dfs(stores)
        build_agg_features
        _run_dfs(regions)
            _run_dfs(customers)
                build_agg_features
                build_direct_features
            build_agg_features
            build_direct_features
    build_direct_features

When there is only one path to each entity this is fine, because you don't want features like regions.MEAN(customers.regions.COUNT(stores)). But when there are multiple paths, the features on customers may be used by entities other than regions.
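
To make the ordering problem concrete, here is a minimal toy model of the traversal above. It is not featuretools code: the PARENTS and CHILDREN dictionaries and the run_dfs function are hypothetical stand-ins that only mimic the "visit each entity once" order from the trace, and the sketch ignores max_depth and the filtering that real DFS applies.

```python
# Toy model of the traversal order described above; NOT featuretools code.
# Diamond: transactions -> stores -> regions and transactions -> customers -> regions.
PARENTS = {
    "transactions": ["stores", "customers"],
    "stores": ["regions"],
    "customers": ["regions"],
    "regions": [],
}
CHILDREN = {
    "transactions": [],
    "stores": ["transactions"],
    "customers": ["transactions"],
    "regions": ["stores", "customers"],
}


def run_dfs(entity, features, visited):
    """Build feature names for `entity`, visiting every entity at most once."""
    visited.add(entity)
    features.setdefault(entity, [])

    # Recurse into children first, then build aggregations (only COUNT here).
    for child in CHILDREN[entity]:
        if child not in visited:
            run_dfs(child, features, visited)
    for child in CHILDREN[entity]:
        features[entity].append(f"COUNT({child})")

    # Recurse into parents, then copy whatever each parent has *right now*.
    for parent in PARENTS[entity]:
        if parent not in visited:
            run_dfs(parent, features, visited)
    for parent in PARENTS[entity]:
        for name in list(features[parent]):
            features[entity].append(f"{parent}.{name}")


features = {}
run_dfs("transactions", features, set())
names = set(features["transactions"])

print("stores.regions.COUNT(stores)" in names)
print("customers.regions.COUNT(stores)" in names)
```

In this sketch customers is first reached from inside regions, before regions has any aggregations, so it copies nothing from regions and is never visited again. The first check therefore prints True and the second prints False, matching the passing and failing assertions in the test.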

CJStadler, Jul 09 '19 14:07

I also encountered this issue recently. To be more specific: if the entity customer is first reached along one path, and the entities that depend on customer cannot be reached along that path because of max_depth, then those entities cannot be reached along any other path either. This is because the customer entity is already in the all_features variable of the build_features function, so _run_dfs is never run for those other entities, even if they would sit at a very shallow depth on another path.
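
A rough sketch of that interaction, assuming a simplified depth-limited traversal with a single shared "seen" set. The graph, the entity names orders and profile, and the reachable function are all hypothetical and only illustrate the behaviour described above; they are not taken from featuretools.

```python
# Toy model only: hypothetical entity names, NOT featuretools code.
GRAPH = {
    "target": ["orders", "customer"],
    "orders": ["customer"],
    "customer": ["profile"],
    "profile": [],
}


def reachable(graph, start, max_depth, visit_once):
    """Return the entities reached from `start` within `max_depth` steps."""
    seen = set()      # stands in for the "already built" bookkeeping
    reached = set()

    def visit(entity, depth):
        reached.add(entity)
        seen.add(entity)
        if depth == max_depth:
            return  # depth budget exhausted on this path
        for neighbor in graph[entity]:
            if visit_once and neighbor in seen:
                continue  # skipped even if this path is shallower
            visit(neighbor, depth + 1)

    visit(start, 0)
    return reached


once = reachable(GRAPH, "target", max_depth=2, visit_once=True)
per_path = reachable(GRAPH, "target", max_depth=2, visit_once=False)
print(sorted(once))
print(sorted(per_path))
```

With visit_once=True, customer is first reached at depth 2 through orders, where the depth budget runs out, and the shorter path target -> customer is then skipped, so profile is never reached. With visit_once=False the shorter path is explored as well and profile is reached at depth 2.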

Is there any solution for this issue now? @kmax12

c787297017, Sep 05 '21 04:09