featuretools
Support entitysets with cycles in their relationship graph
For example, a self-loop:
Entities:
employees
Relationships:
employees.manager_id -> employees.id
Or, a cycle involving multiple entities:
Entities:
users
roles
Relationships:
users.role_id -> roles.id
roles.creator_id -> users.id
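As a rough illustration of the two cases above, the relationships can be treated as a directed child -> parent graph and checked for cycles with a depth-first search. This is a standalone sketch: the tuples and the `has_cycle` helper are hypothetical and are not part of the featuretools API.

```python
# Relationships as (child_entity, child_variable, parent_entity, parent_variable).
# Plain tuples for illustration only; not the featuretools Relationship class.
SELF_LOOP = [("employees", "manager_id", "employees", "id")]
MULTI_ENTITY_CYCLE = [
    ("users", "role_id", "roles", "id"),
    ("roles", "creator_id", "users", "id"),
]
ACYCLIC = [("orders", "customer_id", "customers", "id")]


def has_cycle(relationships):
    """Return True if the child -> parent graph contains a directed cycle."""
    graph = {}
    for child, _, parent, _ in relationships:
        graph.setdefault(child, set()).add(parent)
        graph.setdefault(parent, set())

    unvisited, in_progress, done = 0, 1, 2
    state = {entity: unvisited for entity in graph}

    def visit(entity):
        state[entity] = in_progress
        for parent in graph[entity]:
            # Reaching an entity that is still on the DFS stack closes a cycle.
            if state[parent] == in_progress:
                return True
            if state[parent] == unvisited and visit(parent):
                return True
        state[entity] = done
        return False

    return any(state[e] == unvisited and visit(e) for e in graph)
```

A check like this could be used to decide when cycle-aware handling is needed at all; entitysets without cycles would keep their current code path.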
We should be able to create features that traverse these cycles once or more. For example, "The average salary of the direct reports of an employee's direct reports": `MEAN(employees.employees.salary)`.
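To make the intended meaning of that feature concrete, here is a small worked example in plain Python. The data and the helper names (`direct_reports`, `mean_salary_at_depth`) are made up for illustration; it computes the value by hand rather than through featuretools.

```python
from statistics import mean

# Hypothetical data: employee 1 manages 2 and 3; 2 manages 4 and 5; 3 manages 6.
EMPLOYEES = [
    {"id": 1, "manager_id": None, "salary": 200},
    {"id": 2, "manager_id": 1, "salary": 120},
    {"id": 3, "manager_id": 1, "salary": 100},
    {"id": 4, "manager_id": 2, "salary": 80},
    {"id": 5, "manager_id": 2, "salary": 60},
    {"id": 6, "manager_id": 3, "salary": 70},
]


def direct_reports(employee_ids):
    """Ids of employees whose manager_id is one of employee_ids."""
    managers = set(employee_ids)
    return [e["id"] for e in EMPLOYEES if e["manager_id"] in managers]


def mean_salary_at_depth(employee_id, depth):
    """Mean salary after following the self-relationship backward `depth` times."""
    level = [employee_id]
    for _ in range(depth):
        level = direct_reports(level)
    selected = set(level)
    salaries = [e["salary"] for e in EMPLOYEES if e["id"] in selected]
    return mean(salaries) if salaries else None


# depth=2 corresponds to MEAN(employees.employees.salary) for employee 1:
# the reports of the reports are employees 4, 5 and 6.
two_levels_down = mean_salary_at_depth(1, 2)
```

With this data, `two_levels_down` is the mean of 80, 60 and 70, and an employee with no reports two levels down gets `None`.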
To support this there are at least two places in the code which assume there are no cycles and so will need to be updated:
- `EntitySet.has_unique_forward_path`: When searching for paths, this ignores entities which have already been seen, only traversing cycles once. In the case of the "employees" entityset above, it would say that there is a unique path from `employees` to `employees`, even though there are infinitely many.
- `DeepFeatureSynthesis.build_features`: Currently this will get stuck in an infinite loop when run on an entityset with a cycle (in the call to `EntitySet.get_backward_entities(eid, deep=True)`). To fix this we could add a `max_relationship_depth` param to limit the number of relationships which will be traversed. This would change existing behavior even in entitysets without cycles, because there is currently no such limit (`max_depth` only limits the nesting of features, not the lengths of their relationship paths).
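Both points can be sketched with a depth-bounded path search. The code below is only an illustration of the idea, not the featuretools implementation: `forward_paths` is a hypothetical helper, relationships are plain tuples, and `max_relationship_depth` is the proposed parameter rather than an existing one. Entities may be revisited, so the depth bound is the only thing that makes the search terminate on a cyclic graph.

```python
def forward_paths(relationships, start, goal, max_relationship_depth):
    """List every child -> parent path from start to goal that uses between
    1 and max_relationship_depth relationships. Entities may repeat."""
    forward = {}
    for child, variable, parent, _ in relationships:
        forward.setdefault(child, []).append((variable, parent))

    paths = []
    stack = [(start, [])]
    while stack:
        entity, path = stack.pop()
        if path and entity == goal:
            paths.append(path)
        if len(path) == max_relationship_depth:
            continue  # stop extending; this bound guarantees termination
        for variable, parent in forward.get(entity, []):
            stack.append((parent, path + [f"{entity}.{variable}"]))
    return paths


# (child_entity, child_variable, parent_entity, parent_variable)
employees_rels = [("employees", "manager_id", "employees", "id")]
users_roles_rels = [
    ("users", "role_id", "roles", "id"),
    ("roles", "creator_id", "users", "id"),
]

# One path per allowed depth, so the path is not unique once depth > 1.
self_loop_paths = forward_paths(employees_rels, "employees", "employees", 3)
# Only even-length paths return to users: lengths 2 and 4.
round_trip_paths = forward_paths(users_roles_rels, "users", "users", 4)
```

Under this sketch, a uniqueness check would ask whether exactly one path exists within the depth limit, and the number of paths in a cyclic entityset grows as `max_relationship_depth` is raised.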