featuretools

Support entitysets with cycles in their relationship graph

Open CJStadler opened this issue 5 years ago • 0 comments

For example, a self-loop:

Entities:
  employees
Relationships:
  employees.manager_id -> employees.id

Or, a cycle involving multiple entities:

Entities:
  users
  roles
Relationships:
  users.role_id -> roles.id
  roles.creator_id -> users.id
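
As a rough illustration, both examples can be written as plain relationship lists and checked for cycles with a depth-first search. This is a standalone sketch, not featuretools code: the tuple layout, the `has_cycle` helper, and the acyclic `sessions`/`customers` example are assumptions made for illustration.

```python
# Each relationship is (child_entity, child_variable, parent_entity), mirroring
# the "child.variable -> parent.id" notation used above.
SELF_LOOP = [("employees", "manager_id", "employees")]
MULTI_ENTITY = [
    ("users", "role_id", "roles"),
    ("roles", "creator_id", "users"),
]
# Hypothetical acyclic entityset, included only for contrast.
ACYCLIC = [("sessions", "customer_id", "customers")]

WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current DFS stack, finished


def has_cycle(relationships):
    """Return True if the forward (child -> parent) graph contains a cycle."""
    forward = {}
    for child, _variable, parent in relationships:
        forward.setdefault(child, []).append(parent)
        forward.setdefault(parent, [])

    state = {entity: WHITE for entity in forward}

    def visit(entity):
        state[entity] = GREY
        for parent in forward[entity]:
            # Reaching an entity that is still on the stack closes a cycle.
            if state[parent] == GREY:
                return True
            if state[parent] == WHITE and visit(parent):
                return True
        state[entity] = BLACK
        return False

    return any(state[entity] == WHITE and visit(entity) for entity in forward)
```

A check like this could be used to decide whether the cycle-aware code paths discussed below are needed for a given entityset.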

We should be able to create features that traverse these cycles one or more times. For example, "The average salary of the direct reports of an employee's direct reports": MEAN(employees.employees.salary).
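
To make the intended value concrete, here is a small sketch in plain Python that computes that two-hop mean by hand. The sample rows and helper names are hypothetical, and it does not use the featuretools API; it only shows what such a feature would be expected to return.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample rows: id -> (manager_id, salary).
employees = {
    1: (None, 200),
    2: (1, 120),
    3: (1, 110),
    4: (2, 80),
    5: (2, 60),
    6: (3, 70),
}


def direct_reports(rows):
    """Map each manager id to the ids of their direct reports."""
    reports = defaultdict(list)
    for emp_id, (manager_id, _salary) in rows.items():
        if manager_id is not None:
            reports[manager_id].append(emp_id)
    return reports


def mean_salary_of_reports_of_reports(rows):
    """For each employee, average the salaries two manager_id hops below them."""
    reports = direct_reports(rows)
    result = {}
    for emp_id in rows:
        salaries = [
            rows[second][1]
            for first in reports.get(emp_id, [])
            for second in reports.get(first, [])
        ]
        # Employees with nobody two levels below them get no value.
        result[emp_id] = mean(salaries) if salaries else None
    return result


two_hop_mean = mean_salary_of_reports_of_reports(employees)
```

Here employee 1 manages 2 and 3, who in turn manage 4, 5, and 6, so the value for employee 1 is the mean of 80, 60, and 70. The same self-referencing relationship is traversed twice, which is exactly what the feature above requires.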

To support this there are at least two places in the code which assume there are no cycles and so will need to be updated:

  1. EntitySet.has_unique_forward_path: When searching for paths, this skips entities that have already been seen, so it traverses a cycle only once. For the "employees" entityset above, it would report a unique path from employees to employees, even though there are infinitely many.
  2. DeepFeatureSynthesis.build_features: Currently this gets stuck in an infinite loop when run on an entityset with a cycle (in the call to EntitySet.get_backward_entities(eid, deep=True)). To fix this, we could add a max_relationship_depth param to limit the number of relationships that will be traversed. This would change existing behavior even for entitysets without cycles, because there is currently no such limit (max_depth only limits the nesting of features, not the lengths of their relationship paths).
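
One possible shape for both changes is sketched below on a plain relationship list. It is not the featuretools implementation. The tuple layout and the function names `bounded_paths`, `has_unique_bounded_forward_path`, and `bounded_backward_paths` are hypothetical, and only paths with at least one relationship are counted, which is an assumption. The idea is to enumerate relationship paths breadth-first up to max_relationship_depth, allowing entities to repeat.

```python
# Each relationship is (child_entity, child_variable, parent_entity).
SELF_LOOP = [("employees", "manager_id", "employees")]
MULTI_ENTITY = [
    ("users", "role_id", "roles"),
    ("roles", "creator_id", "users"),
]


def bounded_paths(relationships, start, max_relationship_depth, forward=True):
    """List every path of 1..max_relationship_depth relationships from start.

    Entities may repeat within a path, so cycles can be traversed several
    times; the depth bound is what guarantees termination.
    """
    src, dst = (0, 2) if forward else (2, 0)
    by_source = {}
    for rel in relationships:
        by_source.setdefault(rel[src], []).append(rel)

    paths = []
    frontier = [(start, ())]  # (current entity, relationships taken so far)
    for _ in range(max_relationship_depth):
        next_frontier = []
        for entity, path in frontier:
            for rel in by_source.get(entity, []):
                new_path = path + (rel,)
                paths.append(new_path)
                next_frontier.append((rel[dst], new_path))
        frontier = next_frontier
    return paths


def has_unique_bounded_forward_path(relationships, start, end, max_relationship_depth):
    """True if exactly one forward path within the depth bound ends at `end`."""
    matches = [
        path
        for path in bounded_paths(relationships, start, max_relationship_depth)
        if path[-1][2] == end
    ]
    return len(matches) == 1


def bounded_backward_paths(relationships, entity, max_relationship_depth):
    """Backward (parent -> child) relationship paths within the depth bound."""
    return bounded_paths(relationships, entity, max_relationship_depth, forward=False)
```

With a bound of 1 the "employees" self-loop still has a unique forward path, but with a bound of 3 there are three (one per traversal count), so the uniqueness check no longer reports a single path. The backward search returns paths rather than entity ids because, with cycles, the same entity can be reached by several distinct relationship paths.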

CJStadler, Jun 18 '19 13:06