xgi
xgi copied to clipboard
Compress edges
Implement way of collapsing identical elements. Right now, IDView.duplicates()
simply chooses the lowest ID as the non-duplicate edge, but there are several ways of
- weighting the collapsed nodes/edges:
- Leave the resulting single node/edge unweighted
- Weighting the resulting single node/edge by its original multiplicity
- labeling the node/edge:
- Give the new node/edge the smallest id of the duplicates
- Give the new node/edge a tuple ID where the tuple are the nodes/edges that were compressed.
- Give the new node/edge a completely new ID This would affect the duplicates method and potentially require another method.
Something along these lines is the collapse_identical_elements
in HyperNetX.
What do we think the syntax should be for calling this? Originally, I was thinking H.edges.collapse(weighted=True, relabel="tuple")
and same for nodes, but unfortunately, this would involve modifying the hypergraph from the IDView, which seems to contradict the notion of what a view is. What about H.collapse_edges()
and H.collapse_nodes()
?
Agreed this method does not belong in the EdgeView
. It belongs in Hypergraph
. The main objective of Hypergraph
is after all to modify the network's structure.
What about H.collapse_edges(weighted=weight_mode, index=index_mode, return_collapse_dict)
. Here "weight_mode" could be None (no weight) or str (used as the name of the attribute that stores the weight), and "index_mode" could be one of "tuple", "min", or "auto". Auto means assign a completely new ID automatically.
I'm using "index" instead of "label" because elsewhere in the codebase we refer to nodes and edges using the word "ID", not the word "label". What do you think?
Would "merge" rather "collapse" make sense? Collapse sounds a bit dramatic 😄
As per #177, the simplicial complex class will (might?) have a method that "collapses" things so it might be wise to choose a different name for this one. I agree merge sounds nice.
As per #196, we should keep in mind what happens to the incidence matrix before/after compressing multiple edges.
Some things I'm wondering:
- What should we do with overlapping attributes? Make them a list?
- Similarly, each edge could, in principle, have a weight. Should these be used in computing the merged edge weight? If that's the case, should we store it in a "weight" attribute?
- If there isn't a "weight" attribute for non-multiedges, should we add it so that every edge has this attribute?
-
I feel like it might break some things to have some edges "colors" being "red" and other being ["red", "blue"], would it? For example if you decide to use
xgi.draw()
and want to color edges by their "color" attribute? If there is a way for things no to break, we could have to "attribute merging modes": union and intersection. Union would keep all values of the attributes for the merging edges as a list (set?), intersection would only keep one if it is the value of all merging edges, otherwise None. That could work for numerical attributes too, or we could have "sum", "mean", .., as modes for those, though it might get complicated. -
Is anything in our codebase using the "weight" attribute explicitly? If not, the actual weight could be stored in the "weight" as a list/sum/mean (see above) and we could generate a new attribute "multiplicity" that would be a integer and could be used in
draw()
just as "weight" can be. -
I feel this will be different for numerical and categorical again. If we have a "multiplicity", then I guess we should add it to all other edges (but it might make less sense for a "weight" in general?)
- In the case when the user wants to keep all attributes inside a list:
"red"
becomes["red"]
while two duplicate edges with"red"
and"blue"
become["red", "blue"]
. - We do have
Hypergraph.add_weighted_edges_from
. - That's a good point RE categorical vs numerical.
My proposal is that we should have a common sense default, for example, two duplicate edges with IDs 1 and 2 become a single edge with ID 1 with the attributes of 1 taking precedence over the attributes of 2. But other than that, we could have a keyword parameter that accepts a lambda to give the user complete freedom of what happens:
# by default, attributes of the lowest ID take precedence
H.collapse_edges()
# -> `{1: {"color: "red"}, 2: {"color": "blue"}}` become `{1: {"color": "red"}}`
H.collapse_edges(attributes="collect")
# -> `{1: {"color": "red"}}` becomes `{1: {"color": ["red"]}}` and `{1: {"color: "red"}, 2: {"color": "blue"}}` become `{1: {"color": ["red", "blue"]}}`
H.collapse_edges(attributes=lambda x,y: x+y)
# ->`{1: {"age": 10}, 2: {"age": 20}}` become `{1: {"age": 30}}`
# If there are multiple attributes, we can accept a dict
H.collapse_edges(attributes={"age": lambda x,y: x+y, "color": "collect"})
# ...
Note that both the default behavior and attributes="collect"
work for both categorical and numerical attributes, while in the latter case, the one with the lambda, the user would be responsible for passing a function that behaves well on the given attribute.
Note this discussion is very similar to the one in #203 about what to do with attributes when adding two hypergraphs.
Is there anything that #210 did not address or should we close?
Good call. Closing this.