xgi icon indicating copy to clipboard operation
xgi copied to clipboard

Compress edges

Open nwlandry opened this issue 2 years ago • 6 comments

Implement way of collapsing identical elements. Right now, IDView.duplicates() simply chooses the lowest ID as the non-duplicate edge, but there are several ways of

  • weighting the collapsed nodes/edges:
    1. Leave the resulting single node/edge unweighted
    2. Weighting the resulting single node/edge by its original multiplicity
  • labeling the node/edge:
    1. Give the new node/edge the smallest id of the duplicates
    2. Give the new node/edge a tuple ID where the tuple are the nodes/edges that were compressed.
    3. Give the new node/edge a completely new ID This would affect the duplicates method and potentially require another method.

Something along these lines is the collapse_identical_elements in HyperNetX.

nwlandry avatar Sep 01 '22 15:09 nwlandry

What do we think the syntax should be for calling this? Originally, I was thinking H.edges.collapse(weighted=True, relabel="tuple") and same for nodes, but unfortunately, this would involve modifying the hypergraph from the IDView, which seems to contradict the notion of what a view is. What about H.collapse_edges() and H.collapse_nodes()?

nwlandry avatar Sep 07 '22 20:09 nwlandry

Agreed this method does not belong in the EdgeView. It belongs in Hypergraph. The main objective of Hypergraph is after all to modify the network's structure.

What about H.collapse_edges(weighted=weight_mode, index=index_mode, return_collapse_dict). Here "weight_mode" could be None (no weight) or str (used as the name of the attribute that stores the weight), and "index_mode" could be one of "tuple", "min", or "auto". Auto means assign a completely new ID automatically.

I'm using "index" instead of "label" because elsewhere in the codebase we refer to nodes and edges using the word "ID", not the word "label". What do you think?

leotrs avatar Sep 08 '22 06:09 leotrs

Would "merge" rather "collapse" make sense? Collapse sounds a bit dramatic 😄

maximelucas avatar Sep 21 '22 10:09 maximelucas

As per #177, the simplicial complex class will (might?) have a method that "collapses" things so it might be wise to choose a different name for this one. I agree merge sounds nice.

leotrs avatar Sep 27 '22 06:09 leotrs

As per #196, we should keep in mind what happens to the incidence matrix before/after compressing multiple edges.

leotrs avatar Oct 27 '22 12:10 leotrs

Some things I'm wondering:

  • What should we do with overlapping attributes? Make them a list?
  • Similarly, each edge could, in principle, have a weight. Should these be used in computing the merged edge weight? If that's the case, should we store it in a "weight" attribute?
  • If there isn't a "weight" attribute for non-multiedges, should we add it so that every edge has this attribute?

nwlandry avatar Oct 28 '22 18:10 nwlandry

  • I feel like it might break some things to have some edges "colors" being "red" and other being ["red", "blue"], would it? For example if you decide to use xgi.draw() and want to color edges by their "color" attribute? If there is a way for things no to break, we could have to "attribute merging modes": union and intersection. Union would keep all values of the attributes for the merging edges as a list (set?), intersection would only keep one if it is the value of all merging edges, otherwise None. That could work for numerical attributes too, or we could have "sum", "mean", .., as modes for those, though it might get complicated.

  • Is anything in our codebase using the "weight" attribute explicitly? If not, the actual weight could be stored in the "weight" as a list/sum/mean (see above) and we could generate a new attribute "multiplicity" that would be a integer and could be used in draw() just as "weight" can be.

  • I feel this will be different for numerical and categorical again. If we have a "multiplicity", then I guess we should add it to all other edges (but it might make less sense for a "weight" in general?)

maximelucas avatar Oct 30 '22 16:10 maximelucas

  • In the case when the user wants to keep all attributes inside a list: "red" becomes ["red"] while two duplicate edges with "red" and "blue" become ["red", "blue"].
  • We do have Hypergraph.add_weighted_edges_from.
  • That's a good point RE categorical vs numerical.

My proposal is that we should have a common sense default, for example, two duplicate edges with IDs 1 and 2 become a single edge with ID 1 with the attributes of 1 taking precedence over the attributes of 2. But other than that, we could have a keyword parameter that accepts a lambda to give the user complete freedom of what happens:

# by default, attributes of the lowest ID take precedence
H.collapse_edges()
# -> `{1: {"color: "red"}, 2: {"color": "blue"}}` become `{1: {"color": "red"}}`

H.collapse_edges(attributes="collect")
# -> `{1: {"color": "red"}}` becomes `{1: {"color": ["red"]}}` and `{1: {"color: "red"}, 2: {"color": "blue"}}` become `{1: {"color": ["red", "blue"]}}`

H.collapse_edges(attributes=lambda x,y: x+y)
# ->`{1: {"age": 10}, 2: {"age": 20}}` become `{1: {"age": 30}}`

# If there are multiple attributes, we can accept a dict
H.collapse_edges(attributes={"age": lambda x,y: x+y, "color": "collect"})
# ...

Note that both the default behavior and attributes="collect" work for both categorical and numerical attributes, while in the latter case, the one with the lambda, the user would be responsible for passing a function that behaves well on the given attribute.

Note this discussion is very similar to the one in #203 about what to do with attributes when adding two hypergraphs.

leotrs avatar Oct 30 '22 17:10 leotrs

Is there anything that #210 did not address or should we close?

maximelucas avatar Dec 05 '22 11:12 maximelucas

Good call. Closing this.

nwlandry avatar Dec 05 '22 15:12 nwlandry