flow_stability

extra_attrs when setting the event table

Open YasAsgari opened this issue 5 months ago • 6 comments

Hi. When following the instructions for set_temporal_network, I was checking the extra_attrs documentation.

It is written as below:



extra_attrs: Dict
        Extra event attributes. Must be given in a dict with {attr_name: list_of_values},
        where list_of_values has the same order and length as `source_nodes`.

My problem is that it is not clear whether attr_name should map to one value per unique node, or whether the values have to be repeated for every entry in source_nodes.

Suggested fix:

A better explanation of how to add multiple attributes, or at least an explanation that it refers to unique nodes (like sex in the mice dataset).

YasAsgari avatar Jun 16 '25 10:06 YasAsgari

👋 @YasAsgari I see.

The description might be somewhat misleading. The extra_attrs is an attribute to

https://github.com/alexbovet/flow_stability/blob/434caf46c06f57d41ea983ea3708f7c5d1508894/src/flowstab/temporal_network.py#L53 ... https://github.com/alexbovet/flow_stability/blob/434caf46c06f57d41ea983ea3708f7c5d1508894/src/flowstab/temporal_network.py#L70-L72

A minimal example would look like this:

from flowstab.temporal_network import ContTempNetwork

extra_attrs = {
    'attr1': [True, False, False, True],
    'attr2': [10, 20, 30, 40]
}
temp_nw = ContTempNetwork(
    source_nodes=[1, 2, 1, 1],
    target_nodes=[1, 2, 3, 2],
    starting_times=[0, 1, 2, 4],
    ending_times=[2, 3, 3, 4],
    extra_attrs=extra_attrs
)
# Print the resulting events table (note that the node IDs are reset to base 0).
print(temp_nw.events_table)

#         source_nodes  target_nodes  starting_times  ending_times  attr1  attr2  durations
#   0             0             0               0             2   True     10          2
#   1             1             1               1             3  False     20          2
#   2             0             2               2             3  False     30          1
#   3             0             1               4             4   True     40          0

I think extra_attrs is meant to provide additional columns in the events table and is not specific to source_nodes per se: each extra column just needs to have the same number of entries as there are events, in the same order.

However, extra_attrs might not be an ideal location to store constant node information, like the sex in the case of the mice dataset. This is simply because extra_attrs requires one entry per event (not per node), so the information needs to be repeated in every event in which the node occurs, leading to unnecessary inflation of the events_table.
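
To make the repetition concrete, here is a minimal sketch in plain Python. The node_sex mapping and the column names source_sex and target_sex are made up for illustration; only the extra_attrs format ({attr_name: list_of_values}, one value per event) comes from the docstring quoted above.

```python
# Hypothetical per-node data: one value per unique node.
node_sex = {1: "F", 2: "M", 3: "F"}

# Events, using the same toy network as in the example above.
source_nodes = [1, 2, 1, 1]
target_nodes = [1, 2, 3, 2]

# extra_attrs needs one value per event, so each node-level value is
# repeated for every event in which that node takes part.
extra_attrs = {
    "source_sex": [node_sex[n] for n in source_nodes],
    "target_sex": [node_sex[n] for n in target_nodes],
}
print(extra_attrs)
```

Three node-level values turn into eight stored values here, and the gap grows with the number of events per node.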

We might adapt the description to state that extra_attrs allows providing additional columns for the events_table, such as the location of an event or other event-specific information. We can also include the minimal toy example from above, which might also serve as a minimal example of how to use set_temporal_network (and actually ContTempNetwork).

Would that be helpful?


One other thing to note: extra_attrs is ignored if a ContTempNetwork is initialized with an events_table directly.

Personally, I'm not sure what added value the extra_attrs argument provides to the package. To my understanding, the package currently provides no functionality that makes use of the extra columns in any particular way (beyond the fact that they can be used like any other column of a pandas DataFrame).

j-i-l avatar Jun 16 '25 21:06 j-i-l

@j-i-l Hi.

In my understanding of network science, adding a column via extra_attrs might be confused with an edge attribute. When I use extra_attrs in my projects, I use it as an extra layer to see whether some communities have a different composition from the others.

So basically, it should be a dictionary that maps both source nodes and target nodes to values. Otherwise, the information is unique per node, except when the features change over time.

So two possibilities can be useful:

  1. Static features: only a mapping from unique nodes to a value.
  2. Dynamic features: two extra columns, one for source and one for target.

Let me know what you think.

YasAsgari avatar Jun 17 '25 07:06 YasAsgari

extra_attrs is indeed for edge attributes. It is a bit confusing because source_nodes has the length of the number of events, not nodes. Let's rename this to edge_attrs and update the docstring. We need to keep it; it will be especially useful for @YasAsgari's project. (But we should think about separating temporal_network.py from flow stab.)

alexbovet avatar Jun 17 '25 08:06 alexbovet

We should first decide for what elements we need to be able to store data:

  • temporal network
  • single event
  • single edge (i.e. node-pair)
  • single node

Once decided we can evaluate how/when this data is best added.

Having an initialization of a temporal network that allows setting data for all of these elements is doable, but it won't look pretty and can lead to quite some confusion (see e.g. the minimal example above: the source/target nodes were reset to base 0, which might already cause confusion about how to provide node-specific data).

Currently, temporal networks are created from events, so we might just focus on adding event-specific data, e.g. event_attrs, during initialization. For the other elements we can always provide add_node_attributes (or similar) methods.

A cleaner way might be to delegate the extra event attributes to the optional keyword arguments, the **kw..:

https://github.com/alexbovet/flow_stability/blob/434caf46c06f57d41ea983ea3708f7c5d1508894/src/flowstab/temporal_network.py#L118

Currently, they are just used to forward optional arguments to the pd.read_csv function:

https://github.com/alexbovet/flow_stability/blob/434caf46c06f57d41ea983ea3708f7c5d1508894/src/flowstab/temporal_network.py#L163

We could write:

"""
...
**kwargs: Optional columns to add to the events table

  >>> temp_nw = ContTempNetwork(
  ...     source_nodes=[1, 2, 1, 1],
  ...     target_nodes=[1, 2, 3, 2],
  ...     starting_times=[0, 1, 2, 4],
  ...     ending_times=[2, 3, 3, 4],
  ...     attr1=[True, False, True, False],
  ...     my_extra_event_property=[10, 20, 30, 40]
  ... )
...
"""

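As a rough sketch of how such a constructor could collect the extra columns, assuming every keyword argument is treated as a per-event column: ToyTempNetwork and its events attribute are hypothetical stand-ins, not the flowstab implementation, and the sketch ignores the fact that **kwargs is currently forwarded to pd.read_csv.

```python
class ToyTempNetwork:
    """Hypothetical stand-in for illustration; NOT flowstab's ContTempNetwork."""

    def __init__(self, source_nodes, target_nodes, starting_times,
                 ending_times, **kwargs):
        # The four mandatory event columns.
        self.events = {
            "source_nodes": list(source_nodes),
            "target_nodes": list(target_nodes),
            "starting_times": list(starting_times),
            "ending_times": list(ending_times),
        }
        n_events = len(self.events["source_nodes"])
        # Every extra keyword argument becomes one additional event column
        # and must therefore provide exactly one value per event.
        for name, values in kwargs.items():
            values = list(values)
            if len(values) != n_events:
                raise ValueError(
                    f"'{name}' has {len(values)} entries, expected {n_events}"
                )
            self.events[name] = values


temp_nw = ToyTempNetwork(
    source_nodes=[1, 2, 1, 1],
    target_nodes=[1, 2, 3, 2],
    starting_times=[0, 1, 2, 4],
    ending_times=[2, 3, 3, 4],
    attr1=[True, False, True, False],
    my_extra_event_property=[10, 20, 30, 40],
)
print(list(temp_nw.events))
```

The length check is the one piece of validation that seems hard to avoid with this design, since a column of the wrong length cannot be aligned with the events.
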
j-i-l avatar Jun 17 '25 11:06 j-i-l

Let's have potential attributes for

  • temporal network
  • single event
  • single node

We could have

  • net_attrs
  • event_attrs
  • node_attrs

as possible inputs when initializing.
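
A minimal sketch of what three such inputs could look like, purely as an assumption to make the proposal concrete: ToyTemporalNetwork, the shape chosen for each container, and the validation rules are hypothetical and not part of flowstab.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTemporalNetwork:
    """Hypothetical container for illustration; NOT part of flowstab."""

    source_nodes: list
    target_nodes: list
    starting_times: list
    ending_times: list
    # Assumed shapes for the three proposed inputs:
    net_attrs: dict = field(default_factory=dict)    # name -> single value
    event_attrs: dict = field(default_factory=dict)  # name -> one value per event
    node_attrs: dict = field(default_factory=dict)   # name -> {node_id: value}

    def __post_init__(self):
        n_events = len(self.source_nodes)
        for name, values in self.event_attrs.items():
            if len(values) != n_events:
                raise ValueError(f"event attribute '{name}' needs {n_events} values")
        nodes = set(self.source_nodes) | set(self.target_nodes)
        for name, mapping in self.node_attrs.items():
            unknown = set(mapping) - nodes
            if unknown:
                raise ValueError(f"node attribute '{name}' has unknown nodes {unknown}")


net = ToyTemporalNetwork(
    source_nodes=[1, 2, 1, 1],
    target_nodes=[1, 2, 3, 2],
    starting_times=[0, 1, 2, 4],
    ending_times=[2, 3, 3, 4],
    net_attrs={"dataset": "toy"},
    event_attrs={"attr2": [10, 20, 30, 40]},
    node_attrs={"sex": {1: "F", 2: "M", 3: "F"}},
)
print(net.node_attrs["sex"][3])
```

Here node_attrs is keyed by the node IDs exactly as the user provided them, which only works as long as those IDs are not changed during initialization.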

I don't understand why this is not pretty and why having them in the **kwargs would be better. @j-i-l , could you explain?

In any case, this is not urgent and we could do that after separating temporal_networks from flow_stab.

alexbovet avatar Jun 18 '25 07:06 alexbovet

We can easily enable all three.

When initializing an object, like an instance of a temporal network, **kwargs are the conventional place for optional keyword arguments. This is a Python convention that is also used by NetworkX, for example. So in our case, when we initialize a ContTempNetwork, they might be the place to add attributes to the temporal network. Node-specific attributes could be added with node_attrs.

The reason why I consider this a bit messy is that we currently do not control the node IDs. What I mean by that is that node IDs get reset to base 0 if they do not already start there (see the example code above), and if an events_table is provided, some functionality even breaks if the node IDs in the table are not contiguous, starting from 0.

In this situation, adding node_attrs to the init of a ContTempNetwork comes with the challenge of correctly mapping this info to the specific nodes. If the node IDs are reset during initialization, then we need to apply the same mapping to the node_attrs. On our side this is certainly doable, that is not the problem, but on the user side it might not be so clear that the provided node IDs are changed and that retrieving and setting node attributes requires the reset IDs. In short, we should first handle node IDs in a solid manner; then there is no problem.

Does this somehow clarify what I meant?
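
A minimal sketch of the mapping problem, assuming the new IDs are assigned in sorted order of the original ones. That assumption reproduces the toy example earlier in this thread (1, 2, 3 become 0, 1, 2), but it is not a statement about how flowstab relabels nodes internally; the node_attrs values are made up.

```python
source_nodes = [1, 2, 1, 1]
target_nodes = [1, 2, 3, 2]

# Hypothetical node attributes, keyed by the IDs the user provided.
node_attrs = {1: "F", 2: "M", 3: "F"}

# Assumed relabelling: sorted original IDs -> 0, 1, 2, ...
original_ids = sorted(set(source_nodes) | set(target_nodes))
relabel = {orig: new for new, orig in enumerate(original_ids)}

relabelled_sources = [relabel[n] for n in source_nodes]
relabelled_targets = [relabel[n] for n in target_nodes]

# The same mapping has to be applied to the node attributes, otherwise
# they end up attached to the wrong nodes (or to nodes that no longer exist).
relabelled_node_attrs = {relabel[n]: value for n, value in node_attrs.items()}

print(relabelled_sources, relabelled_targets, relabelled_node_attrs)
```

Keeping the relabel dictionary around would also let users look up attributes by their original IDs.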

j-i-l avatar Jun 19 '25 15:06 j-i-l