pytorch_geometric icon indicating copy to clipboard operation
pytorch_geometric copied to clipboard

[Roadmap] Temporal Graph Support 🚀

Open rusty1s opened this issue 2 years ago • 61 comments

Updated on Nov 27th.

This is a roadmap towards better temporal data and temporal GNN support in PyG (targeted for v2.5), which spans across data handling, data loading, models and examples.

Data Handling

Previously, we used TemporalData as our abstraction to represent temporal graphs. However, this data structure is limited, i.e. it can only hold a stream of events, it mixes homogeneous and heterogeneous data, it cannot naturally deal with node features, etc. As such, we want to deprecate TemporalData and move time handling to Data and HeteroData explicitly.

  • [ ] Deprecate TemporalData
  • [x] Bring data.time support to Data/HeteroData:
    • [x] Expose time as a property
    • [x] Support data.up_to(time) -> Data and data.snapshot(start_time, end_time)
    • [x] Support data.sort_by_time (or similar)
  • [ ] Convert all datasets that currently utilize TemporalData
  • [ ] Support node-level training labels: This requires a data-structure for which we can query the label efficiently given a node ID and seed timestamp.
  • [x] Support Data.update(data)/HeteroData.update(data) in order to add new incoming edges.

Open Questions:

  • How do we represent changing node features over time?

Data Loading

With Data and HeteroData having time support, we can utilize PyG's data loaders to take care of temporal subgraph sampling. Importantly, NeighborLoader and LinkNeighborLoader have been already equipped with temporal sampling support.

  • [ ] Deprecate TemporalDataLoader
  • [x] Support node-level/edge-level temporal sampling in NeighborLoader
  • [x] Support node-level/edge-level temporal sampling in LinkNeighborLoader

Models

Currently, we only expose TGN as a temporal GNN model, which currently operates on TemporalData and its own sampler implementation.

  • [ ] Re-factor TGN to be able to operate on Data/HeteroData while utilizing NeighborLoader/LinkNeighborLoader
  • [ ] Re-factor TGN to support multi-hop sampling (currently capped to num_hops=1)
  • [ ] Finish our GraphMixer implementation (#8304)
    • [ ] Out-source MLPMixer in GraphMixer into an Aggregation module
  • [ ] Finish our EdgeBank implementation (#7966)
  • [ ] Add the NAT model

Examples and Tutorials

Afterwards, we need clear and descriptive examples and tutorials on how to leverage PyG for node-level and link-level temporal applications.

  • [ ] An example of GraphMixer for temporal node-level predictions on TGB dataset
  • [ ] An example of EdgeBank for temporal link-level predictions on TGB dataset
  • [ ] A tutorial on PyG's temporal GNN support

Advanced Features

One current limitation of most temporal GNN models is that they learn an embedding per node, which gets updated over time. However, it is not necessarily scalable to keep this embedding on the GPU due to memory constraints. As such, advanced features may include data-structures to query and train these embeddings on CPU, with efficient asynchronous device transfers to avoid host2device/device2host bottlenecks. An example of such a data-structure may be inspired by the GNNAutoScale paper.

rusty1s avatar Sep 24 '21 08:09 rusty1s

This would be very useful for our work. Do you have an idea in mind of how this would look?

Padarn avatar Nov 06 '21 13:11 Padarn

I sadly hadn't have time to fill in some concrete roadmap plans yet, but we are looking into ways to:

  1. Allow easy updates to edge_index for newly incoming edges
  2. Allow for temporal sampling (either as part of current samplers or as its own sampler).

rusty1s avatar Nov 08 '21 08:11 rusty1s

Ah got it, so the graph itself is temporal, you're not thinking of time series on a graph. Keen to learn more if and when you guys have further thoughts on this. Thanks!

Padarn avatar Nov 13 '21 07:11 Padarn

Hi @rusty1s, in order to try to contribute to the Roadmap definition/planning, I want to add some topics/thoughts to be considered:

  • Consider extending or updating the TemporalData entity to support all the combinations of static/dynamic graphs with static/temporal signals. That way the Signal implementations in PyGT may inherit from it in future.
  • There are Discrete-time dynamic graphs - DTDG (sequences of static graph snapshots taken at intervals in time) and Continuous-time dynamic graphs - CTDG (timed lists of events). There are models that use DTDG and models that use CTDG. The differences between them in terms of implementation might be considered during the design of the structures for Dynamic Graphs.
  • Currently, the TGNMemory module implements a way to deal with updates on the graph (item 1. in the list above). A more generic module (extracted from there) might be considered.

If you agree with any of these topics/thoughts, we can transform them into tasks to be done and I can help with some of them :)

otaviocx avatar Dec 07 '21 01:12 otaviocx

Thank you @otaviocx, these are great thoughts. I'm still struggling to find a good way to unify DTDG and CTDG. Currently, TemporalData refers to a list of events (which we could convert to DTDG but not vice versa). How do you think people want to define their temporal data in the first place? Shall we require lists of events as input and then simply allow conversion to snapshots? Do you think it's best to separate both representations from the start or do you think we should unify them?

rusty1s avatar Dec 07 '21 13:12 rusty1s

Hi @rusty1s, thanks for answering!

How do you think people want to define their temporal data in the first place?

It is hard to say since this area is really recent and still evolving a lot. But, I would say we can start with something as flexible as possible to support many scenarios. That being said, I can think of 2 ways of temporal data representation:

  • a list of events (which can already be represented by TemporalData)
  • a graph structure with some timestamp fields representing the temporal layer. In this approach, we can extend/update the utils.to_networkx and utils.from_networkx methods to start supporting temporal info. It can be, for instance, a new param on each function to pass the name of the timestamp field for nodes and edges. Wdyt?
    • we can evolve this approach to deal with snapshots too. For instance, to have another method, let's say utils.increment_from_networkx, which will get a graph snapshot and a list of events starting from this first snapshot (just an idea that came to my mind now).

Shall we require lists of events as input and then simply allow conversion to snapshots?

Thinking on the second approach in the answer above, we can also support graph snapshots in a well-known format (such as networkx). But, I would say we can allow the conversion. Even not working with DTDG, I think the snapshots may be useful for the users/researchers as a debug tool. It can be used to show the graph at specific times and to do some manual comparisons and also plotting. Wdyt?

Do you think it's best to separate both representations from the start or do you think we should unify them?

I think we can have these 2 representations as separated structures and allow the conversion between them. Thinking about a Roadmap, as far as I can see, the DTDG are being used less and less by new models. Thus, we may consider the support of snapshots and conversions as further steps... Wdyt?

otaviocx avatar Dec 07 '21 14:12 otaviocx

These are great thoughts, thank you. I'm currently leaning towards option 2, where we let the attribute time (or t) denote the timestamp information. This way, we can convert simple data objects into a stream of events as well (and vice versa). We can also add a Data.update to append new edges to the temporal graph.

rusty1s avatar Dec 08 '21 15:12 rusty1s

Hi, @rusty1s. Nice, thanks! We can also add a TemporalData.snapshot method which will receive a param time and will return a snapshot of the Data until the given time, wdyt?

Thus, I think we can already get some tasks from this conversation to put in this Roadmap. I would say:

  • Create a child of Data or BaseData that can represent a Temporal Graph
    • it can be a refactoring of TemporalData, for instance.
  • Add an update method to this new TemporalData class to allow users to append new edges to the temporal graph.
  • Add utils methods to convert from simple data objects into a stream of events (and vice versa).
    • The stream of events can be represented by the current TemporalData class, for instance.
  • Add a snapshot method to this new TemporalData class to take a snapshot of the graph given a timestamp.
  • Select at least 5 Temporal GNN models (as diverse as possible) and try to implement them using PyG to investigate what else can be needed.

Is this a good start?

One question: does it make sense to have TemporalData inheriting from BaseData?

otaviocx avatar Dec 09 '21 22:12 otaviocx

Yes, this sounds great. I think the separation of Data and TemporalData makes sense in the beginning, but it's also possible to merge those two in later releases.

Yes, TemporalData should inherit from BaseData IMO. This + update and conversion to snapshot Data objects seem like a great starting point.

rusty1s avatar Dec 11 '21 13:12 rusty1s

@otaviocx Thanks for this, its a great plan. One thing to think of after we are done with the initial 5 steps you mentioned, is to think of how to incorporate node addition/ deletion to the update method.

wsad1 avatar Dec 18 '21 12:12 wsad1

Really interesting discussion.

Have we got enough to put together a set of issues so that people like me can also help out :-)

On Sat, 18 Dec 2021, 8:47 pm Jinu Sunil, @.***> wrote:

@otaviocx https://github.com/otaviocx Thanks for this, its a great plan. One thing to think of after we are done with the initial 5 steps you mentioned, is to think of how to incorporate node addition/ deletion to the update method.

— Reply to this email directly, view it on GitHub https://github.com/pyg-team/pytorch_geometric/issues/3230#issuecomment-997197859, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAGRPNYAXP7T7LY7QZ6KZNTURR7HJANCNFSM5EVNLVHA . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

You are receiving this because you commented.Message ID: @.***>

--

By communicating with Grab Inc and/or its subsidiaries, associate companies and jointly controlled entities (“Grab Group”), you are deemed to have consented to the processing of your personal data as set out in the Privacy Notice which can be viewed at https://grab.com/privacy/ https://grab.com/privacy/

This email contains confidential information and is only for the intended recipient(s). If you are not the intended recipient(s), please do not disseminate, distribute or copy this email Please notify Grab Group immediately if you have received this by mistake and delete this email from your system. Email transmission cannot be guaranteed to be secure or error-free as any information therein could be intercepted, corrupted, lost, destroyed, delayed or incomplete, or contain viruses. Grab Group do not accept liability for any errors or omissions in the contents of this email arises as a result of email transmission. All intellectual property rights in this email and attachments therein shall remain vested in Grab Group, unless otherwise provided by law.

Padarn avatar Dec 18 '21 13:12 Padarn

Thank you! I updated the initial description of the issue according to our discussion. If you are interested in contributing, feel free to send an initial PR or create a separate issue to discuss further :)

rusty1s avatar Dec 18 '21 18:12 rusty1s

Nice! The roadmap looks great.

Padarn avatar Dec 19 '21 01:12 Padarn

Thanks for updating the roadmap @rusty1s.

I'm currently working on items 1, 2, 4 and 5. Planning to submit some PRs this week.

otaviocx avatar Dec 20 '21 16:12 otaviocx

Looking forward to it :)

rusty1s avatar Dec 21 '21 09:12 rusty1s

Hey @otaviocx I thought I might try to start helping out with the last task. Do you have anything in mind for the methods that would be best to implement?

I'm not very familiar with what is being done in the field, but for CTDG I was thinking something that looks like the framework proposed in https://arxiv.org/abs/2006.10637 is probably worth having?

Padarn avatar Dec 26 '21 05:12 Padarn

Hi @Padarn, sorry for the delay in replying to your message. For some reason, I didn't get notified. I'm not sure if I understood your question, please, excuse me if my answer goes on the wrong path.

So, the TGN is already implemented in https://github.com/pyg-team/pytorch_geometric/blob/master/torch_geometric/nn/models/tgn.py. Actually, this seems to be the only Temporal GNN model defined in PyG. There is a good list of other Temporal architectures here https://pytorch-geometric-temporal.readthedocs.io/en/latest/notes/resources.html

The TemporalData class currently is used by the TGN implementation. But, it is more like an Event List/Log than a temporal graph. I would say that it may work for other CTDG architectures as it is, but not sure for DTDG. Wdyt?

@rusty1s sorry for the delay in sending the PRs. I was stuck on the implementation of all the missing methods that TemporalData should have to inherit from BaseData. I'm preparing two approaches to show soon:

  • To add the missing methods in TemporalData and use a GlobalStorage as self._store similar to the one we have in the Data class.
  • To inherit from Data and have a method to receive the event log as we currently receive in the TemporalData constructor (src, dst, time, y and message) and convert it to the Data signature (x, edge_index, edge_attr, y and position).

One thing that I realised is that we will also need an HeteroTemporalData for some models, do you agree? The TemporalData class as it is doesn't support different types of nodes or edges yet.

otaviocx avatar Jan 07 '22 00:01 otaviocx

Hi @otaviocx no worries and thanks for your reply. Your reply does not go down the wrong path at all. My question was just to understand which architectures you thought would best represent what we want to make sure is supported. The reference you provided gives a good starting point.

I agree that we'll probably need some of the changes coming in your upcoming PRs to support DTDG architectures with TemporalData without introducing some hacks.

I wasn't aware of https://pytorch-geometric-temporal.readthedocs.io/en/latest/index.html earlier - Is it fair to say we want to bring most of this functionality inside of pytorch geometric?

Padarn avatar Jan 08 '22 05:01 Padarn

I wasn't aware of https://pytorch-geometric-temporal.readthedocs.io/en/latest/index.html earlier - Is it fair to say we want to bring most of this functionality inside of pytorch geometric?

Hi @Padarn, I would say that here we will have a more generic approach and elements in a way that pytorch-geometric-temporal or other derivated libs can take advantage of. But, it is better to confirm with the maintainers (@rusty1s, for instance).

sorry for the delay in sending the PRs

Hi @rusty1s, I've created the first PR with items 1 and 2 of the Roadmap. Item 3 will need some refactoring in the JODIEDataset and other classes. This dataset seems to be not supporting the DataLoader yet (we need to deal with the data param in the model since there is no train/val/test dataset split). I'm thinking to refactor this dataset to have a split param in its constructor turning possible to have 3 (train/val/test) DataLoaders in the model example file.

Also, some classes (such as the DataLoader) expects an Union of Data and HeteroData as a param (Union[List[Data], List[HeteroData]] or Union[Data, HeteroData]). I'm refactoring where is needed to BaseData.

otaviocx avatar Jan 16 '22 16:01 otaviocx

One thing to think of after we are done with the initial 5 steps you mentioned, is to think of how to incorporate node addition/ deletion to the update method.

Hey @wsad1 and @rusty1s, regarding the update method, if we follow the TGN strategy to model the temporal graph as a stream of events, we can actually have events to add/update edges or nodes or even to delete them: image Thus, I think the update in such a case will be just the addition of new events to the stream, wdyt?

The logic to compute the nodes/edges addition, update or deletion will be actually in the model (Memory in the case of TGN) or in the snapshot function to get a static version of the graph at the given time (considering all the additions, updates and deletions).

Does it make sense to you?

otaviocx avatar Feb 09 '22 02:02 otaviocx

This is exactly what I was thinking when I read that paper. +1 from me.

On that note, I am looking at working on the temporal snapshots on top of the work you're doing. Let me know if you see any risk of overlapping efforts here.

Padarn avatar Feb 09 '22 06:02 Padarn

Thus, I think the update in such a case will be just the addition of new events to the stream, wdyt?

Not sure I understand that. Node-wise and interactions events seem to have different representations, so how do you want to merge them?

rusty1s avatar Feb 09 '22 08:02 rusty1s

Node-wise and interactions events seem to have different representations, so how do you want to merge them

Well... as per the TGN and Jodie papers, the datasets we currently have are all with a single type of event (interactions). Even so, we can model a node-wise event as an event that has the same value for the src and dst keys. Just checked and if we get an event like that, only the memory and embeddings of this node are changed (on the current TGN implementation).

Now that we are diving a bit deeper into that subject, some other topics/challenges appear:

  • We may also need to change the TGNMemory module since as it is now, the number of nodes is fixed. Thus, if a new node appears, there is no memory for it on this model.
  • Most of the CTDG GNNs are using events stream with this same structure: source node id, destination node id, time and message. Thus, it seems the features of the nodes and edges features are mixed in the message. So, if my understanding is right, there is no way to extract a static graph (snapshot) with features in nodes and edges (all the features will be always in edges since they represent the messages/events). We may conclude that the support for DTDG may be based on a different structure (something more similar to what we already have in PyG Temporal, for instance).
  • Most (if not all) CTDG GNNs are dealing with multi-graphs, which means that we can have many edges linking the same two nodes. The challenge here is again on the static graph (snapshot) extraction. Should we extract a multi-graph? Or... should we extract a graph considering only the last edge for each node pair?

I am looking at working on the temporal snapshots on top of the work you're doing

@Padarn, considering the topics above, I would say we may need some design definitions before starting to code changes regarding snapshots, wdyt?

otaviocx avatar Feb 09 '22 20:02 otaviocx

@Padarn, Data.to_temporal(time_attr='time') on the other side is something that I think we can start working on, wdyt?

otaviocx avatar Feb 09 '22 20:02 otaviocx

Thanks for the clarification @otaviocx. So node-level events are treated as self-loops. So, IMO, to_snapshot should probably simply give you a snapshot in the form of a multi-graph with self-loops (solely holding edge features rather than node features). After that, the user could make self-loop features to node features (e.g., based via some DNN module). In theory, we could have different options for to_snapshot (like keep_ony_last=True), but we should start simple and the least magically.

Updates to the number of nodes/edges is another interesting topic, and I have some thoughts on this. For example, one could sub-class Tensor to a TemporalTensor that actually allocates more memory in the node dimension than what is currently needed. To the outside, it utilizes a slicing/masking appraoch to only expose the parts of the tensor that are actually filled with information. In the inside, however, we do not have to allocate new memory (at least not all the time) whenever a new node appears.

rusty1s avatar Feb 10 '22 09:02 rusty1s

Most of the CTDG GNNs are using events stream with this same structure: source node id, destination node id, time and message. Thus, it seems the features of the nodes and edges features are mixed in the message. So, if my understanding is right, there is no way to extract a static graph (snapshot) with features in nodes and edges (all the features will be always in edges since they represent the messages/events). We may conclude that the support for DTDG may be based on a different structure (something more similar to what we already have in PyG Temporal, for instance).

@otaviocx I don't think node events should be treated as self-loops, or at least I'm not sure what the rationale for that is. I'm not sure how a model can distinguish between actual self-loops and node events. You can generalize this format to have both node and edge events. This can be implemented by:

  • Having an extra event_type column/key, indicating whether the event is a node or edge event, or more generally the type of node/edge event.
  • Similar to how we pad sequence lengths of sequential models like RNNs, when feature sizes are of different lengths, shorter events can be padded with zeros.
  • Node events can have anything in the destination node (i.e. dst) column/key, such as the source node as you've suggested, although I think changing it to a fixed invalid node id (e.g. -1) and building in checks to ensure the dst node id is not valid may make things less error-prone.

There may be different networks to learn from updates to nodes and edges respectively, and we can use slicing/masking to get the relevant events.

This offers a few benefits:

  1. We can now distinguish between self-loops and node events
  2. We can have events with different numbers of features
  3. We can now handle arbitrarily many types of events in different ways, including events involving changes to the number of nodes/edges

What do you think?

yulonglin avatar Feb 28 '22 00:02 yulonglin

@otaviocx Oh I realised that you have a PR https://github.com/pyg-team/pytorch_geometric/pull/3985 that's almost ready for merging. I'll take a closer look later this weekend to understand if what I said above makes sense given that it's to be merged :)

Also left comment abouts TemporalDataLoader shuffling and testing in your PR!

yulonglin avatar Mar 05 '22 02:03 yulonglin

@Padarn, Data.to_temporal(time_attr='time') on the other side is something that I think we can start working on, wdyt?

@Padarn @otaviocx Let me know if I can help with this :)

yulonglin avatar Mar 05 '22 02:03 yulonglin

Hey @yulonglin I haven't started on this task. Please go ahead from my point of view. Sorry I missed the emails from this thread.

Padarn avatar Mar 07 '22 06:03 Padarn

@otaviocx sorry for the slow reply

@Padarn, considering the topics above, I would say we may need some design definitions before starting to code changes regarding snapshots, wdyt?

Yeah agreed. I will take a stab at this in a few days if no one else has (not a lot of time around work for the next few days). Maybe I'll split it into a separate issue.

Padarn avatar Mar 07 '22 06:03 Padarn