General follow-ups from PR #2075 (History object and TimeIndexEntry updates)

Open arienandalibi opened this issue 3 months ago • 0 comments

General follow-ups for PR #2075. That PR introduced a history object for accessing the histories of different elements (nodes, edges, temporal properties, edge deletions, PathFromNode, PathFromGraph, etc.). It also renamed TimeIndexEntry(timestamp, secondary_index) to EventTime(epoch, event_id), and refactored many internal and API functions to take EventTime (previously TimeIndexEntry) instead of i64 timestamps, so that an event id (previously the secondary index) can be specified where desired. These follow-ups still need to be addressed:

  1. #2295
  2. How should EventTime (previously TimeIndexEntry) be printed in Python?
    • Currently, it is printed as EventTime(epoch={}, event_id={}). Should the event_id be printed? Should the epoch be shown as an easily readable datetime instead? Or should both the epoch and the datetime be printed?
  3. Update remaining GraphQL types and functions to return EventTime and take TimeInput arguments instead of i64 in both cases.
    • lastOpened(), lastUpdated(), created()
    • functions of types mutableGraph, mutableNode, mutableEdge should be updated
      • addNode(), addEdge(), delete(), etc. currently take Int! as input and should take TimeInput instead
    • VectorisedGraphWindow::start/end should be EventTime instead of i64
    • TemporalPropertyInput::time should be EventTime instead of i64
  4. Write a benchmark to evaluate the performance tradeoff between MergedHistory<L, R> and CompositeHistory
    • CompositeHistory holds a Box<[T]>. T, L, and R can be any type with history operations defined on it.
    • MergedHistory's L and R can be other MergedHistory objects, which can be nested indefinitely.
    • With only 2 items, MergedHistory is more efficient; with many items, CompositeHistory is more efficient. At how many items does CompositeHistory become more efficient than MergedHistory? Use the benchmark to find the crossover point.
    • Ideally, once we know, we can expose only one of merge/compose to users in Python and manage the switch from MergedHistory to CompositeHistory automatically when it makes sense.
  5. Make sure all median() functions behave the same, using the same calculation in every case.
    • Currently, if there is an even number of items ("2 middle items"), some places pick one of the two to return, whereas the history object's interval.median() returns the average of the two, rounded up. Pick one convention and stick to it.
  6. Add to the docs for PersistentGraph that the event_ids associated with some of its EventTimes may not be the same as in other graph types. Converting a graph to a PersistentGraph might alter some event_ids, so EventTime equality might fail for some history entries. Currently, our tests only compare epochs (timestamps) for equality between PersistentGraphs and other graph types.
  7. Check if the new DiskGraph still has different event_ids on EventTime entries, or if this was fixed.
    • Update comment in raphtory/tests/df_loaders.rs line 144 if/when it's fixed
  8. Add interval functions to LazyNodeState for the interval type. For example, g.nodes.history.intervals.mean is currently not possible.
    • The type is LazyNodeState<ops::Map<HistoryOp<'static, DynamicGraph>, PyIntervals>> and is created in raphtory/src/python/graph/node_state/lazy_node_state_history.rs at line 117.
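
For item 2, one possible answer is to print both the raw epoch and a readable datetime alongside the event_id. The sketch below is only an illustration of that option, not the actual Raphtory binding: `EventTimeSketch` is a hypothetical stand-in class, the `dt=` field in the output is invented for this example, and it assumes the epoch is in milliseconds since the Unix epoch.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True, repr=False)
class EventTimeSketch:
    """Hypothetical stand-in for EventTime, used only to compare print formats."""

    epoch: int  # assumption: milliseconds since the Unix epoch
    event_id: int

    def as_datetime(self) -> datetime:
        # Convert the millisecond epoch to a timezone-aware UTC datetime.
        return datetime.fromtimestamp(self.epoch / 1000, tz=timezone.utc)

    def __repr__(self) -> str:
        # Option "print both": raw epoch, event_id, and a readable datetime.
        return (
            f"EventTime(epoch={self.epoch}, event_id={self.event_id}, "
            f"dt={self.as_datetime().isoformat()})"
        )

    def __str__(self) -> str:
        # Shorter, human-oriented form without the event_id.
        return self.as_datetime().isoformat()


if __name__ == "__main__":
    t = EventTimeSketch(epoch=0, event_id=3)
    print(repr(t))  # EventTime(epoch=0, event_id=3, dt=1970-01-01T00:00:00+00:00)
    print(str(t))   # 1970-01-01T00:00:00+00:00
```

Keeping the event_id in `repr` but not in `str` would let debugging output stay unambiguous while everyday printing stays readable.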
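
For item 4, the shape of the experiment could look roughly like the sketch below. It is a Python analogy only: `merged`, `nested_merge`, `composite`, `make_histories` and `crossover_table` are hypothetical helpers, with a chain of two-way merges standing in for nested MergedHistory<L, R> and a single k-way merge standing in for CompositeHistory. Timings from this sketch say nothing about the Rust types; the real benchmark would need to run against the actual implementations.

```python
import heapq
import timeit
from functools import reduce


def merged(left, right):
    """Two-way merge of two sorted iterables (analogue of MergedHistory<L, R>)."""
    left, right = iter(left), iter(right)
    done = object()
    a, b = next(left, done), next(right, done)
    while a is not done and b is not done:
        if a <= b:
            yield a
            a = next(left, done)
        else:
            yield b
            b = next(right, done)
    # Drain whichever side still has items.
    while a is not done:
        yield a
        a = next(left, done)
    while b is not done:
        yield b
        b = next(right, done)


def nested_merge(histories):
    """Left-nested chain of two-way merges, one extra layer per history."""
    return reduce(merged, histories, ())


def composite(histories):
    """Single k-way merge over all histories (analogue of CompositeHistory)."""
    return heapq.merge(*histories)


def make_histories(k, n=200):
    """Build k sorted histories of n interleaved integer timestamps each."""
    return [list(range(i, i + k * n, k)) for i in range(k)]


def crossover_table(ks=(2, 4, 8, 16, 32), repeats=20):
    """Time both strategies for each k and return (k, nested_s, composite_s) rows."""
    rows = []
    for k in ks:
        hs = make_histories(k)
        # Both strategies must produce the same ordering before timing them.
        assert list(nested_merge(hs)) == list(composite(hs))
        t_nested = timeit.timeit(lambda: list(nested_merge(hs)), number=repeats)
        t_comp = timeit.timeit(lambda: list(composite(hs)), number=repeats)
        rows.append((k, t_nested, t_comp))
    return rows


if __name__ == "__main__":
    for k, t_nested, t_comp in crossover_table():
        print(f"k={k:>2}  nested={t_nested:.4f}s  composite={t_comp:.4f}s")
```

The smallest k at which the composite column drops below the nested column would be the candidate threshold for switching automatically.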
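
For item 5, the two conventions in question can be written out side by side. This is a minimal sketch in plain Python over integer intervals, not the Raphtory code; the function names are hypothetical, and picking the lower of the two middle items is an assumption (the other places may pick the upper one).

```python
def median_pick_lower(values):
    """Return the middle item; for an even count, pick the lower of the two middles."""
    ordered = sorted(values)
    if not ordered:
        return None
    return ordered[(len(ordered) - 1) // 2]


def median_mean_rounded_up(values):
    """Return the middle item; for an even count, average the two middles, rounding up."""
    ordered = sorted(values)
    if not ordered:
        return None
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    # -(-x // 2) is integer ceiling division by 2.
    return -(-(ordered[mid - 1] + ordered[mid]) // 2)


if __name__ == "__main__":
    intervals = [10, 20, 35, 50]
    print(median_pick_lower(intervals))       # 20
    print(median_mean_rounded_up(intervals))  # 28
```

The two agree whenever the count is odd, so a shared test over even-length inputs would be enough to catch any implementation that deviates from the chosen convention.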

arienandalibi · Oct 08 '25 01:10