The graph's incremental update does not seem to support updating vertex properties

Open songqing opened this issue 1 year ago • 11 comments

Describe your problem

#1563 added support for incremental updates of graph data. However, it seems that vertex properties cannot be updated. For example, the full data is:

vid  value
1    2.0
2    3.0

and the incremental data is:

vid  value
1    4.0

After the incremental update, the value of vid 1 is still 2.0, not 4.0.

So, can we support updating vertex properties?

songqing avatar Oct 24 '23 03:10 songqing

Thanks @songqing.

Fixed in https://github.com/v6d-io/v6d/pull/1600/.

dashanji avatar Oct 24 '23 07:10 dashanji

Sorry, there is a mistake: #1600 is a separate small fix, and this issue is still unresolved.

songqing avatar Oct 24 '23 07:10 songqing

Oh, my fault. Sorry for the noise.

Reopened.

dashanji avatar Oct 24 '23 07:10 dashanji

Technically we can, but we define vineyard's objects as immutable (to make concurrency control simpler). The incremental update APIs are designed for bulk data loading as well. We currently support only adding data, which keeps multi-versioned immutable objects simple.
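To illustrate the add-only, multi-versioned idea described above, here is a minimal sketch in plain Python. It is not vineyard code: `VertexTableVersion` and `add_vertices` are hypothetical names, and the row for vid 3 is made-up illustration data. Handling of duplicate vids is deliberately left out.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple

Row = Tuple[int, float]  # (vid, value)


@dataclass(frozen=True)
class VertexTableVersion:
    """Hypothetical immutable snapshot of a vertex table (not a vineyard class)."""
    version: int
    rows: Tuple[Row, ...]  # never modified in place

    def add_vertices(self, new_rows: Iterable[Row]) -> "VertexTableVersion":
        """Return a NEW version with rows appended; this version stays untouched."""
        return VertexTableVersion(self.version + 1, self.rows + tuple(new_rows))


v1 = VertexTableVersion(1, ((1, 2.0), (2, 3.0)))
v2 = v1.add_vertices([(3, 5.0)])  # vid 3 is illustration data only
```

Readers that still hold `v1` keep seeing a consistent snapshot while `v2` is being built, which is the concurrency benefit of immutability mentioned above.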

For scenarios like continuous incremental graph updating, I would suggest GART, a graph store that supports streaming updates and is more suitable for cases like yours, such as updating properties (via updating records in tables). GART is built upon vineyard as well.

sighingnow avatar Oct 24 '23 08:10 sighingnow

OK, I see, thanks for your reply. Here is our scenario: the graph data is updated daily, and for now we can only reload the full data every day. If incremental update supported modifying existing data, we could load the full data once and then load only the incremental data on the following days, which would make data importing more efficient and less resource-intensive. Also, this may need only a small change to the current incremental update implementation, whereas with GART the query performance would be a little worse in this scenario.

songqing avatar Oct 24 '23 12:10 songqing

It can be implemented as follows:

  • for vertices: maintain a copy of the vtable in the involved fragments (graphs in vineyard are edge-cut), and update the table.
  • for edges: append the new properties to the end of the current vtable/etable (just like what we already have for adding data), maintain a copy of the CSR, and update the "edge_id" field for the corresponding edges.

As a first step, we could support only the vertex part or only the edge part.
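The two steps above can be sketched roughly as follows, using plain Python lists in place of Arrow tables and the CSR. This only shows the idea and is not the vineyard implementation: the function names, the list-based layout, and the edge values are all assumptions made for the example.

```python
from typing import Dict, List, Tuple


def update_vertex_rows(vtable: List[Tuple[int, float]],
                       updates: Dict[int, float]) -> List[Tuple[int, float]]:
    """Build a copy of the vertex table with updated values; the input is not modified."""
    return [(vid, updates.get(vid, value)) for vid, value in vtable]


def update_edge_property(etable: List[float],
                         csr_edge_ids: List[int],
                         edge_pos: int,
                         new_value: float) -> Tuple[List[float], List[int]]:
    """Append the new property row and repoint one CSR slot at it."""
    new_etable = etable + [new_value]             # append-only: old rows remain
    new_edge_ids = list(csr_edge_ids)             # copy of the CSR "edge_id" column
    new_edge_ids[edge_pos] = len(new_etable) - 1  # redirect the edge to the new row
    return new_etable, new_edge_ids


# Vertex data from the issue description.
vtable = [(1, 2.0), (2, 3.0)]
new_vtable = update_vertex_rows(vtable, {1: 4.0})

# Edge data below is made up for illustration.
etable = [0.5, 0.7]
edge_ids = [0, 1]
new_etable, new_edge_ids = update_edge_property(etable, edge_ids, 1, 0.9)
```

In both cases the previous table and CSR stay valid, so older graph versions that reference them are unaffected.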

sighingnow avatar Oct 25 '23 02:10 sighingnow

I may not have enough bandwidth on Vineyard in the next two months. Would you folks @songqing (or @SighingSnow) like to implement such features?

sighingnow avatar Oct 25 '23 02:10 sighingnow

OK, thanks. It's not an urgent issue; I'll give it a try later.

songqing avatar Oct 25 '23 02:10 songqing

Hi @songqing, could you please check this code block: https://github.com/v6d-io/v6d/blob/main/modules/graph/loader/basic_ev_fragment_loader_impl.h#L344~L406. That block deliberately keeps the original data: we check the incrementally added vertices, and if one duplicates an existing vertex, we use the original table data. Previously, users were expected not to add duplicates, so when a duplicate appears we fall back to the original data.

So if this feature is needed, you can revise the code above to update the table data instead.
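As a rough illustration of the two duplicate-handling policies discussed here, the sketch below merges original and incremental rows with a switch between "keep the original" and "take the incremental value". It is plain Python, not the loader code linked above; `merge_vertex_rows` and `prefer_incremental` are hypothetical names.

```python
from typing import Dict, Iterable, Tuple

Row = Tuple[int, float]  # (vid, value)


def merge_vertex_rows(origin: Iterable[Row],
                      incremental: Iterable[Row],
                      prefer_incremental: bool = False) -> Dict[int, float]:
    """Merge rows by vid; on a duplicate, keep the original unless told otherwise."""
    merged = dict(origin)
    for vid, value in incremental:
        if vid in merged and not prefer_incremental:
            continue  # behavior described in this thread: the original row wins
        merged[vid] = value  # new vertex, or requested behavior: incremental row wins
    return merged


# Data from the issue description.
full = [(1, 2.0), (2, 3.0)]
inc = [(1, 4.0)]

kept_origin = merge_vertex_rows(full, inc)
updated = merge_vertex_rows(full, inc, prefer_incremental=True)
```

With the data from the issue, `kept_origin` leaves vid 1 at 2.0, while `updated` sets it to 4.0.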

@siyuan0322 could you please evaluate this issue?

SighingSnow avatar Oct 25 '23 02:10 SighingSnow

Yeah, seems it's a good fit here.

siyuan0322 avatar Oct 25 '23 02:10 siyuan0322

Yes, based on the current implementation, only a small change is needed to solve this issue. Besides the code you mentioned, https://github.com/v6d-io/v6d/blob/main/modules/graph/vertex_map/arrow_vertex_map_impl.h#L487~L500 may also need to change.

songqing avatar Oct 25 '23 02:10 songqing