v6d
Incremental graph update does not seem to support updating vertex properties
Describe your problem
#1563 added support for incremental updates of graph data. However, vertex properties do not seem to be updated. For example, the full data is:

vid  value
1    2.0
2    3.0

and the incremental data is:

vid  value
1    4.0

After the incremental update, the value of vid 1 is still 2.0, not 4.0.
So, can we support updating vertex properties?
Thanks @songqing.
Fixed in https://github.com/v6d-io/v6d/pull/1600/.
Sorry, there is a mistake: #1600 is a different, small fix, and this issue is still unresolved.
Oh, my fault. Sorry for the noise.
Reopened.
So, can we support updating vertex properties?
Technically we can, but we define vineyard's objects as immutable (to make concurrency control simpler). The incremental update APIs are designed for bulk data loading as well. We currently only support adding data, which keeps multi-versioned immutable objects simple.
For scenarios like continuous incremental graph updating, I would suggest GART, a graph store that supports streaming updates and is more suitable for cases like yours, such as updating properties (via updating records in tables). GART is built upon vineyard as well.
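To illustrate why add-only updates fit immutable, multi-versioned objects, here is a minimal sketch in plain Python. It is not vineyard code: `TableVersion`, `append`, and `scan` are hypothetical names made up for this example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class TableVersion:
    """Hypothetical immutable chunk of (vid, value) rows; not a vineyard class."""
    rows: Tuple[Tuple[int, float], ...]
    parent: Optional["TableVersion"] = None

    def append(self, new_rows):
        # Adding never touches existing data: it creates a new version
        # that points back at the previous one.
        return TableVersion(rows=tuple(new_rows), parent=self)

    def scan(self):
        # Yield rows from the oldest version first.
        if self.parent is not None:
            yield from self.parent.scan()
        yield from self.rows


v1 = TableVersion(rows=((1, 2.0), (2, 3.0)))
v2 = v1.append([(3, 5.0)])  # readers holding v1 still see the old data
```

Because `v1` is never modified, readers of the old version need no locking. An in-place property update would break that guarantee unless the affected data is copied first.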
OK, I see, thanks for your reply. Our scenario is that the graph data is updated daily. For now, we can only load the full data every day. If incremental update supported modifying existing data, we could load the full data once and then load only the incremental data on the following days, which would make data import more efficient and use fewer resources. Also, this may need only a small change to the current incremental update implementation, whereas with GART the query performance would be somewhat worse in this scenario.
It can be implemented as follows:
- for vertices: maintain a copy of the vtable in the involved fragment (graph in vineyard is edge-cut), and update the table.
- for edges: append new properties to the end of the current vtable/etable (just like what we already have for adding data), maintain a copy of the CSR, and update the "edge_id" field for the corresponding edge.
As the first step, we could support only the vertices part or only the edges part.
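The two steps above can be sketched roughly as follows. This is plain Python over dict-of-lists tables, not the actual fragment code; the function names, the `vid_to_row` mapping, and the flat `csr_edge_ids` list are assumptions made for illustration.

```python
def update_vertex_properties(vtable, vid_to_row, updates):
    """Return an updated copy of the vertex table; the input is left untouched."""
    new_table = {col: list(values) for col, values in vtable.items()}
    for vid, props in updates.items():
        row = vid_to_row[vid]
        for col, value in props.items():
            new_table[col][row] = value
    return new_table


def update_edge_properties(etable, csr_edge_ids, updates):
    """Append new property rows and repoint edge ids in a copy of the CSR.

    `updates` maps a CSR position to the new properties of that edge.
    """
    new_etable = {col: list(values) for col, values in etable.items()}
    new_edge_ids = list(csr_edge_ids)
    num_rows = len(next(iter(new_etable.values()), []))
    for csr_pos, props in updates.items():
        old_row = new_edge_ids[csr_pos]
        for col, values in new_etable.items():
            # Columns missing from the update keep their old value.
            values.append(props.get(col, values[old_row]))
        new_edge_ids[csr_pos] = num_rows  # point the edge at the appended row
        num_rows += 1
    return new_etable, new_edge_ids


vtable = {"value": [2.0, 3.0]}
new_vtable = update_vertex_properties(vtable, {1: 0, 2: 1}, {1: {"value": 4.0}})

etable = {"weight": [0.5, 0.7]}
new_etable, new_ids = update_edge_properties(etable, [0, 1], {1: {"weight": 0.9}})
```

In this sketch the old edge row stays in the table but is no longer referenced, which matches the append-only idea at the cost of some dead rows.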
I may not have enough bandwidth on Vineyard in the next two months. Would you folks @songqing (or @SighingSnow) like to implement such features?
OK, thanks, it's not an urgent issue, I'll give it a try later.
Hi, could you please check this code block: https://github.com/v6d-io/v6d/blob/main/modules/graph/loader/basic_ev_fragment_loader_impl.h#L344~L406. That block deliberately uses the origin data: we check the incrementally added vertices, and if there is a duplicate, we keep the origin table data. Previously, users were expected not to add duplicates, so a duplicate falls back to the origin data.
So if this feature is needed, you can revise the code above to update the table data.
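As a rough illustration of the difference between the current duplicate handling and the requested one, here is a small Python sketch using the example data from the issue description. `merge_vertices` and its `prefer_new` flag are hypothetical and do not exist in the loader.

```python
def merge_vertices(origin, incremental, prefer_new=False):
    """Merge (vid, value) rows; duplicates keep the origin row unless prefer_new."""
    merged = dict(origin)
    for vid, value in incremental:
        if vid in merged and not prefer_new:
            continue  # duplicate vid: keep the origin data, as described above
        merged[vid] = value
    return sorted(merged.items())


full = [(1, 2.0), (2, 3.0)]
inc = [(1, 4.0)]

keep = merge_vertices(full, inc)                        # vid 1 stays 2.0
overwrite = merge_vertices(full, inc, prefer_new=True)  # vid 1 becomes 4.0
```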
@siyuan0322 could you please evaluate this issue?
Yeah, seems it's a good fit here.
Yes, based on the current implementation, only a small change is needed to solve this issue. Besides the code you mentioned, https://github.com/v6d-io/v6d/blob/main/modules/graph/vertex_map/arrow_vertex_map_impl.h#L487~L500 may also need to change.