graph-node
[Bug] Grafting can skip copying data sources
Bug report
The following sequence of steps will lead to a grafted (or copied) subgraph with missing dynamic data sources:
1. Start a graft
2. Interrupt it and rewind the subgraph
3. Resume the graft
If the graft had already copied data sources by the time it is interrupted in step 2, and the rewind doesn't remove all of them, then upon resuming we will skip copying data sources because of this code.
The result will be a graft that seems to be ok, but is missing dynamic data sources.
The fix might be simple: copying private data sources currently runs entirely outside of a transaction. We should just wrap all of it in a txn.
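A minimal sketch of the all-or-nothing idea, using Python's `sqlite3` rather than graph-node's actual Postgres code. The table `dst_data_sources`, the functions `make_db` and `copy_data_sources`, the `fail_after` parameter, and the "destination is non-empty, so skip" check are all assumptions made for illustration; they are not taken from the code base.

```python
import sqlite3


def make_db():
    """Create an in-memory database with an empty (hypothetical) destination table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dst_data_sources (block INTEGER, name TEXT)")
    return conn


def copy_data_sources(conn, rows, fail_after=None):
    """Copy `rows` into dst_data_sources as a single transaction.

    `fail_after` simulates an interruption after that many rows were inserted.
    """
    # Assumed skip check: a non-empty destination is read as "already copied".
    (count,) = conn.execute("SELECT count(*) FROM dst_data_sources").fetchone()
    if count > 0:
        return count

    # `with conn` commits if the body succeeds and rolls back if it raises,
    # so an interrupted copy leaves the destination empty, never half-filled.
    with conn:
        for i, row in enumerate(rows):
            if fail_after is not None and i == fail_after:
                raise RuntimeError("simulated interruption")
            conn.execute(
                "INSERT INTO dst_data_sources (block, name) VALUES (?, ?)", row
            )
    return len(rows)
```

With the copy inside one transaction, an interrupted run rolls back every row it inserted. The skip check then sees an empty table on resume and the copy starts over from scratch.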
@lutter those transactions could run for days, couldn't they?
Copying private data sources should be very quick; at the very worst, it's a table with 100k rows, but usually far fewer. The actual data copying, which can take days, is already broken into txns that should take about 3 minutes each.
Ah, understood, thanks!
We've seen subgraphs with up to 1.9 million data sources. Not sure if that matters at all for how this should be handled.
True; I just looked through the code again, and it does one insert statement per row, which will be very slow for these numbers of data sources. We'll also need to improve the code to do the copying with one query, or break it up into multiple txns, but that requires more bookkeeping.
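For illustration, the "one query" option could look roughly like the sketch below, again in Python's `sqlite3` rather than the real Postgres code. The tables `src_data_sources` and `dst_data_sources` and the functions `make_src_db` and `copy_in_one_query` are hypothetical names, and the two-column schema is an assumption.

```python
import sqlite3


def make_src_db(n):
    """Create an in-memory database with `n` rows in a hypothetical source table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE src_data_sources (block INTEGER, name TEXT)")
    conn.execute("CREATE TABLE dst_data_sources (block INTEGER, name TEXT)")
    with conn:
        conn.executemany(
            "INSERT INTO src_data_sources (block, name) VALUES (?, ?)",
            ((i, f"ds{i}") for i in range(n)),
        )
    return conn


def copy_in_one_query(conn):
    """Copy every source row with one set-based statement in one transaction."""
    with conn:
        # A single INSERT ... SELECT replaces the per-row insert loop.
        conn.execute(
            "INSERT INTO dst_data_sources (block, name) "
            "SELECT block, name FROM src_data_sources"
        )
    (count,) = conn.execute("SELECT count(*) FROM dst_data_sources").fetchone()
    return count
```

A single statement is atomic by itself, so this variant would address correctness and the per-row overhead together. The multi-txn alternative would additionally need to record how far the copy got, which is the extra bookkeeping mentioned above.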
But we need to first address correctness, which to me is more important than performance issues, though they can be devastating, too.