graph-node icon indicating copy to clipboard operation
graph-node copied to clipboard

[Bug] Grafting can skip copying data sources

Open lutter opened this issue 1 year ago • 6 comments

Bug report

The following sequence of steps will lead to a grafted (or copied) subgraph with missing dynamic data sources:

  1. Start a graft
  2. Interrupt it and rewind the subgraph
  3. Resume the graft

If at step (2), the graft had already copied data sources, and the rewind doesn't remove all of them, upon resuming, we will skip copying data sources because of this code

The result will be a graft that seems to be ok, but is missing dynamic data sources.

lutter avatar Jun 04 '24 17:06 lutter

The fix might be simple: the whole of copying private data sources runs outside of a transaction. We should just wrap all of it in a txn.

lutter avatar Jun 04 '24 17:06 lutter

@lutter those transactions could run for days, couldn't they?

avandras avatar Jun 05 '24 06:06 avandras

Copying private data sources should be very quick; at the very worst, it's a table with 100k rows, but usually much less. The actual data copying, which can take days, is already broken into txns that should take about 3 minutes

lutter avatar Jun 05 '24 18:06 lutter

Ah, understood, thanks!

avandras avatar Jun 05 '24 18:06 avandras

We've seen subgraphs up to 1.9 million data sources. Not sure if that matters at all for how this should be handled.

paymog avatar Jun 05 '24 19:06 paymog

True; I just looked through the code again, and it does one insert statement per row, which will be very slow for these numbers of data sources. We'll also need to improve the code to do the copying with one query, or break it up into multiple txns but that requires more bookkeeping.

But we need to first address correctness, which to me is more important than performance issues, though they can be devastating, too.

lutter avatar Jun 05 '24 21:06 lutter