
[Bug] Speed up counting entities for copy/graft

Open lutter opened this issue 1 year ago • 0 comments

We need to do something about the entity_count for grafts. Right now, when all data has been copied, graph-node will fire off a big query that counts the entities in the graft; that query can take hours in very large subgraphs.

There are a few different ways to handle that:

  • give up on accurate entity counts and set the count for copies/grafts to some fast estimate (either the count from the source, or the estimate that analyze comes up with)
  • count entities while we copy them. We'd have to turn queries of the form `insert into dst select * from src` into `with ranges as (insert into .. returning block_range) select count(*) from ranges where block_range @> int32::MAX` and then store the counts for each batch in copy_table_state. After data copying has finished, the entity count is a simple aggregation over copy_table_state
  • keep counting entities as a separate step, but break it into batches along vid just like the actual copying does. That would require quite a bit more bookkeeping, as counting can then be interrupted by node restarts
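
To make the second option more concrete, here is a minimal in-memory sketch in Rust. It is not graph-node code: `Row`, `CopyTableState`, `batch_counts`, and `copy_batch` are all hypothetical names, the tables are plain vectors, and the check `upper.is_none()` stands in for `block_range @> int32::MAX` under the assumption that block ranges are half-open and a current entity version has no upper bound.

```rust
// Hypothetical in-memory model; not graph-node's actual types or schema.

/// One entity version. `upper == None` models a block range that is
/// unbounded above (assumption: that is what marks a current version).
#[derive(Clone, Debug, PartialEq)]
struct Row {
    vid: i64,
    lower: i32,
    upper: Option<i32>,
}

impl Row {
    /// Stand-in for `block_range @> int32::MAX`: with a half-open range
    /// [lower, upper), only a range unbounded above can contain i32::MAX.
    fn is_current(&self) -> bool {
        self.upper.is_none()
    }
}

/// Stand-in for the per-table copy state. `batch_counts` is an assumed
/// addition that holds one count per copied batch.
#[derive(Default)]
struct CopyTableState {
    next_vid: i64,
    batch_counts: Vec<u64>,
}

impl CopyTableState {
    /// The final entity count is a plain aggregation over the batches.
    fn entity_count(&self) -> u64 {
        self.batch_counts.iter().sum()
    }
}

/// Copies rows with `vid` in [next_vid, next_vid + batch_size) from `src`
/// to `dst` and records how many of them are current versions.
/// Returns true while rows with a higher `vid` remain to be copied.
fn copy_batch(
    src: &[Row],
    dst: &mut Vec<Row>,
    state: &mut CopyTableState,
    batch_size: i64,
) -> bool {
    let start = state.next_vid;
    let end = start + batch_size;
    let mut current = 0u64;
    for row in src.iter().filter(|r| r.vid >= start && r.vid < end) {
        dst.push(row.clone());
        if row.is_current() {
            current += 1;
        }
    }
    // The count is stored together with the copy progress for this batch.
    state.batch_counts.push(current);
    state.next_vid = end;
    src.iter().any(|r| r.vid >= end)
}
```

Calling `copy_batch` in a loop until it returns `false` and then reading `entity_count()` gives the count without a separate pass over the destination. In this sketch the counts are recorded alongside the same progress marker that the copy itself uses, which is the property that would let this option avoid the extra bookkeeping of the third one.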

lutter • Jun 06 '24 19:06