Improve Loaders: Add feature to continue a previous load.

Open MichelDiz opened this issue 5 years ago • 2 comments

What you wanted to do

Resume a dataset load from where it stopped when a Live Load or Bulk Load is interrupted, for whatever reason.

Why that wasn't great, with examples

When an interruption occurs and I try to run the load again, the load starts from scratch. This is not the desired result. Let's avoid spending time rewriting something that is already in the DB.

MichelDiz avatar Apr 10 '19 20:04 MichelDiz

IMPORTANT

This issue is not just about duplicate nodes caused by retrying a load. You can avoid duplicate nodes by using the --xidmap flag.

e.g:

./dgraph live -f test.rdf,other.rdf.gz -s test.schema --xidmap ./xd

Every time you reuse the XIDMAP mapping files, all previously mapped blank nodes are automatically resolved to their mapped UIDs.

However, the load will always start from scratch, even though the blank nodes have already been mapped. This issue is only about creating a "checkpoint" feature to avoid spending days rewriting something that is already in the DB.
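
To make the "checkpoint" idea concrete, here is a minimal sketch in Python. It is not how the Dgraph loaders work and none of these names exist in Dgraph: resumable_load, send_batch, and the JSON checkpoint file are all hypothetical, and send_batch stands in for whatever actually writes a batch to the database. The sketch assumes the input is read in the same order on every run.

```python
import json
import os
import tempfile
from typing import Callable, Iterable, List


def load_checkpoint(path: str) -> int:
    """Return how many input lines were already committed (0 if no checkpoint)."""
    if not os.path.exists(path):
        return 0
    with open(path, "r", encoding="utf-8") as fh:
        return int(json.load(fh)["lines_done"])


def save_checkpoint(path: str, lines_done: int) -> None:
    """Persist progress atomically: write a temp file, then rename over the old one."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump({"lines_done": lines_done}, fh)
    os.replace(tmp, path)


def resumable_load(
    lines: Iterable[str],
    send_batch: Callable[[List[str]], None],  # hypothetical: writes one batch to the DB
    checkpoint_path: str,
    batch_size: int = 1000,
) -> int:
    """Send lines in batches, skipping those a previous run already committed.

    Returns the number of lines sent during this run.
    """
    already_done = load_checkpoint(checkpoint_path)
    batch: List[str] = []
    sent = 0
    line_no = 0
    for line_no, line in enumerate(lines, start=1):
        if line_no <= already_done:
            continue  # committed by an earlier run, do not rewrite it
        batch.append(line)
        if len(batch) == batch_size:
            send_batch(batch)
            sent += len(batch)
            # Record progress only after the batch was accepted.
            save_checkpoint(checkpoint_path, line_no)
            batch = []
    if batch:  # flush the final partial batch
        send_batch(batch)
        sent += len(batch)
        save_checkpoint(checkpoint_path, line_no)
    return sent
```

In this sketch the checkpoint is saved only after a batch succeeds, so an interruption between the write and the checkpoint update means at most one batch is sent again on the next run. Combined with reusing the XIDMAP files as described above, resending that batch should not create duplicate nodes.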

MichelDiz avatar Apr 11 '19 16:04 MichelDiz

GitHub issues have been deprecated. This issue has been moved to Discuss. You can follow the conversation there and also subscribe to updates by changing your notification preferences.

minhaj-shakeel avatar Jul 16 '20 13:07 minhaj-shakeel