
Error Recovery Part (ii): pick up where you left off

Open · teoh opened this issue 2 years ago · 0 comments

Now that we've added retries with exponential backoff in https://github.com/jerryjliu/gpt_index/pull/215, it would be cool to add support for "picking up where you left off". From the example in https://github.com/jerryjliu/gpt_index/issues/210:

```
>>> index = GPTTreeIndex(documents, prompt_helper=prompt_helper)
> Building index from nodes: 502 chunks
0/5029
10/5029
20/5029
30/5029
40/5029
50/5029
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
// stack trace and error
```

If we re-run index = GPTTreeIndex(documents, prompt_helper=prompt_helper), we have to start from the beginning. With 502 chunks above, that's a lot of computation we'd be redoing, not to mention token budget gone to waste!
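(For context, the retries from #215 amount to exponential backoff around individual LLM calls, roughly like the hedged sketch below; the helper name, delay constants, and `call` parameter are illustrative assumptions, not the PR's actual code. Backoff covers transient API errors within a single run, but once the process exits, all progress is still lost, which is what this issue is about.)

```python
import time


def with_retries(call, max_retries=5, base_delay=1.0):
    """Retry `call` with exponential backoff on failure (illustrative only)."""
    for attempt in range(max_retries):
        try:
            return call()
        except Exception:
            if attempt == max_retries - 1:
                raise  # give up; the whole index build fails here
            time.sleep(base_delay * 2 ** attempt)  # wait 1s, 2s, 4s, ...
```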

It would be cool if this happened instead:

```
>>> index = GPTTreeIndex(documents, prompt_helper=prompt_helper)
> Building index from nodes: 502 chunks
> continuing from chunk 50:
50/5029
60/5029
...
// hopefully no errors this time
```

I can think of two ways this might be done:

  1. Some global state in gpt_index stores the results of the computation from the failed run. This might not be great, since tracking state like this is confusing.
  2. If we run into errors during, say, the build step of the index, we still return a partial result, which can then be fed into the next call of GPTTreeIndex(documents, prompt_helper=prompt_helper). This might already be possible today with index composability? (See the sketch after this list.)
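To make option 2 concrete, here is a minimal sketch of chunk-level checkpointing, assuming a hypothetical per-chunk LLM call (summarize_chunk) and a hypothetical checkpoint file (index_build_checkpoint.json); neither is part of gpt_index today. Partial results are written to disk after every chunk, so a later run can skip whatever already succeeded:

```python
import json
import os

CHECKPOINT_PATH = "index_build_checkpoint.json"  # hypothetical checkpoint location


def build_with_resume(chunks, summarize_chunk, checkpoint_path=CHECKPOINT_PATH):
    """Summarize each chunk, persisting results so a failed run can resume.

    `summarize_chunk` stands in for whatever per-chunk LLM call the index
    build makes; it is an assumption, not the real gpt_index internals.
    """
    # Load any results saved by a previous (failed) run.
    done = {}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            done = json.load(f)

    results = []
    for i, chunk in enumerate(chunks):
        key = str(i)
        if key in done:  # already processed in an earlier run: skip the LLM call
            results.append(done[key])
            continue
        summary = summarize_chunk(chunk)  # the expensive, failure-prone call
        done[key] = summary
        # Persist after every chunk so a crash at chunk 51 keeps chunks 0-50.
        with open(checkpoint_path, "w") as f:
            json.dump(done, f)
        results.append(summary)

    os.remove(checkpoint_path)  # clean up once the whole build succeeds
    return results
```

With something like this inside the build step, the re-run in the example above could start at chunk 50 instead of chunk 0, and the saved partial results could equally be handed to the next GPTTreeIndex call via composability, as option 2 suggests.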

If we added support for this, I believe it would give developers more confidence to index larger sets of documents.

teoh · Jan 14 '23 02:01