Error Recovery Part (ii): pick up where you left off
Now that we've added retries with exponential backoff in https://github.com/jerryjliu/gpt_index/pull/215, it would be cool to add support for "picking up where you left off". From the example in https://github.com/jerryjliu/gpt_index/issues/210:
```
>>> index = GPTTreeIndex(documents, prompt_helper=prompt_helper)
> Building index from nodes: 502 chunks
0/5029
10/5029
20/5029
30/5029
40/5029
50/5029
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
// stack trace and error
```
If we run `index = GPTTreeIndex(documents, prompt_helper=prompt_helper)` again, we'd have to start from the beginning. With 502 chunks above, that's a lot of computation we'd be redoing, not to mention token budget gone to waste!
It would be cool if this happened instead:
```
>>> index = GPTTreeIndex(documents, prompt_helper=prompt_helper)
> Building index from nodes: 502 chunks
> continuing from chunk 50:
50/5029
60/5029
...
// hopefully no errors this time
```
I can think of two ways this might be done:
- Some global in `gpt_index` tracks state that stores the results of the computation from the failed run. This might not be great, since tracking state like this is confusing.
- If we run into errors during, say, the build step of the index, we return some result anyway, which can then be fed into the next call of `GPTTreeIndex(documents, prompt_helper=prompt_helper)`. This might be possible today with index composability?
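To make the second option concrete, here's a minimal sketch of the "return partial results on failure, feed them back in" pattern. None of these names exist in `gpt_index` today; `build_index_nodes`, `PartialBuild`, and the `resume_from` parameter are invented purely for illustration:

```python
class PartialBuild(Exception):
    """Hypothetical error that carries the chunks completed before the failure."""

    def __init__(self, completed):
        super().__init__(f"failed after {len(completed)} chunks")
        self.completed = completed


def build_index_nodes(chunks, process, resume_from=None):
    """Process each chunk, skipping any already completed in `resume_from`.

    On error, raise PartialBuild so the caller can resume from the
    partial result instead of starting over.
    """
    completed = dict(resume_from or {})
    for i, chunk in enumerate(chunks):
        if i in completed:
            continue  # already done in a previous (failed) run
        try:
            completed[i] = process(chunk)
        except Exception:
            # Surface the partial progress instead of discarding it.
            raise PartialBuild(completed)
    return completed
```

A caller would catch `PartialBuild`, keep `e.completed` around, and pass it back as `resume_from` on the retry, so only the remaining chunks are reprocessed (and re-billed). The same idea would work with a checkpoint file instead of an in-memory dict.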
If we added support for this, I believe it would give developers more confidence to index larger sets of documents.