
Run index operations in the background


Today, if I build a gpt index like this:

>>> index = GPTTreeIndex(documents, prompt_helper=prompt_helper)
> Building index from nodes: 502 chunks
0/5029
10/5029
...

This may take a while, and I'm blocked from doing anything else until it finishes. (The same can be said for querying.)

If I'm building an app on top of GPT Index and have an endpoint to start the build, like below:

from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route('/build', methods=['POST'])
def build():
    # get documents and prompt_helper
    index = GPTTreeIndex(documents, prompt_helper=prompt_helper)
    # do something with index
    return {
        "message": "build complete"
    }

then I have to wait for the build to complete before getting a response.

I'm looking for ideas on how to support running the build in the background, and checking status, something like below:

# same Flask boilerplate as above
@app.route('/build', methods=['POST'])
def build():
    # get documents and prompt_helper
    index = GPTTreeIndex(documents, prompt_helper=prompt_helper)
    # more stuff + produce a task_id to check on later
    return {
        "message": "started building the index",
        "task_id": task_id
    }

@app.route('/status/<task_id>', methods=['GET'])
def status(task_id):
    ...
    # look up the build task by task_id
    # return a message saying the build is complete, or is x% done

This should be possible with Python's threading library, or with a task queue like Celery. However, it probably gets complicated depending on your application, e.g. if you have more than one process.
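For the single-process case, a minimal sketch of that pattern using only the standard library might look like the following (the in-memory tasks dict and the build_index helper are illustrative, not part of gpt_index; a multi-process deployment would need Celery or a shared task store instead):

import uuid
from concurrent.futures import ThreadPoolExecutor

from flask import Flask

app = Flask(__name__)
executor = ThreadPoolExecutor(max_workers=1)
tasks = {}  # task_id -> "running" | "complete" | "failed"

def build_index(task_id):
    try:
        # get documents and prompt_helper, then:
        # index = GPTTreeIndex(documents, prompt_helper=prompt_helper)
        tasks[task_id] = "complete"
    except Exception:
        tasks[task_id] = "failed"

@app.route('/build', methods=['POST'])
def build():
    task_id = str(uuid.uuid4())
    tasks[task_id] = "running"
    # hand the slow build off to the worker thread and return immediately
    executor.submit(build_index, task_id)
    return {"message": "started building the index", "task_id": task_id}

@app.route('/status/<task_id>', methods=['GET'])
def status(task_id):
    return {"task_id": task_id, "status": tasks.get(task_id, "unknown")}

Reporting "x% done" would additionally require the build to expose progress callbacks, which is the part that might need library support.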

I'm currently thinking of ways to support this within gpt_index, whether it's adding extra functionality (without bloating the library), or adding some code samples somewhere so that no one's starting from scratch. If you have ideas, please feel free to add them here!

teoh avatar Jan 14 '23 21:01 teoh

I think you'll need to spin up separate worker servers and queue tasks, as you said. I don't think this is specific to the gpt_index project, and I wouldn't expect built-in support.

However, if gpt_index were rewritten in something lower level like C or Rust, that could potentially make runtime indexing more plausible. I looked into Cython to maybe do this myself, but it might only make a marginal difference, since many of the relevant native Python methods are already implemented in C inside CPython and aren't really the underlying bottleneck.

In my use case, the bottleneck was the embedding step using Ada embeddings. The cost comes from round trips to OpenAI's servers to generate the embeddings, so threading could be a solution; however, you'll just run into other issues like throttling and 429 errors, which is why I don't think built-in gpt_index improvements should be expected.
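For what it's worth, the usual mitigation is to wrap each embedding request in exponential backoff so concurrent workers survive the 429s. A generic sketch (RateLimitError here is a stand-in for whatever rate-limit exception your client raises):

import random
import time

class RateLimitError(Exception):
    """Stand-in; substitute your client's 429 exception."""

def with_backoff(fn, max_retries=5, base_delay=1.0):
    for attempt in range(max_retries):
        try:
            return fn()
        except RateLimitError:
            # back off 1s, 2s, 4s, ... plus jitter, then retry
            time.sleep(base_delay * 2 ** attempt + random.random())
    raise RuntimeError(f"still rate limited after {max_retries} retries")

Each worker thread would then call something like with_backoff(lambda: embed_batch(chunk)), where embed_batch stands in for the actual embedding request.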

aliyeysides avatar Jan 23 '23 21:01 aliyeysides

@teoh we are actively improving the runtime of index building and querying, so hopefully this will be less of an issue in the near future!

As @aliyeysides noted, the feature you're describing might be best handled in the application layer (i.e. outside of GPT Index). We are moving more and more of the underlying logic to execute asynchronously, but the top-level API will remain blocking for ease of use.
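In the meantime, one application-layer bridge for async servers is to push the blocking top-level call onto a thread pool, e.g. (a sketch assuming the current gpt_index import path):

import asyncio

from gpt_index import GPTTreeIndex

async def build_index_async(documents, prompt_helper):
    loop = asyncio.get_running_loop()
    # run_in_executor(None, ...) uses asyncio's default ThreadPoolExecutor,
    # so the event loop stays free while the blocking build runs
    return await loop.run_in_executor(
        None, lambda: GPTTreeIndex(documents, prompt_helper=prompt_helper)
    )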

Disiok avatar Mar 01 '23 16:03 Disiok