
How to handle large index files?

Open RahulPrasad opened this issue 2 years ago • 2 comments

Hello there, I currently have a couple of issues/questions I'd like to get advice on:

  • How do we handle large index.json files? I have an index file that is more than 100 MB.
  • If I want to train the model but on a different topic, should I update the existing index or create a new one?
  • Is there a way for gpt_index to handle multiple index files?
  • Large JSON files work fine locally, but if you want to deploy the app to services like Google Cloud or AWS, what stack are you all using? Thank you

RahulPrasad avatar Feb 11 '23 15:02 RahulPrasad

@RahulPrasad, I'm answering: "Is there a way for gpt_index to handle multiple index files?"

Following the documentation, here are two options:

  • Option 1: use a list index to query across the sub-indices. In this approach, the combined index cannot be saved as JSON.
  • Option 2: persist the combined index by composing a graph, which can be queried, saved, and loaded.
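A rough sketch of what those two options looked like in the gpt_index API of that era (early 2023). This is an illustration, not a tested recipe: the exact class and method names (`GPTSimpleVectorIndex`, `GPTListIndex`, `ComposableGraph`, `save_to_disk`/`load_from_disk`) have changed across versions, and the file names are hypothetical, so check the docs for your installed version.

```python
from llama_index import GPTSimpleVectorIndex, GPTListIndex
from llama_index.composability import ComposableGraph

# Assume index1.json and index2.json were previously built and saved
# with save_to_disk(); these file names are placeholders.
index1 = GPTSimpleVectorIndex.load_from_disk("index1.json")
index2 = GPTSimpleVectorIndex.load_from_disk("index2.json")

# Option 1: wrap the sub-indices in a list index and query across them.
# The combined list index itself cannot be persisted as JSON.
list_index = GPTListIndex([index1, index2])
response = list_index.query("my question")

# Option 2: compose a graph over the indices so the combined
# structure can be saved, reloaded, and queried later.
graph = ComposableGraph.build_from_index(list_index)
graph.save_to_disk("graph.json")
graph = ComposableGraph.load_from_disk("graph.json")
response = graph.query("my question")
```

Note that either path sends queries through an LLM backend (e.g. OpenAI), so running this requires the appropriate API key configured in your environment.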

Regarding "If I want to train the model but with different topic, should I update existing index or create a new index?": do you mean you wish to query over another set of external data? This toolkit does not help you train LLMs; it lets you build indexes that bridge an LLM to your data. If the use cases are totally unrelated, create a new index. If the indices are tied to the same use case, you could create a new index and use one of the options above to combine/compose them.

I'd suggest you break these questions into separate issues, as they are unrelated.

Community, feel free to chime in. I'm only 3 days in. Great repo.

ccfarah avatar Feb 11 '23 16:02 ccfarah

Thank you so much, very helpful!

RahulPrasad avatar Feb 18 '23 13:02 RahulPrasad

Join the Discord if you have more questions! https://discord.gg/dGcwcsnxhU

jerryjliu avatar Feb 20 '23 19:02 jerryjliu