
How to handle large index files?

Open RahulPrasad opened this issue 2 years ago • 2 comments

Hello there, I currently have a couple of issues/questions I'd like to get advice on:

  • How do we handle large index.json files? I have an index file that is more than 100 MB.
  • If I want to train the model but on a different topic, should I update the existing index or create a new one?
  • Is there a way for gpt_index to handle multiple index files?
  • Large JSON files work fine locally, but if you want to deploy the app to services like Google Cloud or AWS, what stack are you all using? Thank you

RahulPrasad avatar Feb 11 '23 15:02 RahulPrasad

@RahulPrasad, I'm answering: "Is there a way for gpt_index to handle multiple index files?"

Following the documentation, here are two options:

  • Option 1: use a list index to query across the sub-indices. In this approach, the combined index cannot be saved as JSON.
  • Option 2: persist the combined index by composing a graph, which can be queried, saved, and loaded.
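A rough sketch of what those two options looked like in the gpt_index API of that era (early 2023). This is an illustration, not a tested recipe: the exact class and method names (`GPTSimpleVectorIndex`, `GPTListIndex`, `ComposableGraph`, `save_to_disk`/`load_from_disk`) have changed across versions, and the file names are hypothetical, so check the docs for your installed version.

```python
from llama_index import GPTSimpleVectorIndex, GPTListIndex
from llama_index.composability import ComposableGraph

# Assume index1.json and index2.json were previously built and saved
# with save_to_disk(); these file names are placeholders.
index1 = GPTSimpleVectorIndex.load_from_disk("index1.json")
index2 = GPTSimpleVectorIndex.load_from_disk("index2.json")

# Option 1: wrap the sub-indices in a list index and query across them.
# The combined list index itself cannot be persisted as JSON.
list_index = GPTListIndex([index1, index2])
response = list_index.query("my question")

# Option 2: compose a graph over the indices so the combined
# structure can be saved, reloaded, and queried later.
graph = ComposableGraph.build_from_index(list_index)
graph.save_to_disk("graph.json")
graph = ComposableGraph.load_from_disk("graph.json")
response = graph.query("my question")
```

Note that either path sends queries through an LLM backend (e.g. OpenAI), so running this requires the appropriate API key configured in your environment.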

Regarding "If I want to train the model but with different topic, should I update existing index or create a new index?": do you mean you wish to query over another set of external data? This toolkit does not help you train LLMs; it lets you build indexes that bridge an LLM to your data. If the use cases are totally unrelated, create a new index. If the indices are tied to the same use case, you could create a new index and use one of the options above to combine/compose them.

I'd suggest you break these questions into separate issues, as they are unrelated.

Community, feel free to chime in. I'm only 3 days in. Great repo.

ccfarah avatar Feb 11 '23 16:02 ccfarah

Thank you so much, very helpful!

RahulPrasad avatar Feb 18 '23 13:02 RahulPrasad

Join the Discord if you have more questions! https://discord.gg/dGcwcsnxhU

jerryjliu avatar Feb 20 '23 19:02 jerryjliu