llama_index
How to handle large index files?
Hello there, I currently have a couple of issues/questions I'd like advice on:
- How do we handle large index.json files? I have an index file that is more than 100 MB.
- If I want to train the model on a different topic, should I update the existing index or create a new one?
- Is there a way for gpt_index to handle multiple index files?
- Large JSON files work fine locally, but if you want to deploy the app to services like Google Cloud or AWS, what stack are you all using? Thank you
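On the large-file question above: one common mitigation, independent of any llama_index API, is to compress the serialized index before storing or uploading it, since index JSON tends to be highly repetitive. A minimal stdlib sketch (the file name and payload here are hypothetical stand-ins for a real index.json):

```python
import gzip
import json

# Hypothetical payload standing in for a large index.json.
index_data = {"nodes": [{"id": i, "text": f"chunk {i}"} for i in range(1000)]}

# Write the index compressed; gzip shrinks repetitive JSON considerably.
with gzip.open("index.json.gz", "wt", encoding="utf-8") as f:
    json.dump(index_data, f)

# Decompress and parse it back before handing it to whatever loads the index.
with gzip.open("index.json.gz", "rt", encoding="utf-8") as f:
    restored = json.load(f)

assert restored == index_data
```

This only reduces storage and transfer size; the full structure still has to fit in memory once loaded, so it complements rather than replaces splitting the data into multiple smaller indices.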
@RahulPrasad, I'm answering: "Is there a way for gpt_index to handle multiple index files?"
Following the documentation, here are two options. Option 1 uses a list index to query across the indices; in this approach, the combined index cannot be saved as JSON. Option 2 persists the combined index by composing a graph that can be queried, saved, and loaded.
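Option 2 might look roughly like the sketch below. This is a hedged outline only: it assumes the `ComposableGraph` API as documented around the time of this thread (`from_indices`, `save_to_disk`, `load_from_disk`), and exact class and method names have changed between gpt_index/llama_index versions, so check the docs for your installed release. The documents, summaries, and file name are hypothetical.

```python
from llama_index import GPTListIndex, GPTSimpleVectorIndex, ComposableGraph

# Build one index per data set (documents1/documents2 are hypothetical
# lists of loaded Document objects).
index1 = GPTSimpleVectorIndex(documents1)
index2 = GPTSimpleVectorIndex(documents2)

# Compose the indices into a graph, with a short summary describing each
# sub-index so the top-level index can route queries.
graph = ComposableGraph.from_indices(
    GPTListIndex,
    [index1, index2],
    index_summaries=["Summary of data set 1", "Summary of data set 2"],
)

# Unlike the plain list-index approach, the composed graph can be persisted
# and reloaded later.
graph.save_to_disk("graph.json")
graph = ComposableGraph.load_from_disk("graph.json")
response = graph.query("A question spanning both data sets")
```

The trade-off between the two options is persistence: if you only need an ad-hoc query across indices, a list index is simpler; if you want to save, reload, and reuse the combined structure, compose a graph.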
Regarding "If I want to train the model on a different topic, should I update the existing index or create a new one?", do you mean you wish to query over another set of external data? This toolkit does not help you train LLMs; it lets you build indexes that bridge an LLM to your data. If the use cases are totally unrelated, create a new index. If the indices are tied to the same use case, you can create a new index and use one of the options above to combine/compose them.
I'd suggest you break these questions into separate issues, as they are unrelated.
Community, feel free to chime in. I'm only 3 days in. Great repo.
Thank you so much, very helpful!
join the discord if you have more questions! https://discord.gg/dGcwcsnxhU