FlexGen with GPT Index
FlexGen has just been released: https://github.com/Ying1123/FlexGen
FlexGen is a high-throughput generation engine for running large language models with limited GPU memory (e.g., a 16GB T4 or a 24GB RTX3090 gaming card!).
Would it be possible to run GPT Index with it?
By default, GPT Index doesn't run any models locally; it merely queries the OpenAI API.
You can use custom LLMs that run locally in GPT Index as well. I've got access to a 46GB card, and so far nothing open source has come even close to matching OpenAI's LLM performance (Facebook OPT, Google FLAN-T5, GPT-J 6B), so I'd say it's not worth your time (but I suspect this will likely change in the coming months! New models are always on the horizon)
> and so far nothing open source has come even close to match OpenAI's LLM performance (Facebook OPT, Google FLAN T5, GPT-J 6B)
Is this applicable to all downstream tasks or just a subset? Wouldn't they perform comparably for classification, question answering, or text summarization?
Not really comparable. OPT and GPT-J are great text-generation models; they will ramble on forever, but they won't follow instructions.
FLAN-T5 comes close, as it was trained on over 1,000 tasks, but it still struggles to follow some prompts. Additionally, its limited input size (512 tokens!) really limits its usefulness (in the context of this repo).
At least for now, I wouldn't bother using anything other than OpenAI for LLMs, unless you are prepared to take a hit in quality
@genesst did you figure out how to use FlexGen with GPTindex? It is still worth using FlexGen for a quick demo. OpenAI's API is quite expensive for large documents.
GPTIndex could be a perfect fit for FlexGen. FlexGen targets high-throughput batch processing, so generating embeddings for a batch of local documents is its ideal use case. We (the FlexGen team) will be happy to help if anyone wants to undertake this.
Looking forward to a working example for the Q&A task :) Currently, one can easily choose a different OpenAI model by following https://gpt-index.readthedocs.io/en/latest/guides/usage_pattern.html#customizing-llm-s
Hi @Ying1123, sorry for the delayed response. We'd be happy to explore an integration!
Currently you can generally swap out the underlying LLM through LangChain, but we've also added a very light ChatGPT wrapper here: https://github.com/jerryjliu/gpt_index/blob/main/gpt_index/langchain_helpers/chatgpt.py
Hi, @ekiwi111! I'm helping the LlamaIndex team manage their backlog and I wanted to let you know that we are marking this issue as stale.
From what I understand, you were asking if it is possible to run GPT Index with FlexGen, a high-throughput generation engine designed for running large language models with limited GPU memory. There was some discussion in the comments where other users mentioned that currently OpenAI's models are the best for language model performance. However, the FlexGen team expressed interest in helping with integration and suggested that GPTIndex could be a good fit for FlexGen's batch processing capabilities.
Before we close this issue, we wanted to check with you if it is still relevant to the latest version of the LlamaIndex repository. If it is, please let us know by commenting on the issue. Otherwise, feel free to close the issue yourself or it will be automatically closed in 7 days.
Thank you for your contribution and we appreciate your understanding!