
FlexGen with GPT Index

Open ekiwi111 opened this issue 1 year ago • 8 comments

FlexGen has just been released - https://github.com/Ying1123/FlexGen

FlexGen is a high-throughput generation engine for running large language models with limited GPU memory (e.g., a 16GB T4 or a 24GB RTX3090 gaming card!).

Would it be possible to run GPT Index with it?

ekiwi111 avatar Feb 21 '23 04:02 ekiwi111

By default, gpt index doesn't run any models itself; it merely queries the OpenAI API.

You can use custom LLMs that run locally in gpt index as well. I've got access to a 46GB card, and so far nothing open source has come even close to matching OpenAI's LLM performance (Facebook OPT, Google FLAN-T5, GPT-J 6B), so I'd say it's not worth your time (though this will likely change in the coming months, I suspect! New models are always on the horizon).
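For reference, a minimal sketch of what "custom LLMs run locally" looked like at the time: wrap a local Hugging Face model in a LangChain `LLM` subclass and hand it to gpt_index via `LLMPredictor`. The package versions, the `GPTListIndex`/`LLMPredictor` constructor arguments, and the choice of the `google/flan-t5-large` checkpoint are assumptions based on the early-2023 docs and may differ in your install:

```python
# Sketch only: assumes gpt_index ~0.4.x and an early-2023 langchain release;
# class names and constructor arguments may differ in other versions.
from typing import Any, List, Mapping, Optional

from langchain.llms.base import LLM
from transformers import pipeline

from gpt_index import GPTListIndex, LLMPredictor, SimpleDirectoryReader

# Load a locally runnable checkpoint once (hypothetical model choice).
generator = pipeline("text2text-generation", model="google/flan-t5-large")


class LocalLLM(LLM):
    """Wraps the local pipeline so gpt_index can call it like any other LLM."""

    @property
    def _llm_type(self) -> str:
        return "custom_local"

    @property
    def _identifying_params(self) -> Mapping[str, Any]:
        return {"model": "google/flan-t5-large"}

    def _call(self, prompt: str, stop: Optional[List[str]] = None) -> str:
        # FLAN-T5 only sees ~512 input tokens, which is the limitation
        # discussed in this thread.
        return generator(prompt, max_new_tokens=256)[0]["generated_text"]


documents = SimpleDirectoryReader("data").load_data()
index = GPTListIndex(documents, llm_predictor=LLMPredictor(llm=LocalLLM()))
print(index.query("Summarize these documents."))
```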

logan-markewich avatar Feb 21 '23 04:02 logan-markewich

and so far nothing open source has come even close to match OpenAI's LLM performance (Facebook OPT, Google FLAN T5, GPT-J 6B)

Is this applicable to all downstream tasks or just a subset? Wouldn't they perform comparably for classification, question answering, or text summarization?

ekiwi111 avatar Feb 21 '23 06:02 ekiwi111

Not really comparable. OPT and GPT-J are great text-generation models; they will ramble on forever, but they won't follow instructions.

FLAN-T5 comes closer, as it was trained on over 1,000 tasks, but it still struggles to follow some prompts. Additionally, its small input size (512 tokens!) really limits its usefulness (in the context of this repo).

At least for now, I wouldn't bother using anything other than OpenAI for LLMs, unless you are prepared to take a hit in quality.

logan-markewich avatar Feb 21 '23 14:02 logan-markewich

@genesst did you figure out how to use FlexGen with GPTindex? It is still worth using FlexGen for a quick demo. OpenAI's API is quite expensive for large documents.

jiweiqi avatar Feb 25 '23 06:02 jiweiqi

@genesst did you figure out how to use FlexGen with GPTindex? It is still worth using FlexGen for a quick demo. OpenAI's API is quite expensive for large documents.

Not yet, waiting for llama or flan-t5 integration

ekiwi111 avatar Feb 25 '23 23:02 ekiwi111

GPTIndex could be a perfect fit for FlexGen. FlexGen targets high-throughput batch processing, so generating embeddings for a batch of local documents is its ideal use case. We (the FlexGen team) will be happy to help if anyone wants to undertake this.

Ying1123 avatar Feb 26 '23 03:02 Ying1123

Looking forward to a working example for the Q&A task :) Currently, one can easily choose a different OpenAI model by following https://gpt-index.readthedocs.io/en/latest/guides/usage_pattern.html#customizing-llm-s
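For completeness, here is a sketch of that usage pattern, swapping in a specific OpenAI model via `LLMPredictor`. The imports and index class assume the gpt_index and langchain releases from early 2023 and may differ in newer versions:

```python
# Sketch of the "Customizing LLM's" usage pattern linked above; assumes
# gpt_index ~0.4.x and an early-2023 langchain release, so imports and
# constructor arguments may differ in newer versions.
from langchain import OpenAI

from gpt_index import GPTSimpleVectorIndex, LLMPredictor, SimpleDirectoryReader

# Pick the OpenAI completion model explicitly instead of the default.
llm_predictor = LLMPredictor(
    llm=OpenAI(temperature=0, model_name="text-davinci-003")
)

documents = SimpleDirectoryReader("data").load_data()
index = GPTSimpleVectorIndex(documents, llm_predictor=llm_predictor)
print(index.query("What do these documents say about FlexGen?"))
```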

jiweiqi avatar Feb 26 '23 03:02 jiweiqi

Hi @Ying1123, sorry for the delay in response. We'd be happy to explore an integration!

Currently, you can generally swap out the underlying LLM through LangChain, but we also added a very light ChatGPT wrapper here: https://github.com/jerryjliu/gpt_index/blob/main/gpt_index/langchain_helpers/chatgpt.py

jerryjliu avatar Mar 06 '23 19:03 jerryjliu

Hi, @ekiwi111! I'm helping the LlamaIndex team manage their backlog and I wanted to let you know that we are marking this issue as stale.

From what I understand, you were asking whether it is possible to run GPT Index with FlexGen, a high-throughput generation engine designed for running large language models with limited GPU memory. In the comments, other users noted that OpenAI's models currently offer the best performance among available LLMs. However, the FlexGen team expressed interest in helping with an integration and suggested that GPT Index could be a good fit for FlexGen's batch-processing capabilities.

Before we close this issue, we wanted to check with you if it is still relevant to the latest version of the LlamaIndex repository. If it is, please let us know by commenting on the issue. Otherwise, feel free to close the issue yourself or it will be automatically closed in 7 days.

Thank you for your contribution and we appreciate your understanding!

dosubot[bot] avatar Aug 22 '23 16:08 dosubot[bot]