
Use local model

Open · evolu8 opened this issue 1 year ago • 38 comments

🚀 The feature

It would be wonderful to be able to use a locally hosted StarCoder or MPT model, as opposed to requiring an HF or OpenAI key.

Motivation, pitch

Data is often so sensitive that even a privacy flag is not enough, e.g. in healthcare. Using locally hosted / on-prem models would allow fast adoption for automated hypothesis generation and data exploration.

Alternatives

No response

Additional context

No response

evolu8 avatar May 28 '23 18:05 evolu8

I'd be willing to contribute to this effort. I can look further into this if the team would like to pursue this feature.

AColocho avatar May 28 '23 22:05 AColocho

@evolu8 Can you share resources on how to run the StarCoder API locally? If it's within Docker, it would be quick to test.

amjadraza avatar May 29 '23 02:05 amjadraza

Would be great to have this working. Are you thinking about some approach in particular?

I was thinking about creating a wrapper around https://huggingface.co/docs/hub/index so that we can use any model available on the Hugging Face Hub.

@AColocho I can assign this one to you if you want!

gventuri avatar May 29 '23 16:05 gventuri

@evolu8 Can you share resources on how to run the StarCoder API locally? If it's within Docker, it would be quick to test.

I'd probably use the Hugging Face library (https://github.com/bigcode-project/starcoder#code-generation), probably defaulting to 8-bit.
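For illustration, a minimal sketch of that approach with the transformers library could look like this (untested; it assumes the transformers, accelerate and bitsandbytes packages are installed and that there is enough GPU memory):

# Rough sketch: load StarCoder locally in 8-bit via transformers + bitsandbytes.
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigcode/starcoder"  # gated model; accept the license on the Hub first

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(
    checkpoint,
    device_map="auto",   # spread layers over the available GPU(s)/CPU
    load_in_8bit=True,   # 8-bit weights to roughly halve memory vs fp16
)

prompt = "def fibonacci(n):"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0]))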

evolu8 avatar May 29 '23 20:05 evolu8

Would be great to have this working. Are you thinking about some approach in particular?

I was thinking about creating a wrapper around https://huggingface.co/docs/hub/index so that we can use any model available on the Hugging Face Hub.

@AColocho I can assign this one to you if you want!

It sounds like @evolu8 has a plan. I'll find another issue to contribute to. Ping me if y'all would like an extra set of eyes!

AColocho avatar Jun 06 '23 02:06 AColocho

@AColocho @amjadraza @gventuri I have a plan but no time to contribute just now (maybe later).

The plan would be to use this:

https://github.com/LucienShui/huggingface-vscode-endpoint-server

Thoughts?

evolu8 avatar Jun 06 '23 16:06 evolu8

@AColocho @amjadraza @gventuri I have a plan but no time to contribute just now (maybe later).

The plan would be to use this:

https://github.com/LucienShui/huggingface-vscode-endpoint-server

Thoughts?

This is great! Let me know if you've already started

noamsiegel avatar Jun 09 '23 20:06 noamsiegel

I've not. I'm currently traveling, so I won't this week. Please do if you can. Maybe I can join the effort soon. 🙂

evolu8 avatar Jun 11 '23 06:06 evolu8

Sure, I can get it started!

Just to confirm, are we hoping to use the LucienShui/huggingface-vscode-endpoint-server repo as a plug-and-play solution, or are we hoping to build something like it?

AColocho avatar Jun 14 '23 01:06 AColocho

@AColocho I don't know that library, but it seems to be related to VS Code, and we should be agnostic.

I think the standard solution would be to use Hugging Face transformers: https://github.com/huggingface/transformers.

I'll assign the task to you 😄

gventuri avatar Jun 15 '23 08:06 gventuri

Right, the repo I linked uses the Hugging Face transformers library. However, it also wraps it in a super simple FastAPI app. I think having a REST API in front of the model serving is great, because a lot of folks may have to split across two nodes to give everything the resources it needs, or share precious GPU servers across teams. This repo offers that very simply, for inspiration. Not being super prescriptive here, just a suggestion. Hope it helps :)

If nothing else, it shows how to use the Hugging Face API to run StarCoder.
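For illustration only, a wrapper in that spirit might look roughly like this; the endpoint path and payload shape are made up for the sketch, not taken from the linked repo:

# Rough sketch of a tiny REST wrapper around a local HF model (illustrative only).
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

app = FastAPI()
# In practice you would probably add quantization here to fit the model in memory.
generator = pipeline("text-generation", model="bigcode/starcoder", device_map="auto")

class GenerateRequest(BaseModel):
    prompt: str
    max_new_tokens: int = 128

@app.post("/generate")
def generate(req: GenerateRequest):
    result = generator(req.prompt, max_new_tokens=req.max_new_tokens)
    return {"text": result[0]["generated_text"]}

Run it with something like uvicorn server:app --port 8000 and any client on the network can POST prompts to it.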

evolu8 avatar Jun 15 '23 17:06 evolu8

I think we'd better define the request. Originally, the request was to be able to run StarCoder and MPT locally. Creating a wrapper around the Hugging Face Transformers library will achieve this. This work could even lay the groundwork to support other models beyond StarCoder and MPT (as long as they are on Hugging Face).

It feels like there is a secondary request to be able to query a model that is running on a local server shared amongst a team. Although I really like this idea, I don't think it fits with the original ask. In my opinion, this could be powerful because it would allow teams to use PandasAI locally with their own models. We could create a framework to host the model and add support in PandasAI for designating a custom endpoint to query.

If we can agree on the first approach/request, then I should have all the necessary information to get started. I can create issues discussing the second approach/request. Let me know what y'all think! @evolu8 @gventuri

AColocho avatar Jun 16 '23 03:06 AColocho

Oh sure. Simple steps sound good. Just suggestions from my end 🙂

evolu8 avatar Jun 16 '23 06:06 evolu8

@AColocho, love this step-by-step approach, go for it 😄!

gventuri avatar Jun 16 '23 12:06 gventuri

I would recommend supporting only local HF models and deferring support for local-server models to something like https://github.com/go-skynet/LocalAI, which conforms models to the OpenAI API.
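For example, since LocalAI speaks the OpenAI protocol, the existing OpenAI client could simply be pointed at it. A rough, untested sketch (the port and model name are whatever your LocalAI instance is configured with):

# Rough sketch: reuse the openai client against a LocalAI server (assumed on localhost:8080).
import openai

openai.api_key = "not-needed-for-local"       # LocalAI ignores the key by default
openai.api_base = "http://localhost:8080/v1"  # point the client at the local server

response = openai.Completion.create(
    model="starcoder",                        # name of the model configured in LocalAI
    prompt="def fibonacci(n):",
    max_tokens=64,
)
print(response["choices"][0]["text"])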

recursionbane avatar Jun 16 '23 15:06 recursionbane

@recursionbane I think the current plan is to support only local HF models. The second part I discussed was just brainstorming, but you're absolutely right on this.

Just an update for the rest: I have been experimenting with StarCoder, but it seems like the model may be too big to run locally. I watched my memory get maxed out at 15GB and my swap grow to 55GB! Which I think makes sense, and needless to say, my computer could not run it. Is there any lightweight version of StarCoder anyone is aware of, or a comparable model that is small enough to run locally?

AColocho avatar Jun 17 '23 03:06 AColocho

I've not used it yet, but this could be a way: https://github.com/Vahe1994/SpQR

evolu8 avatar Jun 17 '23 05:06 evolu8

Checked out the repo, and I found this: "This code was developed and tested using a single A100 GPU with 80GB GPU RAM. It may successfully run on GPUs with 32-40GB."

AColocho avatar Jun 18 '23 04:06 AColocho

How are things going? Looking forward to this exciting feature.

ihorizons2022 avatar Jun 19 '23 02:06 ihorizons2022

Checked out the repo, and I found this: "This code was developed and tested using a single A100 GPU with 80GB GPU RAM. It may successfully run on GPUs with 32-40GB."

Another option. Possibly lighter. https://huggingface.co/WizardLM/WizardCoder-15B-V1.0

That SpQR repo is talking about Falcon and LLaMA models more than twice the size of StarCoder and WizardCoder, so a compressed StarCoder or WizardCoder could be achievable on a 16GB card, possibly?

evolu8 avatar Jun 19 '23 19:06 evolu8

Using INT8, I think https://huggingface.co/michaelfeil/ct2fast-starcoder is pretty good.
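For anyone who wants to try it, a rough, untested sketch of INT8 generation with a CTranslate2-converted StarCoder (it assumes the converted model has already been downloaded to a local directory named ct2fast-starcoder):

# Rough sketch: INT8 generation with a CTranslate2-converted StarCoder model.
import ctranslate2
from transformers import AutoTokenizer

generator = ctranslate2.Generator("ct2fast-starcoder", device="cpu", compute_type="int8")
tokenizer = AutoTokenizer.from_pretrained("bigcode/starcoder")

# CTranslate2 generators take token strings rather than raw text.
prompt_tokens = tokenizer.convert_ids_to_tokens(tokenizer.encode("def fibonacci(n):"))
results = generator.generate_batch([prompt_tokens], max_length=64, sampling_topk=1)
print(tokenizer.decode(results[0].sequences_ids[0]))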

ihorizons2022 avatar Jun 20 '23 02:06 ihorizons2022

And another: https://huggingface.co/mrm8488/falcoder-7b

evolu8 avatar Jun 21 '23 14:06 evolu8

Will be testing these resources today. I haven't had time this week, but I hope to spend some time in the coming days researching the models suggested here. I think the biggest challenge right now is finding something that fits in memory on a standard computer. Since we are moving away from StarCoder specifically, perhaps we can make it generic in the sense that as a user you can pass in any model you'd like. I'm a little worried about parsing the results, but first things first: get a model working locally.

AColocho avatar Jun 22 '23 23:06 AColocho

@gventuri and @evolu8 have you guys tried

https://github.com/premAI-io/prem-app/

amjadraza avatar Jun 27 '23 02:06 amjadraza

@amjadraza seems super cool, haven't tried it though. It's just a matter of figuring out whether we want PandasAI itself to also handle the installation of the model, or whether we prefer to rely on third-party apps and only provide a wrapper to use them via a localhost URL!

gventuri avatar Jun 27 '23 23:06 gventuri

Sorry y'all, I was away for a bit. I haven't found a model that can support this feature yet. @amjadraza I like the package you linked to, but even that seems like it needs 16GB+ of RAM. @gventuri, I don't think the original plan of creating a wrapper around Hugging Face will do. I think the alternate plan of running models as a local API will, but I don't believe I have the necessary hardware to run/test this. If anyone else would like to take over, feel free to volunteer.

AColocho avatar Jul 01 '23 17:07 AColocho

I suspect this 4-bit quantized version will run inference with less memory. Just under 16GB should do it.

https://huggingface.co/TheBloke/WizardCoder-15B-1.0-GPTQ
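An untested sketch of loading that checkpoint with the auto-gptq library follows; the exact parameters are assumptions from typical GPTQ usage, so check the model card:

# Rough sketch: load a 4-bit GPTQ WizardCoder checkpoint with auto-gptq (CUDA GPU assumed).
from auto_gptq import AutoGPTQForCausalLM
from transformers import AutoTokenizer

model_id = "TheBloke/WizardCoder-15B-1.0-GPTQ"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoGPTQForCausalLM.from_quantized(model_id, use_safetensors=True, device="cuda:0")

inputs = tokenizer("def fibonacci(n):", return_tensors="pt").to("cuda:0")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0]))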

evolu8 avatar Jul 02 '23 08:07 evolu8

@evolu8 the real question is: how long would inference take? Do you think we can expect fast responses?

gventuri avatar Jul 20 '23 23:07 gventuri

WizardCoder-15B-1.0.ggmlv3.q4_1.bin is pretty quick on a lot of consumer hardware. With vLLM it may even be faster than OpenAI's. And it's strong enough for this purpose. The question for us is about the privacy of data: in many cases it doesn't really matter how quick the model is; if it's running on someone else's machine, we can't use it.

There is also the whole open vs. closed question. Centralisation has many other challenges, e.g. I can be sure a model I host myself is not going to change unexpectedly. When using these tools as components in a stack, reliability is key: I don't want to have to update other components if this one shifts its behavior. This is particularly the case for data scientists (a major population within the pandas user base).
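For reference, running the GGML file mentioned above on CPU with the ctransformers library would look roughly like this (untested sketch; the model_type value is my assumption for a StarCoder-based model):

# Rough sketch: CPU inference on the quantized GGML WizardCoder via ctransformers.
from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained(
    "TheBloke/WizardCoder-15B-1.0-GGML",
    model_file="WizardCoder-15B-1.0.ggmlv3.q4_1.bin",
    model_type="starcoder",   # assumption: WizardCoder is StarCoder-based
)
print(llm("def fibonacci(n):", max_new_tokens=64))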

evolu8 avatar Jul 21 '23 07:07 evolu8

Hi, checking in to see if there is a way to use a local model? I see pandasai supports LangChain, and LangChain supports TextGen as well as Ollama, but I'm not able to get it to work. My code is below:

from langchain.llms import TextGen
llm3 = TextGen(model_url="http://127.0.0.1:7860/")

and the error I get is ERROR: response: <Response [404]> "Unfortunately, I was not able to answer your question, because of the following error:\n\n'NoneType' object is not callable\n"

rosmur avatar Sep 11 '23 08:09 rosmur