pandas-ai
Use local model
🚀 The feature
It would be wonderful to use a locally hosted StarCoder or MPT model, as opposed to requiring an HF or OpenAI key.
Motivation, pitch
Data is often so sensitive that even a privacy flag is not enough, e.g. in healthcare. Using locally hosted / on-prem models would allow fast adoption for automated hypothesis generation and data exploration.
Alternatives
No response
Additional context
No response
I'd be willing to contribute to this effort. I can look further into this if the team would like to pursue this feature.
@evolu8 Can you share resources on how to run the StarCoder API locally? If it's within Docker, it would be quick to test.
Would be great to have this working. Are you thinking about some approach in particular?
I was thinking about creating a wrapper around https://huggingface.co/docs/hub/index so that we can use any model available on the Hugging Face Hub.
@AColocho i can assign this one to you if you want!
I'd probably use the Hugging Face library: https://github.com/bigcode-project/starcoder#code-generation, probably defaulting to 8-bit.
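For reference, loading StarCoder in 8-bit through the transformers library looks roughly like the sketch below. It is only illustrative: it assumes bitsandbytes and accelerate are installed and that a CUDA GPU is available, and the prompt and generation settings are arbitrary.

```python
# Rough sketch: load bigcode/starcoder with 8-bit weights via transformers + bitsandbytes.
# Requires: pip install transformers accelerate bitsandbytes (and a CUDA GPU).
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigcode/starcoder"  # gated on the Hub; a one-time download/login is still needed

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(
    checkpoint,
    device_map="auto",   # spread layers across available devices
    load_in_8bit=True,   # 8-bit quantization via bitsandbytes
)

inputs = tokenizer("def fibonacci(n):", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```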
It sounds like @evolu8 has a plan. I'll find another issue to contribute to. Ping me if y'all would like an extra set of eyes!
@AColocho @amjadraza @gventuri I have a plan but no time to contrib just now (maybe later)
The plan would be to use this:
https://github.com/LucienShui/huggingface-vscode-endpoint-server
Thoughts?
This is great! Let me know if you've already started
I've not. Currently traveling, so I won't this week. Please do if you can. Maybe I can join the effort soon.
Sure, I can get it started!
Just to confirm, are we hoping to use the LucienShui/huggingface-vscode-endpoint-server repo as a plug-and-play solution, or are we hoping to build something like it?
@AColocho I don't know that library, but it seems to be related to VS Code, and we should be agnostic.
I think the standard solution would be to use HuggingFace transformers: https://github.com/huggingface/transformers.
I'll assign the task to you!
Right, the repo I linked uses the Hugging Face transformers library. However, it also wraps it in a super simple FastAPI app. I think having a REST API in front of the model serving is great, because a lot of folks may have to split things across two nodes to give everything the resources needed, or share precious GPU servers across teams. This repo offers that very simply, for inspiration. Not being super prescriptive here, just a suggestion. Hope it helps :)
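To make the REST-serving idea concrete (this is not the contents of the linked repo, just a minimal sketch with assumed endpoint and payload shapes), a FastAPI wrapper around a transformers pipeline could look like this:

```python
# Minimal sketch: serve a local code model behind a simple REST endpoint.
# Requires: pip install fastapi uvicorn transformers accelerate bitsandbytes
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

app = FastAPI()
generator = pipeline(
    "text-generation",
    model="bigcode/starcoder",            # or any other causal LM
    device_map="auto",
    model_kwargs={"load_in_8bit": True},  # keep memory use down, as discussed above
)


class GenerateRequest(BaseModel):
    prompt: str
    max_new_tokens: int = 256


@app.post("/generate")
def generate(req: GenerateRequest):
    out = generator(req.prompt, max_new_tokens=req.max_new_tokens)
    return {"generated_text": out[0]["generated_text"]}

# Run with: uvicorn server:app --host 0.0.0.0 --port 8000
```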
If nothing else, it shows how to use the Hugging Face API to run StarCoder.
I think we'd better define the request. Originally, the request was to be able to run StarCoder and MPT locally. Creating a wrapper around the Hugging Face Transformers library will achieve this. This work could even lay the groundwork to support other models beyond StarCoder and MPT (as long as they are on Hugging Face).
It feels like there is a secondary request: being able to query a model that is running on a local server shared amongst a team. Although I really like this idea, I don't think it fits with the original ask. In my opinion, this could be powerful because it would allow teams to run PandasAI locally against their own models. We could create a framework to host the model and add support in PandasAI for designating a custom endpoint to query.
If we can agree on the first approach/request, then I should have all the necessary information to get started. I can create issues discussing the second approach/request. Let me know what y'all think! @evolu8 @gventuri
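For the second approach, the client side could stay very thin. The sketch below is hypothetical: the endpoint path, payload shape, and class name are assumptions that match the server sketch above, and it is not wired into PandasAI's actual LLM interface.

```python
# Hypothetical client for a team-hosted model server; endpoint and payload shape are assumptions.
import requests


class RemoteLocalLLM:
    """Thin client that sends prompts to a self-hosted model endpoint."""

    def __init__(self, endpoint: str = "http://localhost:8000/generate"):
        self.endpoint = endpoint

    def call(self, prompt: str, max_new_tokens: int = 256) -> str:
        resp = requests.post(
            self.endpoint,
            json={"prompt": prompt, "max_new_tokens": max_new_tokens},
            timeout=120,
        )
        resp.raise_for_status()
        return resp.json()["generated_text"]
```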
Oh sure. Simple steps sound good. Just suggestions from my end.
@AColocho, love this step-by-step approach, go for it!
I would recommend supporting only local HF models and deferring support for local-server models to something like https://github.com/go-skynet/LocalAI, which conforms models to the OpenAI API.
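Because LocalAI exposes an OpenAI-compatible API, the client side would mostly amount to pointing the openai library at the local server. A sketch using the pre-1.0 openai client follows; host, port, and model name are assumptions that depend on how LocalAI is configured.

```python
# Sketch: query a LocalAI server through its OpenAI-compatible API (openai<1.0 style client).
import openai

openai.api_key = "not-needed-locally"          # LocalAI does not require a real key by default
openai.api_base = "http://localhost:8080/v1"   # wherever the LocalAI server is listening

response = openai.Completion.create(
    model="wizardcoder",  # whichever model name the LocalAI config exposes
    prompt="Write a pandas one-liner that counts rows per category.",
    max_tokens=128,
)
print(response["choices"][0]["text"])
```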
@recursionbane I think the current plan is to support only local HF models. The second part I discussed was just brainstorming, but you're absolutely right on this.
Just an update for the rest: I have been experimenting with StarCoder, but it seems like the model may be too big to run locally. I watched my memory get maxed out at 15GB and my swap grow to 55GB! Which makes sense, and needless to say, my computer could not run it. Is there any lightweight version of StarCoder anyone is aware of, or a comparable model that is small enough to run locally?
I've not used it yet, but this could be a way: https://github.com/Vahe1994/SpQR
Checked out the repo, and I found this: "This code was developed and tested using a single A100 GPU with 80GB GPU RAM. It may successfully run on GPUs with 32 - 40GB."
How are things going? Looking forward to this exciting feature.
Another option. Possibly lighter. https://huggingface.co/WizardLM/WizardCoder-15B-V1.0
That SpQR repo is talking about Falcon and LLaMA models more than twice the size of StarCoder and WizardCoder. So with compression it could be achievable on a 16GB card, possibly?
Using INT8, https://huggingface.co/michaelfeil/ct2fast-starcoder is pretty good, I think.
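For context, running a CTranslate2 int8 conversion like ct2fast-starcoder locally looks roughly like this. The sketch assumes the converted model files have already been downloaded to a local folder (e.g. with huggingface_hub.snapshot_download) and that the repo ships its tokenizer; paths and generation settings are illustrative.

```python
# Sketch: int8 inference with a CTranslate2 conversion of StarCoder.
# Requires: pip install ctranslate2 transformers huggingface_hub
import ctranslate2
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("michaelfeil/ct2fast-starcoder")
generator = ctranslate2.Generator(
    "./ct2fast-starcoder",   # local folder with the downloaded CTranslate2 model files
    device="cpu",            # or "cuda"
    compute_type="int8",
)

prompt = "def drop_null_rows(df):"
tokens = tokenizer.convert_ids_to_tokens(tokenizer.encode(prompt))
results = generator.generate_batch([tokens], max_length=64, sampling_temperature=0.2)
output_ids = tokenizer.convert_tokens_to_ids(results[0].sequences[0])
print(tokenizer.decode(output_ids))
```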
And another: https://huggingface.co/mrm8488/falcoder-7b
Will be testing these resources today. I haven't had time this week, but I hope to spend some time in the coming days researching the suggested models. I think the biggest challenge right now is finding something that fits in memory on a standard computer. Since we are moving away from StarCoder specifically, perhaps we can make it generic, in the sense that as a user you can pass in any model you'd like. I'm a little worried about parsing the results, but first things first: get a model working locally.
@gventuri and @evolu8 have you guys tried
https://github.com/premAI-io/prem-app/
@amjadraza seems super cool, haven't tried it though. It's just a matter of figuring out whether we want PandasAI itself to also handle the installation of the model, or whether we prefer to rely on third-party apps and only provide a wrapper to use them via a localhost URL!
Sorry y'all, was away for a bit. I haven't found a model that can support this feature yet. @amjadraza I like the package you linked, but even that seems to need 16GB+ of RAM. @gventuri - I don't think the original plan of creating a wrapper around Hugging Face will do. I think the alternate plan of running models behind a local API will, but I don't believe I have the necessary hardware to run/test this. If anyone else would like to take over, then feel free to volunteer.
I suspect this 4-bit quantized version will run inference in less memory. Just under 16GB should do it.
https://huggingface.co/TheBloke/WizardCoder-15B-1.0-GPTQ
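A rough sketch of loading that GPTQ checkpoint with auto-gptq is below. Exact keyword arguments vary between auto-gptq versions, so treat this as illustrative; it assumes a CUDA GPU with roughly 16GB of memory.

```python
# Sketch: load TheBloke/WizardCoder-15B-1.0-GPTQ (4-bit) with auto-gptq.
# Requires: pip install auto-gptq transformers (and a CUDA GPU).
from auto_gptq import AutoGPTQForCausalLM
from transformers import AutoTokenizer

model_id = "TheBloke/WizardCoder-15B-1.0-GPTQ"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoGPTQForCausalLM.from_quantized(
    model_id,
    device="cuda:0",
    use_safetensors=True,
)

prompt = "Write a pandas snippet that drops duplicate rows."
inputs = tokenizer(prompt, return_tensors="pt").to("cuda:0")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```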
@evolu8 the real question is: how long would inference take? Do you think we can expect fast responses?
WizardCoder-15B-1.0.ggmlv3.q4_1.bin is pretty quick on a lot of consumer hardware. With vLLM it may even be faster than OpenAI's. And it's strong enough for this purpose. The question for us is about the privacy of data: in many cases it doesn't really matter how quick the model is; if it's running on someone else's machine, we can't use it.
There is also the whole open vs. closed thing. Centralisation has many other challenges, e.g. I can be sure a model I host myself is not going to change unexpectedly. When using these tools as components in a stack, reliability is key. I don't want to have to update other components if this one shifts its behavior. This is particularly the case for data scientists (a major population within the pandas userbase).
Hi, checking in: is there a way to use a local model? I see pandasai supports langchain, and langchain supports TextGen as well as Ollama, but I'm not able to get it to work. My code is below:
from langchain.llms import TextGen
llm3 = TextGen(model_url="http://127.0.0.1:7860/")
and the error I get is:
ERROR: response: <Response [404]>
"Unfortunately, I was not able to answer your question, because of the following error:\n\n'NoneType' object is not callable\n"