langflow
Self hosted LLMs Support
Hello,
Thanks for this awesome work.
Is there any support for custom self-hosted LLMs? For example, I host multiple models on AWS EC2 instances using https://github.com/huggingface/text-generation-inference. If so, could you please point me to an example?
Happy to help or contribute regarding this if not already exists.
Seconded, even just support for HuggingFacePipeline LLMs would be really useful.
Hey all! I completely agree. I had a hard time testing them and thought the problem was due to streaming, which is supported now, so maybe we can all give it another go.
Feel free to try it out too. We might add the pipeline to dev just to set it up for dev testing.
I'll open an issue where we'll track the missing modules for each type starting with LLMs.
Thanks for the consideration. Big +1 here. Thanks!
Should it be SelfHostedHuggingFaceLLM or HuggingFaceTextGenInference?
I've added both of them in this branch, but I don't think implementing SelfHostedHuggingFaceLLM will be trivial, as it seems to need some Runhouse objects that are not inside LangChain.
What do y'all think?
Could you take it for a spin to see what breaks?
You'll have to forgive my ignorance, but I'm just getting up to speed on hosting. I created a simple API to implement Dolly using the HuggingFacePipeline in langchain. From a one-minute look, I think that's more akin to SelfHostedHuggingFaceLLM, as it's hosted entirely on our own system. Perhaps someone with more experience in this domain has some light to shed on that. I'll try to carve out some time to try that branch.
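Roughly what that looks like, for context (a sketch; the model id and generation kwargs here are illustrative rather than exactly what I used):

```python
# Loading Dolly through LangChain's HuggingFacePipeline runs the model
# entirely on the local machine (no external API involved).
from langchain.llms import HuggingFacePipeline

llm = HuggingFacePipeline.from_model_id(
    model_id="databricks/dolly-v2-3b",  # placeholder model id
    task="text-generation",
    model_kwargs={"max_length": 256, "temperature": 0.7},
)

print(llm("Explain what LangChain is in one sentence."))
```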
SelfHostedHuggingFaceLLM seems to require Runhouse to be set up and to pass a hardware object of some kind.
We could (and probably should) implement that but then we'd have to define a maintainable way of doing so.
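For reference, here's roughly what the LangChain-side setup looks like (a sketch based on LangChain's docs; the cluster name, instance type, and model are placeholders):

```python
import runhouse as rh
from langchain.llms import SelfHostedHuggingFaceLLM

# The "hardware" object is a Runhouse cluster; the name and instance type
# below are placeholders for whatever box you actually want to launch or use.
gpu = rh.cluster(name="rh-a10x", instance_type="A100:1", use_spot=False)

llm = SelfHostedHuggingFaceLLM(
    model_id="gpt2",
    task="text-generation",
    hardware=gpu,
)

print(llm("Once upon a time"))
```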
I've actually contributed to HuggingFaceTextGenInference. We usually run this server on local machines or on a cloud service like AWS EC2 and connect to it via its API. If the local machine is able to run a model with SelfHostedHuggingFaceLLM, then it can usually also run the same model with HuggingFaceTextGenInference. Since the latter exposes an API to interact with, it's easy to use from multiple applications.
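For example, connecting to such a server from LangChain looks roughly like this (a sketch; the URL and generation parameters are placeholders):

```python
from langchain.llms import HuggingFaceTextGenInference

# Points at a running text-generation-inference server; the URL could be
# a local box or an EC2 instance exposing the TGI HTTP API.
llm = HuggingFaceTextGenInference(
    inference_server_url="http://localhost:8080/",
    max_new_tokens=512,
    temperature=0.7,
)

print(llm("What is deep learning?"))
```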
@ogabrielluiz -- I built the dockerfile in the branch and I'm not seeing the self-hosted models listed. Has that not caught up to the backend perhaps?
@pounde in the config.yml there's this section:
llms:
- OpenAI
# - AzureOpenAI
- ChatOpenAI
- HuggingFaceHub
- LlamaCpp
- HuggingFaceTextGenInference
- SelfHostedHuggingFaceLLM
- HuggingFacePipeline
Theoretically all of these should show up, but there could be a bug preventing one of them from showing in the frontend. Since SelfHostedHuggingFaceLLM requires runhouse, maybe we should focus on @gsaivinay's HuggingFaceTextGenInference and HuggingFacePipeline.
@ogabrielluiz -- sure enough. It's in my config.yaml. No luck on the frontend, though. I have:
- OpenAI
- ChatOpenAI
- LlamaCpp
- HuggingFaceHub
No luck on the others.
I've added the LLM HuggingFaceTextGenInference; locally the behavior was as expected. Could you check if it's okay on your side?
Hey folks, just stumbled upon this. I work on Runhouse - you're correct that the HFTextGen LLM can offer the same functionality for an optimized set of models and a relatively simple setup for access to the server, whereas the SelfHosted models via Runhouse can support any model and a more flexible set of compute (e.g. launching automatically on any cloud), but without automatically handling distribution and model-specific optimizations. The increased flexibility is particularly important in enterprise, but I'm not sure if that's your target user set? I'm happy to help if you're interested in supporting that use case. If you're mainly focused on local compute with a specific set of models, HFTextGen should be totally fine.
Hey, @dongreenberg. Thanks for reaching out. Runhouse's solution fits very well into our plans. We'd have to build a new way of setting up models to work with Runhouse and help is definitely appreciated.
Please let me know if I can assist you with anything.
I've added the LLM HuggingFaceTextGenInference; locally the behavior was as expected. Could you check if it's okay on your side?
I've tried building from the 263-self-hosted-llms-support branch and I can't seem to get any response in the chat window that pops up. I can connect it to my local text-generation-inference API, and there are responses in the browser developer console, but no text appears as a reply. Can you show how this should work, please?
EDIT: I get a response correctly if it's just the LLM node. Once I connect it to a ConversationChain, it doesn't display the LLM reply (but the reply is still received).
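For comparison, the equivalent wiring in plain LangChain (a sketch; the server URL is a placeholder) returns the reply as a string, which is what I'd expect the chat window to show:

```python
from langchain.llms import HuggingFaceTextGenInference
from langchain.chains import ConversationChain

# Same LLM + ConversationChain combination outside of Langflow.
llm = HuggingFaceTextGenInference(inference_server_url="http://localhost:8080/")
chain = ConversationChain(llm=llm)

print(chain.run("Hello, who are you?"))
```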
Just to throw another option out there, LangChain supports Oobabooga's TextGen Web API, but it's not in LangFlow yet. In my experience testing different tools, it's one of the most consistently functional and actively improving locally hosted options for running models and using Nvidia GPUs. Many tools default to the CPU and require advanced setup effort. TextGen has a one-click installer that helps configure it for your system, and they quickly adopt new features like ExLlama to increase token rates. It has many advanced options configurable through the launch flags or through the UI, letting users modify the configuration and retest to see whether their changes improve performance on their specific machine, rather than trying to shoehorn an incomplete feature list into LangChain arguments.
This example goes over how to use LangChain to interact with LLM models via the text-generation-webui API integration. Please ensure that you have text-generation-webui configured and an LLM installed; installation via the one-click installer appropriate for your OS is recommended. Once text-generation-webui is installed and confirmed working via the web interface, enable the API either through the web model configuration tab or by adding the runtime arg --api to your start command.
LangChain Page for TextGen: https://python.langchain.com/docs/modules/model_io/models/llms/integrations/textgen GitHub Page: https://github.com/oobabooga/text-generation-webui/tree/main
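Once the API is enabled, the LangChain side is roughly this (a sketch; the URL assumes the default blocking-API port on the same machine, so adjust it for your setup):

```python
from langchain.llms import TextGen

# Assumes text-generation-webui was started with --api; port 5000 is the
# default blocking API port (placeholder, change if you customized it).
llm = TextGen(model_url="http://localhost:5000")

print(llm("Write a haiku about locally hosted LLMs."))
```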
I've added the LLM HuggingFaceTextGenInference; locally the behavior was as expected. Could you check if it's okay on your side?
I don't know if you are still working on this but I would really like to try it out! Being able to use langflow with oobabooga would be amazing!!
I found the repo you made here: https://github.com/logspace-ai/langflow/tree/263-self-hosted-llms-support
But I don't know how to install it. I can install the current version of langflow with pip install langflow, but I'm not sure how to install your version.
I would give you feedback on the branch if you could tell me how to install it. Seriously langflow with oobabooga would be amazing!!!
Still no luck on my end. I do have additional options now, but no HuggingFaceTextGenInference.
Any news on this? Would love to know how to use the Textgen API with Langflow.
Also looking for an update on this, please! I've been scouring the entire internet for a solution but can't find anything.
It works in version 0.5.0a0.
Hey, would really like an update on this feature for LLM HuggingFaceTextGenInference
Also, is the custom component a good workaround?
Hey! The CustomComponent is a good workaround, and we added the Hugging Face Inference API component last week.
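For anyone who wants to try the workaround before a built-in component lands, a rough sketch of a CustomComponent wrapping HuggingFaceTextGenInference might look like this (assuming Langflow's CustomComponent API; the class and field names here are just illustrative):

```python
from langflow import CustomComponent
from langchain.llms import HuggingFaceTextGenInference
from langchain.llms.base import BaseLLM


class TGIComponent(CustomComponent):
    display_name = "HF Text Generation Inference"
    description = "LLM served by a text-generation-inference instance"

    def build_config(self):
        # Exposes the server URL as a field in the node's settings.
        return {"inference_server_url": {"display_name": "Inference Server URL"}}

    def build(self, inference_server_url: str) -> BaseLLM:
        # Returns a LangChain LLM that can be wired into chains like any other.
        return HuggingFaceTextGenInference(inference_server_url=inference_server_url)
```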
Can it be used in place of the TextGen?
HuggingFaceTextGenInference is different from the Hugging Face Inference API, right?
It works in version 0.5.0a0.
I wonder if the whole reason this works is due to different langchain versions, as I'm facing a "streaming option currently unsupported" issue in both 0.4.17 and 0.4.18.
Yes, that's related. Support for the langchain version that adds streaming starts with langflow 0.5.0a0.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.