langflow
Self hosted LLMs Support
Hello,
Thanks for this awesome work.
Is there any support for custom self-hosted LLMs? For example, I host multiple models on AWS EC2 instances using https://github.com/huggingface/text-generation-inference. If so, could you please point me to an example?
Happy to help or contribute regarding this if not already exists.
Seconded, even just support for HuggingFacePipeline LLMs would be really useful.
Hey all! I completely agree. I had a hard time testing them and thought the problem was due to streaming, which is supported now, so maybe we can all give it another go.
Feel free to try it out too. We might add the pipeline to dev just to set it up for dev testing.
I'll open an issue where we'll track the missing modules for each type starting with LLMs.
Thanks for the consideration. Big +1 here. Thanks!
Should it be SelfHostedHuggingFaceLLM or HuggingFaceTextGenInference?
I've added both of them in this branch, but I don't think implementing SelfHostedHuggingFaceLLM will be trivial, as it seems to need some Runhouse objects that are not inside LangChain.
What do y'all think?
Could you take it for a spin to see what breaks?
You'll have to forgive my ignorance, but I'm just getting up to speed on hosting. I created a simple API to implement Dolly using the HuggingFacePipeline in langchain. From a one-minute look, I think that's more akin to SelfHostedHuggingFaceLLM, as it's hosted entirely on our own system. Perhaps someone with more experience in this domain has some light to shed on that. I'll try to carve out some time to try that branch.
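Roughly what that looks like, for context (a sketch; the model id and generation kwargs here are illustrative rather than exactly what I used):

```python
# Loading Dolly through LangChain's HuggingFacePipeline runs the model
# entirely on the local machine (no external API involved).
from langchain.llms import HuggingFacePipeline

llm = HuggingFacePipeline.from_model_id(
    model_id="databricks/dolly-v2-3b",  # placeholder model id
    task="text-generation",
    model_kwargs={"max_length": 256, "temperature": 0.7},
)

print(llm("Explain what LangChain is in one sentence."))
```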
SelfHostedHuggingFaceLLM seems to require Runhouse to be set up and to pass a hardware object of some kind.
We could (and probably should) implement that but then we'd have to define a maintainable way of doing so.
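For reference, here's roughly what the LangChain-side setup looks like (a sketch based on LangChain's docs; the cluster name, instance type, and model are placeholders):

```python
import runhouse as rh
from langchain.llms import SelfHostedHuggingFaceLLM

# The "hardware" object is a Runhouse cluster; the name and instance type
# below are placeholders for whatever box you actually want to launch or use.
gpu = rh.cluster(name="rh-a10x", instance_type="A100:1", use_spot=False)

llm = SelfHostedHuggingFaceLLM(
    model_id="gpt2",
    task="text-generation",
    hardware=gpu,
)

print(llm("Once upon a time"))
```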
I've actually contributed to HuggingFaceTextGenInference. We usually run this server on local machines or on a cloud service like AWS EC2 and connect to it via its API. If the local machine is able to run a model with SelfHostedHuggingFaceLLM, then it can usually also run the same model with HuggingFaceTextGenInference. Since the latter exposes an API to interact with, it's easy to use from multiple applications.
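For example, connecting to such a server from LangChain looks roughly like this (a sketch; the URL and generation parameters are placeholders):

```python
from langchain.llms import HuggingFaceTextGenInference

# Points at a running text-generation-inference server; the URL could be
# a local box or an EC2 instance exposing the TGI HTTP API.
llm = HuggingFaceTextGenInference(
    inference_server_url="http://localhost:8080/",
    max_new_tokens=512,
    temperature=0.7,
)

print(llm("What is deep learning?"))
```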
@ogabrielluiz -- I built the dockerfile in the branch and I'm not seeing the self-hosted models listed. Has that not caught up to the backend perhaps?
@pounde in the config.yml there's this section:
llms:
- OpenAI
# - AzureOpenAI
- ChatOpenAI
- HuggingFaceHub
- LlamaCpp
- HuggingFaceTextGenInference
- SelfHostedHuggingFaceLLM
- HuggingFacePipeline
Theoretically all of these should show up, but there could be a bug preventing one of them from showing in the frontend. Since SelfHostedHuggingFaceLLM requires runhouse, maybe we should focus on @gsaivinay's HuggingFaceTextGenInference and HuggingFacePipeline.
@ogabrielluiz -- sure enough. It's in my config.yaml. No luck on the frontend, though. I have:
- OpenAI
- ChatOpenAI
- LlamaCpp
- HuggingFaceHub
No luck on the others.
I've added the LLM HuggingFaceTextGenInference; locally the behavior was as expected. Could you check if it's okay on your side?
Hey folks, just stumbled upon this. I work on Runhouse - you're correct that the HFTextGen LLM can offer the same functionality for an optimized set of models and a relatively simple setup for access to the server, whereas the SelfHosted models via Runhouse can support any model and a more flexible set of compute (e.g. launching automatically on any cloud), but without automatically handling distribution and model-specific optimizations. The increased flexibility is particularly important in enterprise, but I'm not sure if that's your target user set? I'm happy to help if you're interested in supporting that use case. If you're mainly focused on local compute with a specific set of models, HFTextGen should be totally fine.
Hey, @dongreenberg. Thanks for reaching out. Runhouse's solution fits very well into our plans. We'd have to build a new way of setting up models to work with Runhouse and help is definitely appreciated.
Please let me know if I can assist you with anything.
I've added the LLM HuggingFaceTextGenInference; locally the behavior was as expected. Could you check if it's okay on your side?
I've tried building from the 263-self-hosted-llms-support branch and I can't seem to get any response in the chat window that pops up. I can connect it to my local text-generation-inference API, and there are responses in the browser developer console, but no text appears as a reply. Can you show how this should work, please?
EDIT: I get a response correctly if it's just the LLM node. Once I connect it to a ConversationChain, it doesn't display the LLM reply (but the reply is still received).
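For comparison, the equivalent wiring in plain LangChain (a sketch; the server URL is a placeholder) returns the reply as a string, which is what I'd expect the chat window to show:

```python
from langchain.llms import HuggingFaceTextGenInference
from langchain.chains import ConversationChain

# Same LLM + ConversationChain combination outside of Langflow.
llm = HuggingFaceTextGenInference(inference_server_url="http://localhost:8080/")
chain = ConversationChain(llm=llm)

print(chain.run("Hello, who are you?"))
```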
Just to throw another option out there, LangChain supports Oobabooga's TextGen Web API, but it's not in LangFlow yet. In my experience testing different tools, it's one of the most consistently functional and actively improving locally hosted options for running models and using Nvidia GPUs. Many tools default to the CPU and require advanced setup effort. TextGen has a one-click installer that helps configure it for your system, and they quickly adopt new features like ExLlama to increase token rates. It has many advanced options configurable through the launch flags or through the UI, letting users modify the configuration and retest to see whether their changes improve performance on their specific machine, rather than trying to shoehorn an incomplete feature list into LangChain arguments.
This example goes over how to use LangChain to interact with LLM models via the text-generation-webui API integration. Please ensure that you have text-generation-webui configured and an LLM installed; installation via the one-click installer appropriate for your OS is recommended. Once text-generation-webui is installed and confirmed working via the web interface, enable the API either through the web model configuration tab or by adding the runtime arg --api to your start command.
LangChain Page for TextGen: https://python.langchain.com/docs/modules/model_io/models/llms/integrations/textgen GitHub Page: https://github.com/oobabooga/text-generation-webui/tree/main
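Once the API is enabled, the LangChain side is roughly this (a sketch; the URL assumes the default blocking-API port on the same machine, so adjust it for your setup):

```python
from langchain.llms import TextGen

# Assumes text-generation-webui was started with --api; port 5000 is the
# default blocking API port (placeholder, change if you customized it).
llm = TextGen(model_url="http://localhost:5000")

print(llm("Write a haiku about locally hosted LLMs."))
```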
I've added the LLM HuggingFaceTextGenInference; locally the behavior was as expected. Could you check if it's okay on your side?
I don't know if you are still working on this but I would really like to try it out! Being able to use langflow with oobabooga would be amazing!!
I found the repo you made here: https://github.com/logspace-ai/langflow/tree/263-self-hosted-llms-support
But I don't know how to install it. I can install the current version of langflow with pip install langflow, but I'm not sure how to install your version.
I would give you feedback on the branch if you could tell me how to install it. Seriously langflow with oobabooga would be amazing!!!
Still no luck on my end. I do have additional options now, but no HuggingFaceTextGenInference.
Any news on this? Would love to know how to use the Textgen API with Langflow.
Also looking for an update on this, please! I've been scouring the entire internet for a solution but can't find anything.
It works in version 0.5.0a0.
Hey, would really like an update on this feature for LLM HuggingFaceTextGenInference
Also, is the custom component a good workaround?
Hey! The CustomComponent is a good workaround, and we added the Hugging Face Inference API component last week.
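For anyone who wants to try the workaround before a built-in component lands, a rough sketch of a CustomComponent wrapping HuggingFaceTextGenInference might look like this (assuming Langflow's CustomComponent API; the class and field names here are just illustrative):

```python
from langflow import CustomComponent
from langchain.llms import HuggingFaceTextGenInference
from langchain.llms.base import BaseLLM


class TGIComponent(CustomComponent):
    display_name = "HF Text Generation Inference"
    description = "LLM served by a text-generation-inference instance"

    def build_config(self):
        # Exposes the server URL as a field in the node's settings.
        return {"inference_server_url": {"display_name": "Inference Server URL"}}

    def build(self, inference_server_url: str) -> BaseLLM:
        # Returns a LangChain LLM that can be wired into chains like any other.
        return HuggingFaceTextGenInference(inference_server_url=inference_server_url)
```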
Can it be used in place of the TextGen?
HuggingFaceTextGenInference is different from the Hugging Face Inference API, right?
It works in version 0.5.0a0.
I wonder if the whole reason this works is due to different langchain versions, as I'm facing a "streaming option currently unsupported" issue in both 0.4.17 and 0.4.18.
Yes, that's related. Support for the langchain version that adds streaming starts with langflow 0.5.0a0.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.