
SuperAGI GUI doesn't show LLM models or Text-generation-UI

Open neophrythe opened this issue 1 year ago • 19 comments

Hello, I can't find how to activate LLM models in the GUI; it shows only GPT. I installed it with Text Generation Web UI, local models are installed, and they are working on the API.

neophrythe avatar Jun 17 '23 20:06 neophrythe

I successfully started with these changes:

  1. In the docker compose yaml, change EXTRA_LAUNCH_ARGS to your model:
  • EXTRA_LAUNCH_ARGS="--listen --verbose --extensions openai --auto-devices --gpu-memory 12 --wbits 4 --groupsize 128 --model TheBloke_wizard-vicuna-13B-GPTQ --model_type Llama"
  2. In config.yaml, set MODEL_NAME: "TheBloke_wizard-vicuna-13B-GPTQ"

You can download the model and put it in the SuperAGI\tgwui\config\models\TheBloke_wizard-vicuna-13B-GPTQ subdirectory, or download it directly in TGWUI and restart docker compose.
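For reference, a rough sketch of where those two changes might go (the exact service name and compose layout depend on your checkout; super__tgwui here is an assumption):

# docker compose yaml (TGWUI service)
services:
  super__tgwui:
    environment:
      - EXTRA_LAUNCH_ARGS="--listen --verbose --extensions openai --auto-devices --gpu-memory 12 --wbits 4 --groupsize 128 --model TheBloke_wizard-vicuna-13B-GPTQ --model_type Llama"

# config.yaml
MODEL_NAME: "TheBloke_wizard-vicuna-13B-GPTQ"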

It doesn't work very well: sometimes the JSON isn't OK, and the second and subsequent runs don't work at all.

Please sirajperson, give us a complete example with your model selection...

amaza avatar Jun 17 '23 21:06 amaza

So the open source LLM option is not working? "It doesn't work very well: sometimes the JSON isn't OK, and the second and subsequent runs don't work at all."

shiloh92 avatar Jun 19 '23 22:06 shiloh92

Possibly I am not using the correct LLM, or I need a bigger one (I have limited GPU VRAM, 12 GB). I've tested several 7B models (Vicuna, WizardLM, Wizard Mega, Wizard Vicuna...). The 13B ones load OK, but the first run ends with an out-of-memory error.

It would be great to have a simple minimal working example: model used, agent definition, expected result for 2-3 iterations.

amaza avatar Jun 20 '23 07:06 amaza

Same issue!!

dillfrescott avatar Jun 23 '23 06:06 dillfrescott

@sirajperson can you help out here?

neelayan7 avatar Jun 23 '23 07:06 neelayan7

Both docker-compose -f local-llm up --build and docker-compose -f local-llm-gpu up --build have worked so far for me to get it running, but the models are still missing in the GUI.

neophrythe avatar Jun 23 '23 21:06 neophrythe

Hey everyone. The primary issue we're having here is the position embedding. LLaMA has a length of 2048, while ChatGPT uses 4096. To be honest, I would really like to see GGML as an endpoint for the chat API. That would give the task agent access to MPT-30B-instruct, which is very desirable: that model uses the same embedding length as GPT-4. From my initial overview, it seems that the easiest way to accomplish this is to include the python-ggml library, which doesn't have an OpenAI API drop-in like llama-cpp-python does. Otherwise one would have to stick to using GPTQ or HF models... which support less variety of target clients.

Because many NVIDIA cards in use are still at a compute level less than 7.0, GPTQ is not accessible, as those cards do not support the required half-precision operations. The solution that works best for the broadest number of available cards, and the easiest to maintain (otherwise the problem would rack up a ton of bug/feature requests), is to work with GGML-converted models and to use the GPU for offloaded layers. This is where we get into a bit of a bind, though, as the method to inference GGML-converted models in TGWUI is llama-cpp-python. This is where the constraint comes in. However, it doesn't seem like that big of a chore to rectify.
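For what it's worth, a rough sketch of the GGML-plus-offloaded-layers path I mean, assuming llama-cpp-python's bundled OpenAI-compatible server and its --n_gpu_layers option (the model path, layer count, and port are placeholders):

# install llama-cpp-python with its server extra, then serve a GGML model over the OpenAI API
pip install "llama-cpp-python[server]"
python -m llama_cpp.server --model ./models/your-ggml-model.q4_K_M.bin --n_gpu_layers 32 --host 0.0.0.0 --port 8000

The idea is that anything speaking the OpenAI API on that end lets the task agent stay unchanged; only the backend serving the GGML model differs.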

I'm totally up for better suggestions, and would love to help resolve the conflict. I really would like to see this task agent have access to local LLMs.

I've synced my fork and have been working on a good fix. Just as a heads up, if getting GGML to interface with llama-cpp-python is a solid solution, then including it by default in the project would require that other projects merge the solution. In my fork I'll have to keep a frozen copy of TGWUI with an improved llama-cpp-python module until it's reviewed, approved, and merged by [abetlen] into the project. After that, it would require that Oobabooga include the new version number in the requirements.txt file of TGWUI... which probably wouldn't take long because he's doing like a million commits a day LoL.

sirajperson avatar Jun 25 '23 05:06 sirajperson

@neophrythe After you run docker-compose you'll need to point your browser to localhost:7860 and click on the Models tab at the top of the navigation menu. Using a GPU to inference your model may require additional driver installation steps depending on your OS. As mentioned in the text above, there's a bit of an issue with the kinds of models that are presently working. CPU inference requires the use of llama-cpp-python, which limits the hardware availability to NVIDIA cards. I haven't tried OpenCL, but I'm thinking it would be the same issue with GPTQ models, since quantization is how some of the best-performing models even begin to come within reach.

sirajperson avatar Jun 25 '23 05:06 sirajperson

@sirajperson this is now inaccurate with the SuperHOT 8k LoRA or https://arxiv.org/pdf/2306.15595.pdf - so 8-16k context on 13B/30B isn't a problem anymore if you can either load a LoRA/QLoRA on demand or use any of the pre-merged models.

darkacorn avatar Jun 28 '23 09:06 darkacorn

Yep, in the last 16 hours I have been able to load a LoRA to give Llama-based models the 8k context window. How exciting. :-D
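For anyone wanting to try the same, a rough sketch of how the launch args from earlier in this thread might be extended (the LoRA name is a placeholder, and TGWUI's --lora flag is assumed):

EXTRA_LAUNCH_ARGS="--listen --verbose --extensions openai --model TheBloke_wizard-vicuna-13B-GPTQ --model_type Llama --lora your-superhot-8k-lora"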

sirajperson avatar Jun 28 '23 22:06 sirajperson

Can't wait to see a new version with the new LoRA approach included for everyone :)

neophrythe avatar Jun 30 '23 00:06 neophrythe

The model name used by the SuperAGI agent is dropped during the POST operation to TGWUI, just like the API key. As it stands, TGWUI is simply being used as a stand-in for the default. LLMs are not managed with the SuperAGI user interface at this time. @neophrythe it sounds like you're looking for a feature request, but I don't think that would happen until working with local LLMs becomes more stable and polished. We are discussing the stability of local LLM use in issue #542.

sirajperson avatar Jul 01 '23 22:07 sirajperson

Is it mandatory to use the TGWUI Docker, or can we use an existing Oobabooga installation with our current model?

Should I do something else apart from configuring the model and the line 10 pointing towards the API in the config.yaml?

juangea avatar Jul 07 '23 22:07 juangea

Both docker-compose -f local-llm up --build and docker-compose -f local-llm-gpu up --build have worked so far for me to get it running, but the models are still missing in the GUI.

When I use this option, the build quits with an error (the scripts folder cannot be found, it seems).

rlindstedt avatar Jul 09 '23 22:07 rlindstedt

Hello,

I have the same issue,

=> CACHED [super__tgwui app_base 3/7] RUN --mount=type=cache,target=/root/.cache/pip pip3 install -r /app/requirements.txt 0.0s
=> ERROR [super__tgwui app_base 4/7] COPY ./scripts/build_extensions.sh /scripts/build_extensions.sh 0.0s
=> CACHED [super__tgwui app_base 5/7] RUN --mount=type=cache,target=/root/.cache/pip chmod +x /scripts/build_extensions.sh && . /scripts/build_extensions.sh 0.0s
=> CACHED [super__tgwui app_base 6/7] RUN git clone https://github.com/oobabooga/GPTQ-for-LLaMa.git -b cuda /app/repositories/GPTQ-for-LLaMa 0.0s
=> CACHED [super__tgwui app_base 7/7] RUN cd /app/repositories/GPTQ-for-LLaMa/ && python3 setup_cuda.py install 0.0s
=> CACHED [super__tgwui base 3/10] COPY --from=app_base /app /app 0.0s
=> CACHED [super__tgwui base 4/10] COPY --from=app_base /src /src 0.0s
=> CACHED [super__tgwui base 5/10] COPY --from=app_base /venv /venv 0.0s
=> CACHED [super__tgwui base 6/10] RUN python3 -m venv /venv 0.0s
=> CACHED [super__tgwui base 7/10] WORKDIR /app 0.0s
=> CACHED [super__tgwui base 8/10] RUN echo "" > /build_date.txt 0.0s
=> ERROR [super__tgwui base 9/10] COPY ./scripts /scripts

sambickeita avatar Jul 17 '23 17:07 sambickeita

Is it mandatory to use the TGWUI Docker, or can we use an existing Oobabooga installation with our current model?

Should I do something else apart from configuring the model and the line 10 pointing towards the API in the config.yaml?

I have the same problem. How do you load the models in the frontend? I guess it should load them using the API like OpenAI does, i.e. /v1/models, because in my case it doesn't load anything.
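For reference, a quick way to check what the backend actually exposes (host and port are placeholders for wherever your OpenAI-compatible API is running):

curl http://localhost:8000/v1/models

If that list comes back empty, there is presumably nothing for the frontend to show.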

GVidigt avatar Jul 24 '23 14:07 GVidigt

I successfully started with these changes:

  1. In the docker compose yaml, change EXTRA_LAUNCH_ARGS to your model:
  • EXTRA_LAUNCH_ARGS="--listen --verbose --extensions openai --auto-devices --gpu-memory 12 --wbits 4 --groupsize 128 --model TheBloke_wizard-vicuna-13B-GPTQ --model_type Llama"
  2. In config.yaml, set MODEL_NAME: "TheBloke_wizard-vicuna-13B-GPTQ"

You can download the model and put it in the SuperAGI\tgwui\config\models\TheBloke_wizard-vicuna-13B-GPTQ subdirectory, or download it directly in TGWUI and restart docker compose.

It doesn't work very well: sometimes the JSON isn't OK, and the second and subsequent runs don't work at all.

Please sirajperson, give us a complete example with your model selection...

I tried the same, but in the SuperAGI GUI I am not able to see it. I am able to load the model in TGWUI. Any suggestions, @amaza?

ArvindSharma18 avatar Jul 25 '23 13:07 ArvindSharma18

I am trying to get local models working without TGWUI, but I ran into an issue that is probably the same one you encountered.

EDIT: I got it working; read below for a rambling log of my thoughts. TL;DR: OPENAI_API_BASE is a great setting, but the code then still assumes you are using an OpenAI or Google model. You need to make two changes in the code to hack it together and get it working. Read below for what I did.

Here is what I did:

  • I launched a local OpenAI-compatible server on port 8000, using either python -m llama_cpp.server OR LM Studio. Both produce the same issue.
  • Set OPENAI_API_BASE: http://host.docker.internal:8000/v1 (see the sketch after this list)
  • I launch docker-compose up --build
  • SuperAGI can see my API because both llama_cpp.server and LM Studio report a request on "GET /v1/models HTTP/1.1" 200 OK, and both properly reply with a non-empty list of local models.
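Put together, a minimal sketch of that setup (the model path and port are placeholders):

# local OpenAI-compatible server (the llama_cpp.server option from above)
python -m llama_cpp.server --model ./models/your-model.ggmlv3.q4_K_M.bin --port 8000

# SuperAGI config.yaml
OPENAI_API_BASE: http://host.docker.internal:8000/v1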

The model dropdown issue observed when creating a new agent:

  • When doing the above procedure the Model dropdown (when creating a new agent) is empty
  • When clicking on "Create and Run" I get the message "Your key does not have access to the selected model" which should not really matter as my self-hosted API will accept any key as valid.

What I tried:

  • Various API keys (even a valid OpenAI key)
  • Different ports for the services

What I did then: The issue of not showing the model in the models dropdown was due to the following code line in /superagi/llms/openai.py:

models = [model for model in models if model in models_supported]

I commented that out. Then the models dropdown was properly populated and I could generate an agent. The agent seemed to be running; however, nothing happened. It was stuck on "thinking". The console shows this error:

[2023-08-10 06:55:58,429: ERROR/ForkPoolWorker-8] Task execute_agent[368cfd1c-d930-479c-a03b-5fd93a619cab] raised unexpected: ValueError('/MODEL_PATH_REPLACED/models/TheBloke/Wizard-Vicuna-7B-Uncensored-SuperHOT-8K-GGML/wizard-vicuna-7b-uncensored-superhot-8k.ggmlv3.q4_K_M.bin')
...
  File "/app/superagi/llms/llm_model_factory.py", line 29, in get_model
    return factory.get_model(model, api_key=api_key, **kwargs)
  File "/app/superagi/llms/llm_model_factory.py", line 15, in get_model
    raise ValueError(model)
ValueError: /MODEL_PATH_REPLACED/models/TheBloke/Wizard-Vicuna-7B-Uncensored-SuperHOT-8K-GGML/wizard-vicuna-7b-uncensored-superhot-8k.ggmlv3.q4_K_M.bin

I'm a bit out of my depth on how to fix this. There seems to be a long tail of changes required; unfortunately, this is above my head. So I'll drop my observations here, hoping someone smarter than me can figure out how to get local models working with your own local API server.

UPDATE: I got it to run by hacking around a bit. I also modified /superagi/llms/llm_model_factory.py to cope with unknown models by replacing the get_model function with this (I basically added a rudimentary try/except that, in case of an unknown model, treats it as a gpt-3.5-turbo OpenAI model):

def get_model(api_key, model="gpt-3.5-turbo", **kwargs):
    try:
        # known models (OpenAI/Google) resolve through the factory as before
        return factory.get_model(model, api_key=api_key, **kwargs)
    except ValueError:
        # unknown (e.g. local) model names fall back to the gpt-3.5-turbo path,
        # so requests go through the OpenAI-compatible client instead of failing
        return factory.get_model("gpt-3.5-turbo", api_key=api_key, **kwargs)

Now I am able to run local models hosted on my own local server (I mainly use LM Studio for simplicity, but it should work with others too). They don't produce good results (especially since I am testing the setup with a 7B model), but the agent runs and comes to "some" conclusion.

So it appears that SuperAGI is not properly handling non-OpenAI/Google models in its codebase yet. With these crude workarounds you can at least get it to somewhat work.

SlistInc avatar Aug 10 '23 11:08 SlistInc

SuperAGI isn't working for me. I am not trying to run custom, open-source models; I am trying to run OpenAI models. I run into the same problem as SlistInc when trying to run an agent based on a template, and I cannot select any OpenAI model when starting an agent from scratch. I incorporated all of the changes listed on this page (except making sure the extra args were modified for OpenAI models), and tried deploying the Docker containers to both a Windows 10 PC and the latest MacBook.

osgiliath-stone avatar Aug 11 '23 16:08 osgiliath-stone