
OpenAI_Compatible (llamacpp server): tiktoken caught in a loop

its-ven opened this issue on Feb 04 '24 · 3 comments

Describe the issue as clearly as possible:

I'm using the api_like_OAI.py script from the llama.cpp repo, which works fine with the official OpenAI Python library. The code below even correctly calls the server:

[screenshot: the request to the local server completes successfully]

Adding a print statement in the tokenizer function returns an unending loop, regardless of model name, including official ones like gpt-4 and gpt-3.5-turbo:

[screenshot: the print statement in the tokenizer function repeating in an endless loop]

As an additional test, I tried the default OpenAI model by setting os.environ["OPENAI_BASE_URL"] = "http://localhost:8081", which just returns a connection error and produces no activity on the server.
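For reference, this is roughly what the working call through the official OpenAI Python library looks like (a sketch of my setup; the port and the placeholder model name match the repro below):

from openai import OpenAI

# Point the official client at the local api_like_OAI.py proxy instead of api.openai.com.
client = OpenAI(base_url="http://localhost:8081", api_key="none")

# The proxy ignores the model name and uses whatever GGUF model the llama.cpp server has loaded.
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Is the following review positive or negative? Review: This restaurant is just awesome!"}],
)
print(response.choices[0].message.content)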

Steps/code to reproduce the bug:

import outlines

model = outlines.models.OpenAICompatibleAPI(
    model_name="none",
    api_key="none",
    base_url="http://localhost:8081",
    encoding="gpt-4",  # tiktoken.get_encoding(name) does not work
)

prompt = """You are a sentiment-labelling assistant.
Is the following review positive or negative?

Review: This restaurant is just awesome!
"""

generator = outlines.generate.choice(model, ["Positive", "Negative"])
answer = generator(prompt)
print(answer)

Expected result:

"Positive"

Error message:

See above

Outlines/Python version information:

Outlines version: 0.0.25
Python version: 3.10.6

Context for the issue:

No response

its-ven · Feb 04 '24 10:02

I couldn't reproduce this. Could you try the tighter llama.cpp integration described in https://github.com/outlines-dev/outlines/blob/main/docs/reference/models/llamacpp.md?
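Roughly, that would look like this (a sketch only; the exact llamacpp constructor has changed across Outlines releases, so follow the linked doc for your version):

import outlines

# Load the GGUF file directly through the llama-cpp-python backend,
# bypassing the OpenAI-style HTTP proxy entirely.
model = outlines.models.llamacpp("./mistral-7b-instruct-v0.2.Q5_K_M.gguf")

generator = outlines.generate.choice(model, ["Positive", "Negative"])
print(generator("Is the following review positive or negative? Review: This restaurant is just awesome!"))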

lapp0 · Feb 05 '24 21:02

I couldn't reproduce this. Could you try the tighter llama.cpp integration described in https://github.com/outlines-dev/outlines/blob/main/docs/reference/models/llamacpp.md?

I've already tried that, but the llama-cpp-python integration is much slower for me than running an OAI-style proxy. I'm launching the server via this batch file:

REM Start the OpenAI-style proxy, which forwards requests to the llama.cpp server on port 8080
start /B python oai_api.py --llama-api http://localhost:8080
REM Start the llama.cpp server: lock the model in RAM, offload 35 layers to the GPU, 4096-token context
start /B server --mlock -ngl 35 -m mistral-7b-instruct-v0.2.Q5_K_M.gguf -c 4096
pause

its-ven · Feb 06 '24 08:02

Hi, were you able to solve the issue?

zanderjiang · Aug 07 '24 09:08

Hi, were you able to solve the issue?

Hi, sorry, I haven't looked at this project or this issue since, so I can't remember exactly. I know the OAI-style API has since been fully implemented in the llama.cpp server, so hopefully compatibility is better now.
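Untested, but something like this should now work against the built-in server directly (a sketch; the --port flag and the /v1 path follow the current llama.cpp server docs, and the model name is still ignored):

# Launch the built-in OAI-compatible server first, e.g.:
#   llama-server -m mistral-7b-instruct-v0.2.Q5_K_M.gguf -c 4096 --port 8080
from openai import OpenAI

# llama.cpp's native server exposes the OpenAI-style API under /v1.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")
response = client.chat.completions.create(
    model="none",  # ignored; the server answers with the loaded GGUF model
    messages=[{"role": "user", "content": "Is the following review positive or negative? Review: This restaurant is just awesome!"}],
)
print(response.choices[0].message.content)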

its-ven · Aug 22 '24 06:08