OpenAI_Compatible (llamacpp server): tiktoken caught in a loop
Describe the issue as clearly as possible:
I'm using the api_like_OAI.py script from the llama.cpp repo, which works fine with the official OpenAI Python library. The code below even calls the server correctly.
Adding a print statement in the tokenizer function shows it being called in an unending loop, regardless of the model name, including official ones like gpt-4 and gpt-3.5-turbo.
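I'm not certain exactly where outlines wraps tiktoken, so as a sketch (not the reporter's actual print statement), one way to watch the repeated tokenizer calls is to patch tiktoken's encode at the class level before building the model:
import tiktoken

_orig_encode = tiktoken.Encoding.encode

def traced_encode(self, text, **kwargs):
    # Log every call so a runaway loop shows up as an endless stream of lines.
    print(f"tiktoken encode() called with {len(text)} characters")
    return _orig_encode(self, text, **kwargs)

# Patch the class so any Encoding instance outlines creates is traced.
tiktoken.Encoding.encode = traced_encode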
As an additional test, I attempted to use the default OpenAI model by setting:
os.environ["OPENAI_BASE_URL"] = "http://localhost:8081"
which just returns a connection error, with no activity on the server side.
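For reference, the proxy itself can be exercised with the official client directly (which is the setup the report says works). A rough sketch of that sanity check, assuming an openai>=1.0-style client and the port from the report:
from openai import OpenAI

# Point the official client at the local proxy instead of api.openai.com.
client = OpenAI(base_url="http://localhost:8081", api_key="none")
response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # model name as sent to the proxy; the local server decides what actually runs
    messages=[{"role": "user", "content": "Is the following review positive or negative? This restaurant is just awesome!"}],
)
print(response.choices[0].message.content)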
Steps/code to reproduce the bug:
import outlines
model = outlines.models.OpenAICompatibleAPI(
    model_name="none",
    api_key="none",
    base_url="http://localhost:8081",
    encoding="gpt-4",  # tiktoken.get_encoding(name) does not work
)
prompt = """You are a sentiment-labelling assistant.
Is the following review positive or negative?
Review: This restaurant is just awesome!
"""
generator = outlines.generate.choice(model, ["Positive", "Negative"])
answer = generator(prompt)
print(answer)
Expected result:
"Positive"
Error message:
See above
Outlines/Python version information:
Outlines version: 0.0.25
Python version: 3.10.6
Context for the issue:
No response
I couldn't reproduce. Could you consider trying a tighter integration via https://github.com/outlines-dev/outlines/blob/main/docs/reference/models/llamacpp.md
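For context, the tighter integration from that doc looks roughly like this; the exact constructor arguments vary between outlines versions, so treat it as a sketch:
import outlines

# Load the GGUF file directly through outlines' llama.cpp integration,
# bypassing the HTTP proxy (model path taken from the batch file below).
model = outlines.models.llamacpp("./mistral-7b-instruct-v0.2.Q5_K_M.gguf")

generator = outlines.generate.choice(model, ["Positive", "Negative"])
print(generator("Is the following review positive or negative? Review: This restaurant is just awesome!"))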
I've already tried that, but the llama.cpp Python library is much slower than running an OAI proxy. I'm launching the server via this batch file:
:: launch the OpenAI-style proxy, forwarding requests to the llama.cpp server on port 8080
start /B python oai_api.py --llama-api http://localhost:8080
:: launch the llama.cpp server (memory-locked, 35 layers offloaded to the GPU, 4096-token context)
start /B server --mlock -ngl 35 -m mistral-7b-instruct-v0.2.Q5_K_M.gguf -c 4096
pause
Hi, were you able to solve the issue?
Hi, sorry, I haven't looked at this project or this issue since, so I can't remember exactly. I know the OAI-style API has since been fully implemented in the llama.cpp server, so hopefully there's better compatibility now.
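If the server's built-in OpenAI-compatible endpoint is used (typically served under /v1), the proxy script shouldn't be needed at all. A rough sketch, assuming the server from the batch file above is listening on port 8080:
from openai import OpenAI

# Talk to llama.cpp's native OpenAI-compatible endpoint; no api_like_OAI.py proxy involved.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")
resp = client.chat.completions.create(
    model="mistral-7b-instruct-v0.2.Q5_K_M.gguf",  # informational for a single-model server
    messages=[{"role": "user", "content": "Is the following review positive or negative? This restaurant is just awesome!"}],
)
print(resp.choices[0].message.content)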