
Output using llamacpp is garbage

Open JochemLangerak opened this issue 1 year ago • 2 comments

Hi there,

Trying to set up LangChain with llama.cpp as a first step toward using LangChain offline:

```python
from langchain.llms import LlamaCpp

llm = LlamaCpp(model_path="../llama/models/ggml-vicuna-13b-4bit-rev1.bin")
text = "Question: What NFL team won the Super Bowl in the year Justin Bieber was born? Answer: Let's think step by step."

print(llm(text))
```

The result is:

Plenement that whciation - if a praged and as Work 1 -- but a nice bagingrading per 1, In Homewooded ETenscent is the 0sm toth, ECORO Efph at as an outs! ce, found unprint this a PC, Thom. The RxR-1 dot emD In Not OslKNOT The Home On-a-a-a-aEOEfa-a-aP E. NOT, hotness of-aEF and Life in better-A (resondri Euler, rsa! Home WI Retection and O no-aL25 1 fate to Hosp doubate, p. T, this guiltEisenR-getus WEFI, duro as these disksada Tl.Eis-aRDA* plantly-aRing the Prospecttypen

Running the same question through llama_cpp_python directly with the same model bin file, the result is correctly formatted (although wrong):

{ "id": "cmpl-d64b69f6-cd50-41e9-8d1c-25b1a5859fac", "object": "text_completion", "created": 1682085552, "model": "./models/ggml-alpaca-7b-native-q4.bin", "choices": [ { "text": "Question: What NFL team won the Super Bowl in the year Justin Bieber was born? Answer: Let's think step by step. Justin was born in 1985, so he was born in the same year as the Super Bowl victory of the Chicago Bears in 1986. So, the answer is the Chicago Bears!", "index": 0, "logprobs": null, "finish_reason": "stop" } ], "usage": { "prompt_tokens": 32, "completion_tokens": 45, "total_tokens": 77 } }

What could be the issue, encoding/decoding?

JochemLangerak avatar Apr 21 '23 14:04 JochemLangerak

Here is another similar issue: https://github.com/hwchase17/langchain/issues/3241. My comment there describes the identical problem.

As a workaround, the solution below can be used:

```bash
# Start llama-cpp-python's OpenAI-compatible server with the local model
export MODEL=.../ggml/ggml-vicuna-7b-4bit.bin
python3 -m llama_cpp.server
```

Then, in Python:

```python
import os

# Placeholder key; requests go to the local server instead of OpenAI
os.environ["OPENAI_API_KEY"] = "RANDOM-TEXT"
os.environ["OPENAI_API_BASE"] = "http://localhost:8000/v1"

from langchain.llms import OpenAI

llms = OpenAI()

print(llms(
    prompt="The quick brown fox jumps",
    stop=[".", "\n"],
))
```

This solution relies on the OpenAI API-compatible server implemented in llama-cpp-python. See example: https://github.com/abetlen/llama-cpp-python/blob/main/examples/notebooks/Clients.ipynb
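
For reference, here is a minimal sketch of calling that server directly with the openai Python client, assuming the server started above is running on localhost:8000. The API key and model name below are placeholders; the local server should simply answer with the model it was started with.

```python
import openai

# Point the client at the local llama-cpp-python server instead of api.openai.com
openai.api_key = "RANDOM-TEXT"                # placeholder; not used by the local server
openai.api_base = "http://localhost:8000/v1"

completion = openai.Completion.create(
    model="text-davinci-003",                 # placeholder name; the server uses the model it loaded
    prompt="The quick brown fox jumps",
    stop=[".", "\n"],
    max_tokens=32,
)
print(completion["choices"][0]["text"])
```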

Another good workaround is this example: https://github.com/abetlen/llama-cpp-python/blob/main/examples/high_level_api/langchain_custom_llm.py
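
For context, a minimal sketch of that custom-LLM approach could look like the following. The class name and parameters here are illustrative rather than copied from the linked file, and it assumes the llama-cpp-python package is installed.

```python
from typing import Any, List, Mapping, Optional

from langchain.llms.base import LLM
from llama_cpp import Llama


class LlamaCppWrapper(LLM):
    """Illustrative custom LLM that calls llama_cpp.Llama directly."""

    model_path: str
    client: Any = None  # holds the llama_cpp.Llama instance, created on first call

    @property
    def _llm_type(self) -> str:
        return "llama-cpp-python"

    def _call(self, prompt: str, stop: Optional[List[str]] = None) -> str:
        if self.client is None:
            self.client = Llama(model_path=self.model_path)
        output = self.client(prompt, stop=stop or [], max_tokens=256)
        return output["choices"][0]["text"]

    @property
    def _identifying_params(self) -> Mapping[str, Any]:
        return {"model_path": self.model_path}


llm = LlamaCppWrapper(model_path="../llama/models/ggml-vicuna-13b-4bit-rev1.bin")
print(llm("Question: What NFL team won the Super Bowl in the year Justin Bieber was born?"))
```

This keeps generation entirely inside llama_cpp, so the wrapper can be dropped into chains like any other LangChain LLM.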

zatevakhin avatar Apr 21 '23 19:04 zatevakhin

@JochemLangerak

Here is a fix for your issue: just add the f16_kv=True parameter.

```python
from langchain.llms import LlamaCpp

# f16_kv=True keeps the key/value cache in half precision
llm = LlamaCpp(model_path="../llama/models/ggml-vicuna-13b-4bit-rev1.bin", f16_kv=True)
text = "Question: What NFL team won the Super Bowl in the year Justin Bieber was born? Answer: Let's think step by step."

print(llm(text))
```

zatevakhin avatar Apr 21 '23 21:04 zatevakhin