llama.cpp => model runs fine but no output
Hi,
Environment: Windows 11, Python 3.10.11
I installed:
- llama-cpp-python (on its own it works fine and produces output)
- transformers
- pytorch
Code run:
from langchain.llms import LlamaCpp
from langchain import PromptTemplate, LLMChain
template = """Question: {question}
Answer: Let's think step by step."""
prompt = PromptTemplate(template=template, input_variables=["question"])
llm = LlamaCpp(model_path=r"D:\Win10User\Downloads\AI\Model\vicuna-13B-1.1-GPTQ-4bit-128g.GGML.bin")
llm_chain = LLMChain(prompt=prompt, llm=llm)
question = "What is the capital of Belgium?"
llm_chain.run(question)
Output:
llama.cpp: loading model from D:\Win10User\Downloads\AI\Model\vicuna-13B-1.1-GPTQ-4bit-128g.GGML.bin
llama_model_load_internal: format = ggjt v1 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 512
llama_model_load_internal: n_embd = 5120
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 40
llama_model_load_internal: n_layer = 40
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 4 (mostly Q4_1, some F16)
llama_model_load_internal: n_ff = 13824
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size = 73.73 KB
llama_model_load_internal: mem required = 11749.65 MB (+ 3216.00 MB per state)
llama_init_from_file: kv self size = 800.00 MB
AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
llama_print_timings: load time = 2154.68 ms
llama_print_timings: sample time = 75.88 ms / 256 runs ( 0.30 ms per run)
llama_print_timings: prompt eval time = 5060.58 ms / 23 tokens ( 220.03 ms per token)
llama_print_timings: eval time = 72461.40 ms / 255 runs ( 284.16 ms per run)
llama_print_timings: total time = 77664.50 ms
But there is no answer to the question. Am I supposed to print() something?
Wrap the last line in a print statement, or save it to a variable and print that.
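For example, with the chain from the snippet above:
response = llm_chain.run(question)
print(response)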
Sorry, I feel it was a stupid question. However, after doing print(llm_chain.run(question)) I got this output:
It has been made the A, a 82/or but of negop of the A:
If this means of the Mark Thin
Post script, Bl (a non-e of allters [H, SCL8 will, plow to
While using llama-cpp-python directly, I got something that made sense, i.e. text about countries and capitals. Any idea what might have gone wrong?
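For comparison, the direct llama-cpp-python call that gave a sensible answer looked roughly like this (a sketch; the exact prompt and parameters are not in the original report):
from llama_cpp import Llama

# Load the same GGML model directly through llama-cpp-python
llm = Llama(model_path=r"D:\Win10User\Downloads\AI\Model\vicuna-13B-1.1-GPTQ-4bit-128g.GGML.bin")

# Ask the same question without going through LangChain
output = llm("Question: What is the capital of Belgium? Answer:", max_tokens=64, stop=["Question:"])
print(output["choices"][0]["text"])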
Same issue with llama_cpp_python==0.1.35
langchain==0.0.145
(I also tried different older versions, and they lead to the same issue)
Used code example:
from langchain.llms import LlamaCpp

llm = LlamaCpp(model_path=model_path)

# Basic Q&A
answer = llm(
    "Question: What is the capital of France? Answer: ", stop=["Question:", "\n"]
)
print(f"Answer: {answer.strip()}")
Returns some random text:
Answer: 2 andover these 7lower for K ion battery on 4- your 62rowcan main borge crouder In He rasfe- monitoringlapfe croISfeud linebbernournous match
The example examples/high_level_api/langchain_custom_llm.py provided by abetlen/llama-cpp-python works as expected.
I am closing this issue, as the cause has been identified and is being fixed.
Temporary workaround:
llm = LlamaCpp(model_path="../llama/models/ggml-vicuna-13b-4bit-rev1.bin", f16_kv=True)
Problem being fixed: llamacpp passes the wrong default value for f16_kv (#3320)
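Applied to the chain example from the first post, the workaround would look roughly like this (a sketch; f16_kv=True is the temporary fix until the default is corrected):
from langchain.llms import LlamaCpp
from langchain import PromptTemplate, LLMChain

template = """Question: {question}
Answer: Let's think step by step."""
prompt = PromptTemplate(template=template, input_variables=["question"])

# Explicitly pass f16_kv=True to work around the wrong default value
llm = LlamaCpp(
    model_path=r"D:\Win10User\Downloads\AI\Model\vicuna-13B-1.1-GPTQ-4bit-128g.GGML.bin",
    f16_kv=True,
)
llm_chain = LLMChain(prompt=prompt, llm=llm)

print(llm_chain.run("What is the capital of Belgium?"))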