
llama.cpp => model runs fine but no output

sergedc opened this issue 1 year ago · 3 comments

Hi,

Windows 11 environment, Python 3.10.11

I installed:

  • llama-cpp-python and it works fine and provides output
  • transformers
  • pytorch

Code run:

from langchain.llms import LlamaCpp
from langchain import PromptTemplate, LLMChain

template = """Question: {question}

Answer: Let's think step by step."""

prompt = PromptTemplate(template=template, input_variables=["question"])

llm = LlamaCpp(model_path=r"D:\Win10User\Downloads\AI\Model\vicuna-13B-1.1-GPTQ-4bit-128g.GGML.bin")

llm_chain = LLMChain(prompt=prompt, llm=llm)
question = "What is the capital of Belgium?"
llm_chain.run(question)

Output:

llama.cpp: loading model from D:\Win10User\Downloads\AI\Model\vicuna-13B-1.1-GPTQ-4bit-128g.GGML.bin
llama_model_load_internal: format     = ggjt v1 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 512
llama_model_load_internal: n_embd     = 5120
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 40
llama_model_load_internal: n_layer    = 40
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 4 (mostly Q4_1, some F16)
llama_model_load_internal: n_ff       = 13824
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size =  73.73 KB
llama_model_load_internal: mem required  = 11749.65 MB (+ 3216.00 MB per state)
llama_init_from_file: kv self size  =  800.00 MB
AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |

llama_print_timings:        load time =  2154.68 ms
llama_print_timings:      sample time =    75.88 ms /   256 runs   (    0.30 ms per run)
llama_print_timings: prompt eval time =  5060.58 ms /    23 tokens (  220.03 ms per token)
llama_print_timings:        eval time = 72461.40 ms /   255 runs   (  284.16 ms per run)
llama_print_timings:       total time = 77664.50 ms

But there is no answer to the question... Am I supposed to print() something?

sergedc avatar Apr 20 '23 20:04 sergedc

Wrap the last line in a print statement, or save it to a variable and print that.
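
For example (a minimal sketch, using the names from your snippet above):

# Either print the chain's return value directly...
print(llm_chain.run(question))

# ...or save it to a variable first and print that.
answer = llm_chain.run(question)
print(answer)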

createchange avatar Apr 20 '23 21:04 createchange

Sorry, I feel it was a stupid question. However, after doing print(llm_chain.run(question)) I got this output:

It has been made the A, a 82/or but of negop of the A:
 If this means of the Mark Thin
 Post script, Bl (a non-e of allters [H, SCL8 will, plow to

While using llama-cpp-python directly, I got something that made sense, i.e. talking about countries and capitals. Any idea what might have gone wrong?

sergedc avatar Apr 21 '23 00:04 sergedc

Same issue with llama_cpp_python==0.1.35 and langchain==0.0.145 (I tried different older versions as well, and they lead to the same issue).

Code example used:

from langchain.llms import LlamaCpp

llm = LlamaCpp(model_path=model_path)

# Basic Q&A
answer = llm(
    "Question: What is the capital of France? Answer: ", stop=["Question:", "\n"]
)
print(f"Answer: {answer.strip()}")

Returns some random text:

Answer: 2 andover these 7lower for K ion battery on 4- your 62rowcan main borge crouder In He rasfe- monitoringlapfe croISfeud linebbernournous match

The example examples/high_level_api/langchain_custom_llm.py provided by abetlen/llama-cpp-python works as expected.

zatevakhin avatar Apr 21 '23 03:04 zatevakhin

I am closing this issue as the cause has been identified and is being fixed. Temporary workaround: pass f16_kv=True explicitly:

llm = LlamaCpp(model_path="../llama/models/ggml-vicuna-13b-4bit-rev1.bin", f16_kv=True)

Problem being fixed: llamacpp passes a wrong default value for f16_kv (#3320)
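
For reference, here is the workaround applied to the original snippet from this thread (a minimal sketch; the model path is the one from the first comment and may differ on your machine):

from langchain.llms import LlamaCpp
from langchain import PromptTemplate, LLMChain

template = """Question: {question}

Answer: Let's think step by step."""

prompt = PromptTemplate(template=template, input_variables=["question"])

# f16_kv=True works around the wrong default value being passed to llama.cpp (#3320)
llm = LlamaCpp(
    model_path=r"D:\Win10User\Downloads\AI\Model\vicuna-13B-1.1-GPTQ-4bit-128g.GGML.bin",
    f16_kv=True,
)

llm_chain = LLMChain(prompt=prompt, llm=llm)
print(llm_chain.run("What is the capital of Belgium?"))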

sergedc avatar Apr 22 '23 16:04 sergedc