
llamacpp support

Open codysnider opened this issue 10 months ago • 4 comments

Tested with:


import dspy
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="TheBloke/OpenHermes-2.5-Mistral-7B-GGUF",
    filename="openhermes-2.5-mistral-7b.Q4_K_M.gguf",
    n_ctx=4096,
    n_gpu_layers=10,
    verbose=True
)

llamalm = dspy.LlamaCpp(model="llama", llama_model=llm)
dspy.settings.configure(lm=llamalm)


def summarize_document(document):
    summarize = dspy.ChainOfThought('document -> summary')
    response = summarize(document=document)
    print(response.summary)


if __name__ == "__main__":
    summarize_document("""The 21-year-old made seven appearances for the Hammers and netted his only goal for them in a Europa League qualification round match against Andorran side FC Lustrains last season. Lee had two loan spells in League One last term, with Blackpool and then Colchester United. He scored twice for the U's but was unable to save them from relegation. The length of Lee's contract with the promoted Tykes has not been revealed. Find all the latest football transfers on our dedicated page.""")

codysnider avatar Apr 18 '24 18:04 codysnider

I'm curious what sort of results you got?

I tried w/ the above and llama-3-8b instruct:

Document: The 2019-2020 season was marked by the COVID-19 pandemic. The pandemic caused widespread disruption to sports events around the world, including professional soccer matches in England. Many leagues were suspended or postponed due to government restrictions on

But when using llama-cpp directly w/ the same model and "Summarize the above" prompt suffix:

The 21-year-old player, Lee, signed a contract with Barnsley FC (Tykes), a newly-promoted team. He previously played for West Ham United (Hammers) and had loan spells at Blackpool and Colchester United in League One last season.

I suspect the settings configuration needs to include the llama-3 instruct prompt formatting?
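
One hedged guess at a workaround: llama-cpp-python's chat_format argument applies the Llama 3 instruct template, though it would only take effect if the DSPy LlamaCpp wrapper routes requests through the chat-completion API rather than plain completions. The repo, filename, and flags below are placeholders, not something tested in this thread:

import dspy
from llama_cpp import Llama

# Illustrative sketch, not a confirmed fix: chat_format is a llama-cpp-python
# option that wraps prompts in the Llama 3 instruct template. It only helps if
# the DSPy LlamaCpp wrapper uses the chat-completion API. The repo_id and
# filename are placeholders for an instruct-variant GGUF.
llm = Llama.from_pretrained(
    repo_id="<some-org>/Meta-Llama-3-8B-Instruct-GGUF",
    filename="<instruct-model>.Q4_K_M.gguf",
    n_ctx=8192,
    n_gpu_layers=-1,
    chat_format="llama-3",
)

dspy.settings.configure(lm=dspy.LlamaCpp(model="llama", llama_model=llm))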

randerzander avatar Apr 23 '24 18:04 randerzander

@randerzander I'm using it successfully in production with llama 3 (not an instruct variant, though) with typed output:

import json
import os

import dspy
from dspy.teleprompt import BootstrapFewShot
from llama_cpp import Llama

# Excerpt from inside a larger function (hence the bare return at the end).
# CharacterInformation, reasonable_character_information, dataset, and premise
# are defined elsewhere in the application.

llm = Llama.from_pretrained(
    repo_id="QuantFactory/Meta-Llama-3-8B-GGUF",
    filename="Meta-Llama-3-8B.Q5_K_M.gguf",
    n_ctx=8192,
    n_gpu_layers=-1,
    verbose=False
)

lm = dspy.LlamaCpp(
    model="llama",
    llama_model=llm,
    max_tokens=4096,
    temperature=0.1
)

dspy.settings.configure(lm=lm)

optimizer_compiled_path = "/location/of/my/optimized/programs.json"
program = CharacterInformation()
if os.path.exists(optimizer_compiled_path):
    program.load(path=optimizer_compiled_path)
else:
    optimizer = BootstrapFewShot(
        metric=reasonable_character_information,
        max_bootstrapped_demos=10,
        max_labeled_demos=30,
        max_rounds=50,
        max_errors=100,
    )
    program = optimizer.compile(CharacterInformation(), trainset=dataset)
    program.save(optimizer_compiled_path)

result = program(premise)

return json.dumps([model.model_dump() for model in result.answer.characters])

This is running on an in-house server using Tesla P100s for inference.
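
The CharacterInformation program itself isn't shown in this thread; a hypothetical reconstruction of a typed-output module might look roughly like this (class names, fields, and the use of dspy.TypedChainOfThought with Pydantic models are all guesses, not the actual production code):

from typing import List

import dspy
from pydantic import BaseModel


class Character(BaseModel):
    name: str
    description: str


class CharacterList(BaseModel):
    characters: List[Character]


class CharacterSignature(dspy.Signature):
    """Extract the characters mentioned in a premise."""
    premise: str = dspy.InputField()
    answer: CharacterList = dspy.OutputField()


class CharacterInformation(dspy.Module):
    def __init__(self):
        super().__init__()
        # TypedChainOfThought parses the LM output into the Pydantic model above.
        self.extract = dspy.TypedChainOfThought(CharacterSignature)

    def forward(self, premise):
        return self.extract(premise=premise)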

codysnider avatar Apr 23 '24 18:04 codysnider

If it drops kwargs["n"], it should instead call llama.cpp multiple times to generate the requested number of completions, since some of the optimizers expect multiple completions. You could potentially speed this up a bit by saving the LLM state and restoring it rather than reprocessing the prompt for each completion.
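
A minimal sketch of the naive multiple-completions approach (the function name and signature are illustrative, not from the PR; the state save/restore optimization is only noted in a comment):

def generate_n_completions(llm, prompt, n, **kwargs):
    # Naive approach: one llama.cpp call per requested completion, so the
    # prompt is reprocessed every time. Saving the model state with
    # llm.save_state() after prompt evaluation and restoring it before each
    # sample would avoid that, but is omitted here for brevity.
    completions = []
    for _ in range(n):
        output = llm(prompt, **kwargs)
        completions.append(output["choices"][0]["text"])
    return completions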

How does speed and accuracy compare to starting the native llama.cpp server and calling it from DSPy’s OpenAI interface?
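
For reference, one way to run that comparison (the server command, port, and model name below are assumptions, not something tested in this thread) is to start an OpenAI-compatible llama.cpp server and point DSPy's OpenAI client at it:

import dspy

# Assumes a local OpenAI-compatible server, e.g. one started with
#   python -m llama_cpp.server --model /path/to/model.gguf
# The base URL, port, and model name are placeholders.
server_lm = dspy.OpenAI(
    model="local-llama",
    api_base="http://localhost:8000/v1/",
    api_key="not-needed",  # local servers typically ignore the key
)
dspy.settings.configure(lm=server_lm)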

wronkiew avatar Apr 30 '24 05:04 wronkiew

Thanks for the PR @codysnider ! Left a few comments.

Could you run ruff check . --fix-only to fix the failing test and push again?

The PR is also failing the build tests since you need to add import llama_cpp in the import try-except block.
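
For illustration, that optional-dependency guard usually looks something like this (the exact block in the DSPy codebase may differ):

try:
    import llama_cpp  # optional dependency; only needed for the LlamaCpp LM
except ImportError:
    llama_cpp = None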

Could you also add documentation for the LlamaCpp LM to provide context, as done for the other LMs in our documentation here? It would be great to add the example you've tested as well!

arnavsinghvi11 avatar May 06 '24 01:05 arnavsinghvi11

Hi @codysnider @randerzander, just following up on this PR to resolve the pending comments so we can merge it soon!

arnavsinghvi11 avatar May 31 '24 04:05 arnavsinghvi11

Closed by #1347

arnavsinghvi11 avatar Aug 05 '24 17:08 arnavsinghvi11