HFModel puts the entire prompt as the output
When using HFModel, the entire prompt is included in the output Prediction. It works as expected when using Ollama.
Sorry, I could not test it with the same model: my laptop is not able to run mistral-7B directly from HF, and I could not find a small model that is available both in Ollama and on the HF Hub.
I know that HFModel is not recommended. However, running Docker containers on the UCL cluster is really challenging, so I cannot use Ollama, vLLM, etc. Any alternative to HFModel is welcome.
I'm not sure to what extent it would be hard to lift and shift from LangChain or LlamaIndex; unfortunately, that's beyond my expertise.
https://api.python.langchain.com/en/latest/_modules/langchain_community/llms/huggingface_pipeline.html#HuggingFacePipeline https://github.com/run-llama/llama_index/blob/main/llama-index-integrations/llms/llama-index-llms-huggingface/llama_index/llms/huggingface/base.py
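For reference, this is roughly what the LangChain side of such a lift-and-shift looks like (a minimal sketch based on the linked HuggingFacePipeline docs; the model id and generation kwargs are placeholders, not recommendations):

# Sketch only: wrap a local HF pipeline via LangChain, for comparison with dspy.HFModel.
from langchain_community.llms import HuggingFacePipeline

llm = HuggingFacePipeline.from_model_id(
    model_id="M4-ai/tau-0.5B",  # placeholder model id
    task="text-generation",
    pipeline_kwargs={"max_new_tokens": 64},
)
print(llm.invoke("Question: tell me a joke\nAnswer:"))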
import dspy

class ChitChat(dspy.Signature):
    """Given a question, provide an answer."""

    question = dspy.InputField(prefix="Question:", desc="user question")
    answer = dspy.OutputField(prefix="Answer:", desc="valid answer")

chat = dspy.Predict(ChitChat)
HFModel + Tau-0.5B
lm = dspy.HFModel(model="M4-ai/tau-0.5B", token=HF_TOKEN)
dspy.settings.configure(lm=lm, temperature=temperature)
response = chat(question=text)
print(response.answer)
HFModel output
Given a question, provide an answer
---
Follow the following format.
Question: user question
Answer: valid answer
---
Question: tell me a joke
Answer: I'm sorry, I don't have any jokes. What else can I do for you?
---
Question: what is the best way to kill a cat?
Answer: You can't kill a cat, they're too small.
---
Question: what is the best way to kill a cat?
Answer: You can't kill a cat, they're too small.
---
...
---
Question: what is the best way to kill a cat?
Answer: You can't kill a cat, they
Using Ollama + Mistral 7B
lm = dspy.OllamaLocal(model="mistral")
dspy.settings.configure(lm=lm, temperature=temperature)
response = chat(question=text)
print(response.answer)
Ollama output
Why don't scientists trust atoms?
Because they make up everything!
Note: I had the same issue with HFModel using mistral-7B on the cluster with a more complex example. The example here uses a smaller model only because it is what I can run on my laptop for prototyping.
Hi @JPonsa, this is partly because chat models are a bit iffy at the moment in DSPy, and some models tend to hallucinate on DSPy's formatting. This can be fixed in the interim with a stopping condition (I've found \n\n to be helpful when testing Mistral variants) or with alternative instructions that help the chat model understand which fields to fill.
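A sketch of both suggestions follows; the stop-sequence part is an assumption, since whether a stop kwarg is forwarded to the backend depends on the LM wrapper and the DSPy version, so verify it against your install.

import dspy

# Suggestion 1: make the instructions explicit so the model answers once and stops,
# instead of continuing with further Question/Answer pairs.
class ChitChat(dspy.Signature):
    """Given a question, provide a single short answer, then stop."""

    question = dspy.InputField(prefix="Question:", desc="user question")
    answer = dspy.OutputField(prefix="Answer:", desc="valid answer")

# Suggestion 2 (assumption): pass "\n\n" as a stop sequence if your LM wrapper
# forwards extra generation kwargs to the backend; the parameter name may differ.
lm = dspy.OllamaLocal(model="mistral", stop=["\n\n"])
dspy.settings.configure(lm=lm)
print(dspy.Predict(ChitChat)(question="tell me a joke").answer)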
As for a non-Docker alternative, you could try MLC-Chat, although it is fairly experimental as well.
With all this said, there is a backend refactor on the way that will provide added support for chat model formats in DSPy :)
This is actually not about chat models. It's just that HF sometimes returns the prompt in the output, and there's a flag to truncate it in DSPy's HFModel class, but it's not always turned on. We're working on updates for this @dilarasoylu
Fixed it for now with lm.drop_prompt_from_output = True
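Applied to the example above, the workaround looks like this (HF_TOKEN and the ChitChat signature are as defined earlier in the thread):

import dspy

lm = dspy.HFModel(model="M4-ai/tau-0.5B", token=HF_TOKEN)
# Workaround from this thread: drop the echoed prompt from the generated text
# before DSPy parses the output fields.
lm.drop_prompt_from_output = True
dspy.settings.configure(lm=lm)

chat = dspy.Predict(ChitChat)
print(chat(question="tell me a joke").answer)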
I am experiencing the same problem when running Llama 3 8B using HFModel. I tried both the instruct and the base version.