Invalid stop_str in conversion template json file.
Running this code
import os
import replicate
from dotenv import load_dotenv
load_dotenv()
REPLICATE_API_TOKEN = os.getenv("REPLICATE_API_TOKEN")
prompt = "Q: What is 10*10? A: "
output = replicate.run(
    "meta/llama-2-7b",
    input={
        "prompt": prompt,
        "max_new_tokens": 1000,
        "temperature": 0.75,
        "stop_sequences": "1,0"
    },
)
output = "".join(output)
print(output)
yields the error
Traceback (most recent call last):
  File "/path/to/my/code.py", line 10, in <module>
    output = replicate.run(
             ^^^^^^^^^^^^^^
  File "/path/to/my/anaconda3/envs/ltm/lib/python3.11/site-packages/replicate/client.py", line 148, in run
    return run(self, ref, input, **params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/path/to/my/anaconda3/envs/ltm/lib/python3.11/site-packages/replicate/run.py", line 61, in run
    raise ModelError(prediction.error)
replicate.exceptions.ModelError: Traceback (most recent call last):
  3: mlc::llm::LLMChatModule::GetFunction(tvm::runtime::String const&, tvm::runtime::ObjectPtr<tvm::runtime::Object> const&)::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#10}::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
        at /workspace/mlc-llm/cpp/llm_chat.cc:1545
  2: mlc::llm::LLMChat::LoadJSONOverride(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool)
        at /workspace/mlc-llm/cpp/llm_chat.cc:483
  1: mlc::llm::LLMChat::LoadJSONOverride(picojson::value const&, bool)
        at /workspace/mlc-llm/cpp/llm_chat.cc:458
  0: mlc::llm::Conversation::LoadJSONOverride(picojson::value const&, bool)
        at /workspace/mlc-llm/cpp/conversation.cc:93
  File "/workspace/mlc-llm/cpp/conversation.cc", line 93
TVMError: Check failed: (config["stop_str"].is<std::string>()) is false: Invalid stop_str in conversion template json file.
What is wrong with "stop_sequences": "1,0"?
FYI, if you comment out that line, it runs without error. For reference, this is the variant that completes fine on my end (the identical call, just with the stop_sequences line removed):
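output = replicate.run(
    "meta/llama-2-7b",
    input={
        "prompt": prompt,
        "max_new_tokens": 1000,
        "temperature": 0.75,
    },
)
print("".join(output))

But I need to use stop sequences. Thank you!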
Hi @jdkanu. To clarify, the error you're seeing is a problem with the model rather than the Python client itself.
Looking at the schema for meta/llama-2-7b, stop_sequences is documented as:
A comma-separated list of sequences to stop generation at. For example, '<end>,<stop>' will stop generation at the first instance of '<end>' or '<stop>'.
Based on that, I think the model is expecting the stop sequences as text strings rather than token ids. Does it work as expected if you pass strings instead?
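Something along these lines, for example (this just reuses the placeholder sequences from the schema's example to illustrate the comma-separated string format, not values specific to your prompt):

output = replicate.run(
    "meta/llama-2-7b",
    input={
        "prompt": prompt,
        "max_new_tokens": 1000,
        "temperature": 0.75,
        # comma-separated text sequences, as in the schema's example
        "stop_sequences": "<end>,<stop>",
    },
)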
Thanks for your feedback. The intention in this example was to stop the sequence at the first instance of "1" or "0" (ASCII characters) in the completion.
That is, given the prompt "Q: What is 10*10? A: ", we would expect the completion "Q: What is 10*10? A: 100", so the characters "1" and "0" should reliably appear in the output, and we should see the stopping behavior when we pass the correct value for "stop_sequences". Per the documentation, "stop_sequences": "1,0" should stop at "1" or "0", but I keep getting the error.
I always get this error, even if I pass in simpler strings like "stop_sequences": "," or "stop_sequences": "x" (just to see if anything goes through). From what I've tried, the error occurs for any non-empty string.
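For concreteness, this is roughly how I've been checking the simpler values (same setup as the script above, only "stop_sequences" changes), and every non-empty value raises the same ModelError:

for stop in [",", "x", "1,0"]:
    try:
        output = replicate.run(
            "meta/llama-2-7b",
            input={
                "prompt": prompt,
                "max_new_tokens": 1000,
                "temperature": 0.75,
                "stop_sequences": stop,
            },
        )
        print(repr(stop), "->", "".join(output))
    except replicate.exceptions.ModelError as e:
        print(repr(stop), "-> ModelError:", e)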