Invalid stop_str in conversion template json file.
Running this code
import os
import replicate
from dotenv import load_dotenv
load_dotenv()
REPLICATE_API_TOKEN = os.getenv("REPLICATE_API_TOKEN")
prompt = "Q: What is 10*10? A: "
output = replicate.run(
    "meta/llama-2-7b",
    input={
        "prompt": prompt,
        "max_new_tokens": 1000,
        "temperature": 0.75,
        "stop_sequences": "1,0"
    },
)
output = "".join(output)
print(output)
yields the error
Traceback (most recent call last):
  File "/path/to/my/code.py", line 10, in <module>
    output = replicate.run(
             ^^^^^^^^^^^^^^
  File "/path/to/my/anaconda3/envs/ltm/lib/python3.11/site-packages/replicate/client.py", line 148, in run
    return run(self, ref, input, **params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/path/to/my/anaconda3/envs/ltm/lib/python3.11/site-packages/replicate/run.py", line 61, in run
    raise ModelError(prediction.error)
replicate.exceptions.ModelError: Traceback (most recent call last):
  3: mlc::llm::LLMChatModule::GetFunction(tvm::runtime::String const&, tvm::runtime::ObjectPtr<tvm::runtime::Object> const&)::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#10}::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
        at /workspace/mlc-llm/cpp/llm_chat.cc:1545
  2: mlc::llm::LLMChat::LoadJSONOverride(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool)
        at /workspace/mlc-llm/cpp/llm_chat.cc:483
  1: mlc::llm::LLMChat::LoadJSONOverride(picojson::value const&, bool)
        at /workspace/mlc-llm/cpp/llm_chat.cc:458
  0: mlc::llm::Conversation::LoadJSONOverride(picojson::value const&, bool)
        at /workspace/mlc-llm/cpp/conversation.cc:93
  File "/workspace/mlc-llm/cpp/conversation.cc", line 93
TVMError: Check failed: (config["stop_str"].is<std::string>()) is false: Invalid stop_str in conversion template json file.
What is wrong with "stop_sequences": "1,0"?
FYI, if you comment out that line, it runs without error. For reference, this is the variant that completes fine on my end (the identical call, just with the stop_sequences line removed):
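output = replicate.run(
    "meta/llama-2-7b",
    input={
        "prompt": prompt,
        "max_new_tokens": 1000,
        "temperature": 0.75,
    },
)
print("".join(output))

But I need to use stop sequences. Thank you!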
Hi @jdkanu. To clarify, the error you're seeing is a problem with the model rather than the Python client itself.
Looking at the schema for meta/llama-2-7b, stop_sequences is documented as:
A comma-separated list of sequences to stop generation at. For example, '<end>,<stop>' will stop generation at the first instance of '<end>' or '<stop>'.
Based on that, I think the model is expecting the stop sequences as text strings rather than token ids. Does it work as expected if you pass strings instead?
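Something along these lines, for example (this just reuses the placeholder sequences from the schema's example to illustrate the comma-separated string format, not values specific to your prompt):

output = replicate.run(
    "meta/llama-2-7b",
    input={
        "prompt": prompt,
        "max_new_tokens": 1000,
        "temperature": 0.75,
        # comma-separated text sequences, as in the schema's example
        "stop_sequences": "<end>,<stop>",
    },
)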
Thanks for your feedback. The intention in this example was to stop the sequence at the first instance of "1" or "0" (ASCII characters) in the completion.
That is, given the prompt "Q: What is 10*10? A: ", we would expect the completion "Q: What is 10*10? A: 100", so the characters "1" and "0" should reliably appear in the output, and we should see the stopping behavior when we pass the correct value for "stop_sequences". Per the documentation, "stop_sequences": "1,0" should stop at "1" or "0", but I keep getting the error.
I always get this error, even if I pass in simpler strings like "stop_sequences": "," or "stop_sequences": "x" (just to see if anything goes through). From what I've tried, the error occurs for any non-empty string.
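For concreteness, this is roughly how I've been checking the simpler values (same setup as the script above, only "stop_sequences" changes), and every non-empty value raises the same ModelError:

for stop in [",", "x", "1,0"]:
    try:
        output = replicate.run(
            "meta/llama-2-7b",
            input={
                "prompt": prompt,
                "max_new_tokens": 1000,
                "temperature": 0.75,
                "stop_sequences": stop,
            },
        )
        print(repr(stop), "->", "".join(output))
    except replicate.exceptions.ModelError as e:
        print(repr(stop), "-> ModelError:", e)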