generate.json() gives ValidationError when run with mistral-7b-instruct-v0.2.Q6_K.gguf
Describe the issue as clearly as possible:
Example code with Pydantic and generate.json() throws a ValidationError. The code is run from a Jupyter notebook. The output is fine if age: int is removed from the Pydantic class.
Steps/code to reproduce the bug:
```python
from llama_cpp import Llama
from pydantic import BaseModel, Field
from outlines import models, generate

llm = Llama("/models/mistral-7b-instruct-v0.2.Q6_K.gguf", n_gpu_layers=10, n_ctx=0, verbose=False)
model = models.LlamaCpp(llm)

class User(BaseModel):
    first_name: str
    last_name: str
    age: int

generator = generate.json(model, User, whitespace_pattern="")

result = generator(
    """Based on user information create a user profile with the fields first_name, last_name, age.
User information is: Jane Doe age=10"""
)

print(result)
```
Expected result:
User(first_name="Jane", last_name="Doe", age=10)
Error message:
JSONDecodeError Traceback (most recent call last)
File ~/LLM-diploma-project/venv/lib/python3.10/site-packages/pydantic/main.py:1097, in BaseModel.parse_raw(cls, b, content_type, encoding, proto, allow_pickle)
1096 try:
-> 1097 obj = parse.load_str_bytes(
1098 b,
1099 proto=proto,
1100 content_type=content_type,
1101 encoding=encoding,
1102 allow_pickle=allow_pickle,
1103 )
1104 except (ValueError, TypeError) as exc:
File ~/LLM-diploma-project/venv/lib/python3.10/site-packages/pydantic/deprecated/parse.py:49, in load_str_bytes(b, content_type, encoding, proto, allow_pickle, json_loads)
48 b = b.decode(encoding)
---> 49 return json_loads(b) # type: ignore
50 elif proto == Protocol.pickle:
File /usr/lib/python3.10/json/__init__.py:346, in loads(s, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
343 if (cls is None and object_hook is None and
344 parse_int is None and parse_float is None and
345 parse_constant is None and object_pairs_hook is None and not kw):
--> 346 return _default_decoder.decode(s)
347 if cls is None:
File /usr/lib/python3.10/json/decoder.py:337, in JSONDecoder.decode(self, s, _w)
333 """Return the Python representation of ``s`` (a ``str`` instance
334 containing a JSON document).
335
336 """
--> 337 obj, end = self.raw_decode(s, idx=_w(s, 0).end())
338 end = _w(s, end).end()
File /usr/lib/python3.10/json/decoder.py:353, in JSONDecoder.raw_decode(self, s, idx)
352 try:
--> 353 obj, end = self.scan_once(s, idx)
354 except StopIteration as err:
JSONDecodeError: Unterminated string starting at: line 1 column 34 (char 33)
During handling of the above exception, another exception occurred:
ValidationError Traceback (most recent call last)
Cell In[8], line 9
5 age: int
7 generator = generate.json(model, User, whitespace_pattern="")
----> 9 result = generator(
10 """Based on user information create a user profile with the fields first_name, last_name, age.
11 User information is: Jane Doe age=10"""
12 )
14 print(result)
File ~/LLM-diploma-project/venv/lib/python3.10/site-packages/outlines/generate/api.py:501, in SequenceGeneratorAdapter.__call__(self, prompts, max_tokens, stop_at, seed, **model_specific_params)
489 generation_params = self.prepare_generation_parameters(
490 max_tokens, stop_at, seed
491 )
493 completions = self.model.generate(
494 prompts,
495 generation_params,
(...)
498 **model_specific_params,
499 )
--> 501 return format(completions)
File ~/LLM-diploma-project/venv/lib/python3.10/site-packages/outlines/generate/api.py:487, in SequenceGeneratorAdapter.__call__.<locals>.format(sequences)
485 return [format(sequence) for sequence in sequences]
486 else:
--> 487 return self.format_sequence(sequences)
File ~/LLM-diploma-project/venv/lib/python3.10/site-packages/outlines/generate/json.py:50, in json.<locals>.<lambda>(x)
48 regex_str = build_regex_from_schema(schema, whitespace_pattern)
49 generator = regex(model, regex_str, sampler)
---> 50 generator.format_sequence = lambda x: schema_object.parse_raw(x)
51 elif callable(schema_object):
52 schema = pyjson.dumps(get_schema_from_signature(schema_object))
File ~/LLM-diploma-project/venv/lib/python3.10/site-packages/pydantic/main.py:1124, in BaseModel.parse_raw(cls, b, content_type, encoding, proto, allow_pickle)
1117 # ctx is missing here, but since we've added `input` to the error, we're not pretending it's the same
1118 error: pydantic_core.InitErrorDetails = {
1119 # The type: ignore on the next line is to ignore the requirement of LiteralString
1120 'type': pydantic_core.PydanticCustomError(type_str, str(exc)), # type: ignore
1121 'loc': ('__root__',),
1122 'input': b,
1123 }
-> 1124 raise pydantic_core.ValidationError.from_exception_data(cls.__name__, [error])
1125 return cls.model_validate(obj)
ValidationError: 1 validation error for User
__root__
Unterminated string starting at: line 1 column 34 (char 33) [type=value_error.jsondecode, input_value='{"first_name":"Jane","last_name":"Doe', input_type=str]
Outlines/Python version information:
Outlines 0.0.40; Python 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0]
accelerate==0.28.0
aiohttp==3.9.3
aiosignal==1.3.1
annotated-types==0.6.0
anyio==4.3.0
asttokens==2.4.1
astunparse==1.6.3
async-timeout==4.0.3
attrs==23.2.0
beautifulsoup4==4.12.3
certifi==2024.2.2
charset-normalizer==3.3.2
ci-info==0.3.0
click==8.1.7
cloudpickle==3.0.0
comm==0.2.2
configobj==5.0.8
configparser==6.0.1
contourpy==1.2.1
cycler==0.12.1
dataclasses-json==0.6.4
datasets==2.18.0
debugpy==1.8.1
decorator==5.1.1
Deprecated==1.2.14
dill==0.3.8
dirtyjson==1.0.8
diskcache==5.6.3
distro==1.9.0
etelemetry==0.3.1
evaluate==0.4.1
exceptiongroup==1.2.0
executing==2.0.1
fastapi==0.110.1
filelock==3.13.3
fonttools==4.51.0
frozenlist==1.4.1
fsspec==2024.2.0
greenlet==3.0.3
guidance==0.1.13
h11==0.14.0
httpcore==1.0.5
httplib2==0.22.0
httpx==0.27.0
huggingface-hub==0.20.3
idna==3.6
interegular==0.3.3
ipykernel==6.29.4
ipython==8.23.0
isodate==0.6.1
jedi==0.19.1
Jinja2==3.1.3
joblib==1.3.2
jsonschema==4.21.1
jsonschema-specifications==2023.12.1
jupyter_client==8.6.1
jupyter_core==5.7.2
kiwisolver==1.4.5
lark==1.1.9
llama-index==0.10.26
llama-index-agent-openai==0.2.1
llama-index-cli==0.1.11
llama-index-core==0.10.26
llama-index-embeddings-huggingface==0.2.0
llama-index-embeddings-openai==0.1.7
llama-index-extractors-entity==0.1.2
llama-index-indices-managed-llama-cloud==0.1.5
llama-index-legacy==0.9.48
llama-index-llms-huggingface==0.1.4
llama-index-llms-llama-cpp==0.1.3
llama-index-llms-openai==0.1.14
llama-index-multi-modal-llms-openai==0.1.4
llama-index-program-guidance==0.1.2
llama-index-program-openai==0.1.5
llama-index-question-gen-openai==0.1.3
llama-index-readers-file==0.1.13
llama-index-readers-llama-parse==0.1.4
llama-parse==0.4.0
llama_cpp_python==0.2.62
llamaindex-py-client==0.1.15
llvmlite==0.42.0
lmql==0.7.3
looseversion==1.3.0
lxml==5.2.1
MarkupSafe==2.1.5
marshmallow==3.21.1
matplotlib==3.8.4
matplotlib-inline==0.1.6
mpmath==1.3.0
multidict==6.0.5
multiprocess==0.70.16
mypy-extensions==1.0.0
nest-asyncio==1.6.0
networkx==3.2.1
nibabel==5.2.1
nipype==1.8.6
nltk==3.8.1
numba==0.59.1
numpy==1.26.4
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu12==8.9.2.26
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-nccl-cu12==2.19.3
nvidia-nvjitlink-cu12==12.4.99
nvidia-nvtx-cu12==12.1.105
openai==1.16.0
ordered-set==4.1.0
outlines==0.0.40
packaging==24.0
pandas==2.2.1
parso==0.8.3
pathlib==1.0.1
pexpect==4.9.0
pillow==10.3.0
platformdirs==4.2.0
prompt-toolkit==3.0.43
protobuf==5.26.1
prov==2.0.0
psutil==5.9.8
ptyprocess==0.7.0
pure-eval==0.2.2
pyarrow==15.0.2
pyarrow-hotfix==0.6
pydantic==2.6.4
pydantic_core==2.16.3
pydot==2.0.0
pyformlang==1.0.9
Pygments==2.17.2
PyMuPDF==1.24.0
PyMuPDFb==1.24.0
pyparsing==3.1.2
pypdf==4.1.0
python-dateutil==2.9.0.post0
pytz==2024.1
pyxnat==1.6.2
PyYAML==6.0.1
pyzmq==25.1.2
rdflib==7.0.0
referencing==0.35.0
regex==2023.12.25
requests==2.31.0
responses==0.18.0
rpds-py==0.18.0
safetensors==0.4.2
scikit-learn==1.4.2
scipy==1.13.0
sentence-transformers==2.7.0
seqeval==1.2.2
simplejson==3.19.2
six==1.16.0
sniffio==1.3.1
soupsieve==2.5
span-marker==1.5.0
SQLAlchemy==2.0.29
stack-data==0.6.3
starlette==0.37.2
striprtf==0.0.26
sympy==1.12
tenacity==8.2.3
termcolor==2.4.0
threadpoolctl==3.4.0
tiktoken==0.6.0
tokenizers==0.15.2
torch==2.2.2
tornado==6.4
tqdm==4.66.2
traitlets==5.14.2
traits==6.3.2
transformers==4.39.3
triton==2.2.0
typing-inspect==0.9.0
typing_extensions==4.11.0
tzdata==2024.1
urllib3==2.2.1
uvicorn==0.29.0
wcwidth==0.2.13
wrapt==1.16.0
xxhash==3.4.1
yarl==1.9.4
Context for the issue:
I would like to present Outlines as an intuitive and fast alternative to Guidance or LMQL in my diploma project.
Can the issue actually be with the model I am using, instead of mistralai/Mistral-7B-v0.1?
I have the same problem with "mistral-7b-instruct-v0.2.Q5_K_S.gguf":
JSONDecodeError: Unterminated string starting at: line 5 column 5 (char 24)
My code is from the example too:
```python
import outlines
from llama_cpp import Llama
from outlines import models, generate

json_grammar = outlines.grammars.json

def add(a: int, b: int):
    return a + b

llm = Llama("./mistral-7b-instruct-v0.2.Q5_K_S.gguf", chat_format="mistral")
model = models.LlamaCpp(llm)

generator = generate.json(model, add)
result = generator("Return two integers named a and b respectively. a is odd and b even.")
print(add(**result))
```
Can you try `generator = generate.json(model, add, whitespace_pattern="")`?
> Can you try `generator = generate.json(model, add, whitespace_pattern="")`?
Yes, it works as expected with this. It's kind of confusing though; I didn't find this in the docs. Thank you.
> Can you try `generator = generate.json(model, add, whitespace_pattern="")`?
After digging around, I found that the model stopped early, before the JSON was completed.
By setting max_tokens=100, or whatever value is big enough for the whole output JSON, it works.
I think it's important to be able to detect the case where the model's output is cut off by the generation controls.
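For example, applied to the User model from the original post (a sketch: 100 is just an arbitrary "big enough" budget, and the per-call max_tokens parameter is the one visible in the SequenceGeneratorAdapter.__call__ signature in the traceback above):

```python
# Sketch of the workaround: give the generator enough token budget for the
# whole JSON object so generation does not stop mid-string.
generator = generate.json(model, User, whitespace_pattern="")
result = generator(
    """Based on user information create a user profile with the fields first_name, last_name, age.
User information is: Jane Doe age=10""",
    max_tokens=100,
)
print(result)
```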
Hej! Yes, increasing max_tokens was the trick for me too! Thank you so much for your very helpful advice :)
We may need to override llama.cpp's default value of max_tokens=16.
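A quick way to see that default in action (a sketch against llama-cpp-python's create_completion, whose max_tokens parameter defaults to 16; llm here is the Llama instance from the repro above):

```python
# With no max_tokens passed, llama-cpp-python caps the completion at 16 tokens,
# which is easily too short for a whole JSON object.
out = llm.create_completion("Write a JSON object describing a user.")
print(out["choices"][0]["text"])  # likely truncated mid-JSON
```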
I have the same problem with a higher max_tokens if I do not re-run generator = generate.json(model, User) before each call to the generator.
I wonder if the LlamaCpp backend is not correctly clearing the output of the previous call.
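Concretely, this is what I end up doing (a sketch; the second prompt is made up for illustration, and whether rebuilding the generator should really be necessary is exactly my question):

```python
prompts = [
    "User information is: Jane Doe age=10",
    "User information is: John Smith age=42",  # made-up prompt for illustration
]

for prompt in prompts:
    # Rebuilding the generator before every call avoids the truncated JSON;
    # reusing a single generator instance reproduces the problem for me.
    generator = generate.json(model, User, whitespace_pattern="")
    print(generator(prompt, max_tokens=200))
```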
The problem with max_tokens itself seems to be solvable by using str with typing.Annotated and adding a max_length (in characters, so take care that it matches your max_tokens).
Example:

```python
import typing

import pydantic

class Person(pydantic.BaseModel):
    name: str
    description: typing.Annotated[
        str, pydantic.StringConstraints(strip_whitespace=True, max_length=300)
    ]
```
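Presumably this works because max_length lands in the JSON schema as maxLength and bounds the regex that guides generation. A quick way to check (assuming the build_regex_from_schema import path below matches outlines 0.0.40's layout; the traceback above shows generate.json calling this function):

```python
import json

from outlines.fsm.json_schema import build_regex_from_schema  # assumed import path

schema = json.dumps(Person.model_json_schema())
# The printed regex should bound the "description" string at 300 characters,
# so guided generation cannot run past the cap.
print(build_regex_from_schema(schema, whitespace_pattern=""))
```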
> Can you try `generator = generate.json(model, add, whitespace_pattern="")`?
Is there any place in the docs with more details on this?
Is this fixed now?