
generate.json() gives ValidationError when run with mistral-7b-instruct-v0.2.Q6_K.gguf

Dodorotata opened this issue 1 year ago • 9 comments

Describe the issue as clearly as possible:

Example code with Pydantic and generate.json() throws a ValidationError. The code is run from a Jupyter notebook. The output is fine if age: int is removed from the Pydantic class.

Steps/code to reproduce the bug:

from llama_cpp import Llama
from outlines import models, generate
from pydantic import BaseModel, Field

llm = Llama("/models/mistral-7b-instruct-v0.2.Q6_K.gguf", n_gpu_layers=10, n_ctx=0, verbose=False)
model = models.LlamaCpp(llm)

class User(BaseModel):
    first_name: str
    last_name: str
    age: int

generator = generate.json(model, User, whitespace_pattern="")

result = generator(
    """Based on user information create a user profile with the fields first_name, last_name, age.
    User information is: Jane Doe age=10"""
)

print(result)

Expected result:

User(first_name="Jane", last_name="Doe", age=10)

Error message:

JSONDecodeError                           Traceback (most recent call last)
File ~/LLM-diploma-project/venv/lib/python3.10/site-packages/pydantic/main.py:1097, in BaseModel.parse_raw(cls, b, content_type, encoding, proto, allow_pickle)
   1096 try:
-> 1097     obj = parse.load_str_bytes(
   1098         b,
   1099         proto=proto,
   1100         content_type=content_type,
   1101         encoding=encoding,
   1102         allow_pickle=allow_pickle,
   1103     )
   1104 except (ValueError, TypeError) as exc:

File ~/LLM-diploma-project/venv/lib/python3.10/site-packages/pydantic/deprecated/parse.py:49, in load_str_bytes(b, content_type, encoding, proto, allow_pickle, json_loads)
     48         b = b.decode(encoding)
---> 49     return json_loads(b)  # type: ignore
     50 elif proto == Protocol.pickle:

File /usr/lib/python3.10/json/__init__.py:346, in loads(s, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    343 if (cls is None and object_hook is None and
    344         parse_int is None and parse_float is None and
    345         parse_constant is None and object_pairs_hook is None and not kw):
--> 346     return _default_decoder.decode(s)
    347 if cls is None:

File /usr/lib/python3.10/json/decoder.py:337, in JSONDecoder.decode(self, s, _w)
    333 """Return the Python representation of ``s`` (a ``str`` instance
    334 containing a JSON document).
    335 
    336 """
--> 337 obj, end = self.raw_decode(s, idx=_w(s, 0).end())
    338 end = _w(s, end).end()

File /usr/lib/python3.10/json/decoder.py:353, in JSONDecoder.raw_decode(self, s, idx)
    352 try:
--> 353     obj, end = self.scan_once(s, idx)
    354 except StopIteration as err:

JSONDecodeError: Unterminated string starting at: line 1 column 34 (char 33)

During handling of the above exception, another exception occurred:

ValidationError                           Traceback (most recent call last)
Cell In[8], line 9
      5     age: int
      7 generator = generate.json(model, User, whitespace_pattern="")
----> 9 result = generator(
     10     """Based on user information create a user profile with the fields first_name, last_name, age.
     11     User information is: Jane Doe age=10"""
     12 )
     14 print(result)

File ~/LLM-diploma-project/venv/lib/python3.10/site-packages/outlines/generate/api.py:501, in SequenceGeneratorAdapter.__call__(self, prompts, max_tokens, stop_at, seed, **model_specific_params)
    489 generation_params = self.prepare_generation_parameters(
    490     max_tokens, stop_at, seed
    491 )
    493 completions = self.model.generate(
    494     prompts,
    495     generation_params,
   (...)
    498     **model_specific_params,
    499 )
--> 501 return format(completions)

File ~/LLM-diploma-project/venv/lib/python3.10/site-packages/outlines/generate/api.py:487, in SequenceGeneratorAdapter.__call__.<locals>.format(sequences)
    485     return [format(sequence) for sequence in sequences]
    486 else:
--> 487     return self.format_sequence(sequences)

File ~/LLM-diploma-project/venv/lib/python3.10/site-packages/outlines/generate/json.py:50, in json.<locals>.<lambda>(x)
     48     regex_str = build_regex_from_schema(schema, whitespace_pattern)
     49     generator = regex(model, regex_str, sampler)
---> 50     generator.format_sequence = lambda x: schema_object.parse_raw(x)
     51 elif callable(schema_object):
     52     schema = pyjson.dumps(get_schema_from_signature(schema_object))

File ~/LLM-diploma-project/venv/lib/python3.10/site-packages/pydantic/main.py:1124, in BaseModel.parse_raw(cls, b, content_type, encoding, proto, allow_pickle)
   1117     # ctx is missing here, but since we've added `input` to the error, we're not pretending it's the same
   1118     error: pydantic_core.InitErrorDetails = {
   1119         # The type: ignore on the next line is to ignore the requirement of LiteralString
   1120         'type': pydantic_core.PydanticCustomError(type_str, str(exc)),  # type: ignore
   1121         'loc': ('__root__',),
   1122         'input': b,
   1123     }
-> 1124     raise pydantic_core.ValidationError.from_exception_data(cls.__name__, [error])
   1125 return cls.model_validate(obj)

ValidationError: 1 validation error for User
__root__
  Unterminated string starting at: line 1 column 34 (char 33) [type=value_error.jsondecode, input_value='{"first_name":"Jane","last_name":"Doe', input_type=str]

Outlines/Python version information:

Outlines 0.0.40; Python 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0]

accelerate==0.28.0
aiohttp==3.9.3
aiosignal==1.3.1
annotated-types==0.6.0
anyio==4.3.0
asttokens==2.4.1
astunparse==1.6.3
async-timeout==4.0.3
attrs==23.2.0
beautifulsoup4==4.12.3
certifi==2024.2.2
charset-normalizer==3.3.2
ci-info==0.3.0
click==8.1.7
cloudpickle==3.0.0
comm==0.2.2
configobj==5.0.8
configparser==6.0.1
contourpy==1.2.1
cycler==0.12.1
dataclasses-json==0.6.4
datasets==2.18.0
debugpy==1.8.1
decorator==5.1.1
Deprecated==1.2.14
dill==0.3.8
dirtyjson==1.0.8
diskcache==5.6.3
distro==1.9.0
etelemetry==0.3.1
evaluate==0.4.1
exceptiongroup==1.2.0
executing==2.0.1
fastapi==0.110.1
filelock==3.13.3
fonttools==4.51.0
frozenlist==1.4.1
fsspec==2024.2.0
greenlet==3.0.3
guidance==0.1.13
h11==0.14.0
httpcore==1.0.5
httplib2==0.22.0
httpx==0.27.0
huggingface-hub==0.20.3
idna==3.6
interegular==0.3.3
ipykernel==6.29.4
ipython==8.23.0
isodate==0.6.1
jedi==0.19.1
Jinja2==3.1.3
joblib==1.3.2
jsonschema==4.21.1
jsonschema-specifications==2023.12.1
jupyter_client==8.6.1
jupyter_core==5.7.2
kiwisolver==1.4.5
lark==1.1.9
llama-index==0.10.26
llama-index-agent-openai==0.2.1
llama-index-cli==0.1.11
llama-index-core==0.10.26
llama-index-embeddings-huggingface==0.2.0
llama-index-embeddings-openai==0.1.7
llama-index-extractors-entity==0.1.2
llama-index-indices-managed-llama-cloud==0.1.5
llama-index-legacy==0.9.48
llama-index-llms-huggingface==0.1.4
llama-index-llms-llama-cpp==0.1.3
llama-index-llms-openai==0.1.14
llama-index-multi-modal-llms-openai==0.1.4
llama-index-program-guidance==0.1.2
llama-index-program-openai==0.1.5
llama-index-question-gen-openai==0.1.3
llama-index-readers-file==0.1.13
llama-index-readers-llama-parse==0.1.4
llama-parse==0.4.0
llama_cpp_python==0.2.62
llamaindex-py-client==0.1.15
llvmlite==0.42.0
lmql==0.7.3
looseversion==1.3.0
lxml==5.2.1
MarkupSafe==2.1.5
marshmallow==3.21.1
matplotlib==3.8.4
matplotlib-inline==0.1.6
mpmath==1.3.0
multidict==6.0.5
multiprocess==0.70.16
mypy-extensions==1.0.0
nest-asyncio==1.6.0
networkx==3.2.1
nibabel==5.2.1
nipype==1.8.6
nltk==3.8.1
numba==0.59.1
numpy==1.26.4
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu12==8.9.2.26
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-nccl-cu12==2.19.3
nvidia-nvjitlink-cu12==12.4.99
nvidia-nvtx-cu12==12.1.105
openai==1.16.0
ordered-set==4.1.0
outlines==0.0.40
packaging==24.0
pandas==2.2.1
parso==0.8.3
pathlib==1.0.1
pexpect==4.9.0
pillow==10.3.0
platformdirs==4.2.0
prompt-toolkit==3.0.43
protobuf==5.26.1
prov==2.0.0
psutil==5.9.8
ptyprocess==0.7.0
pure-eval==0.2.2
pyarrow==15.0.2
pyarrow-hotfix==0.6
pydantic==2.6.4
pydantic_core==2.16.3
pydot==2.0.0
pyformlang==1.0.9
Pygments==2.17.2
PyMuPDF==1.24.0
PyMuPDFb==1.24.0
pyparsing==3.1.2
pypdf==4.1.0
python-dateutil==2.9.0.post0
pytz==2024.1
pyxnat==1.6.2
PyYAML==6.0.1
pyzmq==25.1.2
rdflib==7.0.0
referencing==0.35.0
regex==2023.12.25
requests==2.31.0
responses==0.18.0
rpds-py==0.18.0
safetensors==0.4.2
scikit-learn==1.4.2
scipy==1.13.0
sentence-transformers==2.7.0
seqeval==1.2.2
simplejson==3.19.2
six==1.16.0
sniffio==1.3.1
soupsieve==2.5
span-marker==1.5.0
SQLAlchemy==2.0.29
stack-data==0.6.3
starlette==0.37.2
striprtf==0.0.26
sympy==1.12
tenacity==8.2.3
termcolor==2.4.0
threadpoolctl==3.4.0
tiktoken==0.6.0
tokenizers==0.15.2
torch==2.2.2
tornado==6.4
tqdm==4.66.2
traitlets==5.14.2
traits==6.3.2
transformers==4.39.3
triton==2.2.0
typing-inspect==0.9.0
typing_extensions==4.11.0
tzdata==2024.1
urllib3==2.2.1
uvicorn==0.29.0
wcwidth==0.2.13
wrapt==1.16.0
xxhash==3.4.1
yarl==1.9.4

Context for the issue:

I would like to present Outlines as an intuitive and fast alternative to Guidance or LMQL in my diploma project.

Dodorotata commented on Apr 25, 2024

Could the issue actually be with the model I am using, rather than with mistralai/Mistral-7B-v0.1?

Dodorotata commented on Apr 25, 2024

I have the same problem with "mistral-7b-instruct-v0.2.Q5_K_S.gguf"

JSONDecodeError: Unterminated string starting at: line 5 column 5 (char 24)

My code is from the example too.

import outlines
from outlines import models, generate
from llama_cpp import Llama

json_grammar = outlines.grammars.json

def add(a: int, b: int):
    return a + b


llm = Llama("./mistral-7b-instruct-v0.2.Q5_K_S.gguf", chat_format="mistral")
model = models.LlamaCpp(llm)
generator = generate.json(model, add)
result = generator("Return two integers named a and b respectively. a is odd and b even.")

print(add(**result))

wang-haoxian commented on May 3, 2024

Can you try generator = generate.json(model, add, whitespace_pattern="") ?
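
For context, roughly (from memory, so treat this as an approximation): whitespace_pattern is the regex Outlines allows between JSON elements when it compiles the schema to a regex. The default pattern permits whitespace there, so the model can spend part of its token budget on spaces and newlines before the object is finished; an empty pattern forces the most compact JSON possible, e.g.

{"a":3,"b":4}        <- with whitespace_pattern=""
{ "a": 3, "b": 4 }   <- possible under the default pattern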

rlouf commented on May 3, 2024

Can you try generator = generate.json(model, add, whitespace_pattern="") ?

Yes, it works as expected with this. It's kind of confusing, though; I didn't find it in the docs. Thank you.

wang-haoxian commented on May 3, 2024

Can you try generator = generate.json(model, add, whitespace_pattern="") ?

After digging around, I found that the model stopped before the JSON was complete. Setting max_tokens=100, or whatever value is big enough for the whole output JSON, makes it work.
I think it's important to be able to detect when the model's output is cut short like this by the generation controls.
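
For reference, a sketch of the workaround (the budget of 100 is a guess; size it to your schema, since generation simply stops at max_tokens even if the JSON is unfinished):

generator = generate.json(model, add, whitespace_pattern="")
# max_tokens must cover the entire JSON object, not just part of it
result = generator(
    "Return two integers named a and b respectively. a is odd and b even.",
    max_tokens=100,
)
print(add(**result))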

wang-haoxian commented on May 3, 2024

Hi! Yes, increasing max_tokens was the trick for me too! Thank you so much for your very helpful advice :)

Dodorotata commented on May 6, 2024

We may need to override llama.cpp's default value of max_tokens=16.

rlouf commented on May 6, 2024

I have the same problem even with a higher max_tokens if I do not run generator = generate.json(model, User) again before each call. I wonder whether the LlamaCpp backend is not correctly clearing the output of the previous call.

The max_tokens problem itself seems to be solvable by using typing.Annotated on str fields to add a max_length (in characters, so take care that it is consistent with your max_tokens).

Example:

import typing
import pydantic

class Person(pydantic.BaseModel):
    name: str
    description: typing.Annotated[str, pydantic.StringConstraints(strip_whitespace=True, max_length=300)]
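
A usage sketch showing how the two limits line up (the prompt and the numbers are just illustrative): max_length=300 caps description at 300 characters, which costs at most ~300 tokens even in the worst case of one character per token, so a budget a bit above that leaves room for the other field and the JSON punctuation:

generator = generate.json(model, Person, whitespace_pattern="")
# 400 > the 300-char worst case, plus headroom for name and punctuation
result = generator("Describe the person Jane Doe.", max_tokens=400)
print(result)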

allo- commented on May 7, 2024

Can you try generator = generate.json(model, add, whitespace_pattern="") ?

Is there anywhere in the docs with more details on this?

liqul commented on Jun 4, 2024

Is this fixed now?

allo- commented on Mar 15, 2025