
State mapping cache ignores the tokenizer used to build the state machine

Open br3no opened this issue 3 months ago • 3 comments

Describe the issue as clearly as possible:

https://github.com/outlines-dev/outlines/blob/4f8433d8d6633b0780c3a6c27981f9adffbe49f5/outlines/fsm/guide.py#L115

The cached function depends on both the regex and the tokenizer, but the tokenizer is not one of its parameters. As a result, state maps cached for one tokenizer are reused with a different tokenizer, which leads to errors.
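To make the failure mode concrete, here is a minimal, self-contained sketch of the same pattern. It does not use the outlines code: `ToyTokenizer`, `build_state_map`, `buggy_state_map`, and `fixed_state_map` are hypothetical names, and the "state map" is reduced to the set of token ids whose text fully matches the regex. The fixed variant keys the cache on the vocabulary contents, which is just one possible way to include the tokenizer in the key.

```python
import re


class ToyTokenizer:
    """Hypothetical stand-in for a model tokenizer (token string -> token id)."""

    def __init__(self, vocabulary):
        self.vocabulary = vocabulary


def build_state_map(regex, tokenizer):
    """Toy 'state map': ids of the tokens whose text fully matches the regex."""
    pattern = re.compile(regex)
    return frozenset(
        token_id
        for token, token_id in tokenizer.vocabulary.items()
        if pattern.fullmatch(token)
    )


_buggy_cache = {}
_fixed_cache = {}


def buggy_state_map(regex, tokenizer):
    # Bug: the result depends on the tokenizer, but the cache key does not.
    key = regex
    if key not in _buggy_cache:
        _buggy_cache[key] = build_state_map(regex, tokenizer)
    return _buggy_cache[key]


def fixed_state_map(regex, tokenizer):
    # Assumed fix: make the vocabulary part of the cache key.
    key = (regex, frozenset(tokenizer.vocabulary.items()))
    if key not in _fixed_cache:
        _fixed_cache[key] = build_state_map(regex, tokenizer)
    return _fixed_cache[key]


# Two tokenizers with the same tokens but different ids.
tokenizer_a = ToyTokenizer({"1": 0, "23": 1, "a": 2})
tokenizer_b = ToyTokenizer({"a": 0, "1": 1, "23": 2})

digits = r"\d+"
first = buggy_state_map(digits, tokenizer_a)    # computed for tokenizer_a
stale = buggy_state_map(digits, tokenizer_b)    # cache hit: tokenizer_a's ids
correct = fixed_state_map(digits, tokenizer_b)  # computed for tokenizer_b
```

With the buggy key, `stale` contains id 0, which is the token `"a"` in `tokenizer_b`, so a generator driven by it could emit text that does not match the regex.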

Steps/code to reproduce the bug:

import outlines

regex = r"((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)"

# First model: building the generator caches the state map for this regex.
model = outlines.models.transformers("stabilityai/stablelm-2-zephyr-1_6b")

prompt = "What is the IP address of the Google DNS servers? "

generator = outlines.generate.regex(
    model,
    regex,
)
structured = generator(prompt, max_tokens=30)

print(structured)

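# Second model with a different tokenizer but the same regex:
# the state map cached for the first tokenizer is reused here.
model = outlines.models.transformers("microsoft/phi-2")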
generator = outlines.generate.regex(
    model,
    regex,
)
structured = generator(prompt, max_tokens=30)

print(structured)

Expected result:

Both generations should conform to the regex.
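Since no error is raised, one way to check conformance is to match each generated string against the regex directly. The helper below is a sketch and is not part of outlines; the name `conforms` is made up for illustration.

```python
import re

# Same pattern as in the reproduction script above.
IP_REGEX = r"((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)"


def conforms(text, pattern=IP_REGEX):
    """Return True if the whole string matches the pattern."""
    return re.fullmatch(pattern, text) is not None
```

In the reproduction script, calling `conforms(structured)` after each generation would show which of the two outputs violates the regex.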

Error message:

No response

Outlines/Python version information:

Version information

0.0.41
Python 3.11.4 (main, Nov 24 2023, 14:45:29) [Clang 15.0.0 (clang-1500.0.40.1)]
aiohttp==3.9.3
aiosignal==1.3.1
annotated-types==0.6.0
anyio==3.7.1
appdirs==1.4.4
appnope==0.1.4
argon2-cffi==23.1.0
argon2-cffi-bindings==21.2.0
arrow==1.3.0
asttokens==2.4.1
async-lru==2.0.4
attrs==23.2.0
Babel==2.14.0
beautifulsoup4==4.11.1
bleach==6.1.0
boto3==1.24.53
botocore==1.27.96
certifi==2024.2.2
cffi==1.16.0
charset-normalizer==3.3.2
click==8.1.7
cloudpickle==3.0.0
colorama==0.4.6
comm==0.2.1
contourpy==1.2.1
cycler==0.12.1
datasets==2.15.0
debugpy==1.8.1
decorator==5.1.1
defusedxml==0.7.1
dill==0.3.7
diskcache==5.6.3
distro==1.9.0
dnspython==2.5.0
easyocr==1.7.1
einops==0.7.0
environs==9.5.0
executing==2.0.1
faiss-cpu==1.7.4
fastjsonschema==2.19.1
filelock==3.13.1
fonttools==4.51.0
fqdn==1.5.1
frozendict==2.4.0
frozenlist==1.4.1
fsspec==2023.10.0
grpcio==1.56.0
h11==0.14.0
html5lib==1.1
httpcore==1.0.2
httpx==0.26.0
huggingface-hub==0.19.4
idna==3.6
imageio==2.34.1
interegular==0.3.3
ipykernel==6.29.2
ipython==8.21.0
ipywidgets==8.1.2
isoduration==20.11.0
jedi==0.19.1
Jinja2==3.1.3
jmespath==1.0.1
joblib==1.3.2
json5==0.9.14
jsonpointer==2.4
jsonschema==4.21.1
jsonschema-specifications==2023.12.1
jupyter==1.0.0
jupyter-console==6.6.3
jupyter-events==0.9.0
jupyter-lsp==2.2.2
jupyter_client==8.6.0
jupyter_core==5.7.1
jupyter_server==2.12.5
jupyter_server_terminals==0.5.2
jupyterlab==4.0.12
jupyterlab_pygments==0.3.0
jupyterlab_server==2.25.2
jupyterlab_widgets==3.0.10
kiwisolver==1.4.5
lark==1.1.9
lazy_loader==0.4
llmware==0.2.3
llvmlite==0.42.0
lxml==4.9.3
MarkupSafe==2.1.5
marshmallow==3.20.2
matplotlib==3.8.4
matplotlib-inline==0.1.6
mistune==3.0.2
mpmath==1.3.0
multidict==6.0.5
multiprocess==0.70.15
multitasking==0.0.11
nbclient==0.9.0
nbconvert==7.16.0
nbformat==5.9.2
nest-asyncio==1.6.0
networkx==3.2.1
ninja==1.11.1.1
nltk==3.8.1
notebook==7.0.8
notebook_shim==0.2.3
numba==0.59.1
numpy==1.26.4
openai==1.12.0
opencv-python-headless==4.9.0.80
outlines==0.0.41
overrides==7.7.0
packaging==23.2
pandas==2.2.0
pandocfilters==1.5.1
parso==0.8.3
pdf2image==1.16.0
pexpect==4.9.0
pgvector==0.2.4
pillow==10.2.0
platformdirs==4.2.0
prometheus-client==0.19.0
prompt-toolkit==3.0.43
protobuf==4.25.2
psutil==5.9.8
psycopg==3.1.17
psycopg-binary==3.1.17
ptyprocess==0.7.0
pure-eval==0.2.2
pyarrow==15.0.0
pyarrow-hotfix==0.6
pyclipper==1.3.0.post5
pycparser==2.21
pydantic==2.6.1
pydantic_core==2.16.2
Pygments==2.17.2
pymilvus==2.3.0
pymongo==4.5.0
pyparsing==3.1.2
pytesseract==0.3.10
python-bidi==0.4.2
python-dateutil==2.8.2
python-dotenv==1.0.1
python-json-logger==2.0.7
pytz==2024.1
PyYAML==6.0.1
pyzmq==25.1.2
qtconsole==5.5.1
QtPy==2.4.1
referencing==0.33.0
regex==2023.12.25
requests==2.31.0
rfc3339-validator==0.1.4
rfc3986-validator==0.1.1
rpds-py==0.17.1
s3transfer==0.6.2
safetensors==0.4.2
# Editable install with no version control (sandbox==0.1.0)
-e /Users/breno/src/py/sandbox
scikit-image==0.23.2
scikit-learn==1.4.0
scipy==1.12.0
Send2Trash==1.8.2
sentence-transformers==2.2.2
sentencepiece==0.1.99
shapely==2.0.4
six==1.16.0
sniffio==1.3.0
soupsieve==2.5
sseclient-py==1.8.0
stack-data==0.6.3
sympy==1.12
tabulate==0.9.0
terminado==0.18.0
threadpoolctl==3.2.0
tifffile==2024.5.3
timm==0.9.16
tinycss2==1.2.1
tokenizers==0.19.1
torch==2.2.0
torchvision==0.17.0
tornado==6.4
tqdm==4.66.2
traitlets==5.14.1
transformers==4.40.2
types-python-dateutil==2.8.19.20240106
typing_extensions==4.9.0
tzdata==2024.1
ujson==5.9.0
uri-template==1.3.0
urllib3==1.26.18
wcwidth==0.2.13
webcolors==1.13
webencodings==0.5.1
websocket-client==1.7.0
Werkzeug==3.0.1
widgetsnbextension==4.0.10
Wikipedia-API==0.6.0
word2number==1.1
xxhash==3.4.1
yarl==1.9.4
yfinance==0.2.28

Context for the issue:

No response

br3no, May 07 '24 11:05