Describe the issue as clearly as possible:
I encountered an issue when attempting to use the `generate.cfg` function with a vLLM model. The code throws a `NotImplementedError`, indicating that the CFG logits processor is not available for the `VLLM` class.
Steps/code to reproduce the bug:
```python
from vllm import LLM, SamplingParams

from outlines import generate, models

llm = LLM(
    "neuralmagic/Llama-3.2-1B-Instruct-quantized.w8a8",
    enable_prefix_caching=True,
    block_size=64,
    max_num_batched_tokens=15000,
    gpu_memory_utilization=0.96,
    max_model_len=15000,
    use_v2_block_manager=True,
)

arithmetic_grammar = """
    ?start: expression

    ?expression: term (("+" | "-") term)*

    ?term: factor (("*" | "/") factor)*

    ?factor: NUMBER
           | "-" factor
           | "(" expression ")"

    %import common.NUMBER
"""

model = models.VLLM(llm)
generator = generate.cfg(model, arithmetic_grammar)

sampling_params = SamplingParams(temperature=0.3, top_p=0.2, max_tokens=20)
sequence = generator(
    "Alice had 4 apples and Bob ate 2. Write an expression for Alice's apples:",
    sampling_params=sampling_params,
)
```
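For context, the Lark grammar above describes simple arithmetic expressions. As an illustration only (independent of Outlines and vLLM, which compile the grammar themselves), a minimal hand-written recursive-descent recognizer in pure Python shows which strings the grammar accepts:

```python
import re

# Minimal recognizer mirroring the Lark grammar above — purely illustrative;
# Outlines does not use this code.
TOKEN = re.compile(r"\s*(\d+(?:\.\d+)?|[-+*/()])")

def tokenize(s):
    tokens, pos = [], 0
    while pos < len(s):
        m = TOKEN.match(s, pos)
        if not m:
            raise ValueError(f"bad character at position {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

def accepts(s):
    tokens = tokenize(s)
    i = 0

    def peek():
        return tokens[i] if i < len(tokens) else None

    def eat(expected=None):
        nonlocal i
        t = peek()
        if t is None or (expected is not None and t != expected):
            raise ValueError(f"unexpected token {t!r}")
        i += 1

    def expression():  # term (("+" | "-") term)*
        term()
        while peek() in ("+", "-"):
            eat()
            term()

    def term():        # factor (("*" | "/") factor)*
        factor()
        while peek() in ("*", "/"):
            eat()
            factor()

    def factor():      # NUMBER | "-" factor | "(" expression ")"
        t = peek()
        if t == "-":
            eat()
            factor()
        elif t == "(":
            eat("(")
            expression()
            eat(")")
        elif t is not None and re.fullmatch(r"\d+(?:\.\d+)?", t):
            eat()
        else:
            raise ValueError(f"expected NUMBER, got {t!r}")

    try:
        expression()
        return i == len(tokens)
    except ValueError:
        return False

print(accepts("4 - 2"))        # valid answer to the prompt above
print(accepts("(4 - 2) * 3"))
print(accepts("4 -"))          # dangling operator, rejected
```

So a constrained generator built from this grammar should only ever emit strings like `4 - 2`, never a dangling `4 -`.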
Expected result:
I expected the code to generate a sequence based on the defined grammar using the `VLLM` model.
Error message:
```
Exception has occurred: NotImplementedError
The CFG Logits processor is not available for <class 'outlines.models.vllm.VLLM'>.
  File "/home/lepagnol/Documents/These/format-constrained-for-slu/vllm_test.py", line 30, in <module>
    generator = generate.cfg(model, arithmetic_grammar)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
NotImplementedError: The CFG Logits processor is not available for <class 'outlines.models.vllm.VLLM'>.
```
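This kind of error typically comes from type-based dispatch: a generator constructor is registered per model class, and any class without a registered CFG implementation falls through to a `NotImplementedError`. A minimal sketch of that pattern using `functools.singledispatch` (the class and function names below are illustrative, not Outlines' actual internals):

```python
from functools import singledispatch

# Hypothetical model wrapper classes, standing in for the real ones.
class Transformers: ...
class VLLM: ...

@singledispatch
def cfg(model, grammar):
    # Fallback for model classes with no registered CFG logits processor,
    # mirroring the error seen in the traceback above.
    raise NotImplementedError(
        f"The CFG Logits processor is not available for {type(model)}."
    )

@cfg.register
def _(model: Transformers, grammar):
    # Only this class has a registered implementation in this sketch.
    return f"generator for {type(model).__name__}"

print(cfg(Transformers(), "?start: expression"))  # dispatches to the registered impl
try:
    cfg(VLLM(), "?start: expression")
except NotImplementedError as e:
    print(e)  # no implementation registered for VLLM, so the fallback raises
```

In other words, the error means no CFG implementation was registered for the `VLLM` wrapper in this Outlines version, not that the grammar itself is invalid.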
Outlines/Python version information:
```
aiohappyeyeballs==2.4.3
aiohttp==3.11.6
aiosignal==1.3.1
annotated-types==0.7.0
antlr4-python3-runtime==4.9.3
anyio==4.6.2.post1
asttokens @ file:///home/conda/feedstock_root/build_artifacts/asttokens_1698341106958/work
attrs==24.2.0
autocommand==2.2.2
backports.tarfile==1.2.0
certifi==2024.8.30
charset-normalizer==3.4.0
click==8.1.7
cloudpickle==3.1.0
cmake==3.31.0.1
comm @ file:///home/conda/feedstock_root/build_artifacts/comm_1710320294760/work
compressed-tensors==0.8.0
datasets==3.1.0
debugpy @ file:///home/conda/feedstock_root/build_artifacts/debugpy_1731044888992/work
decorator @ file:///home/conda/feedstock_root/build_artifacts/decorator_1641555617451/work
dill==0.3.8
diskcache==5.6.3
distro==1.9.0
einops==0.8.0
exceptiongroup @ file:///home/conda/feedstock_root/build_artifacts/exceptiongroup_1720869315914/work
executing @ file:///home/conda/feedstock_root/build_artifacts/executing_1725214404607/work
fastapi==0.115.5
filelock==3.16.1
frozenlist==1.5.0
fsspec==2024.9.0
gguf==0.10.0
h11==0.14.0
httpcore==1.0.7
httptools==0.6.4
httpx==0.27.2
huggingface-hub==0.26.2
hydra-core==1.3.2
hydra-submitit-launcher==1.2.0
idna==3.10
importlib_metadata @ file:///home/conda/feedstock_root/build_artifacts/importlib-metadata_1726082825846/work
inflect==7.3.1
interegular==0.3.3
ipykernel @ file:///home/conda/feedstock_root/build_artifacts/ipykernel_1719845459717/work
ipython @ file:///home/conda/feedstock_root/build_artifacts/ipython_1729866374957/work
jaraco.collections==5.1.0
jaraco.context==5.3.0
jaraco.functools==4.0.1
jaraco.text==3.12.1
jedi @ file:///home/conda/feedstock_root/build_artifacts/jedi_1731317204262/work
Jinja2==3.1.4
jiter==0.7.1
jiwer==3.0.5
jsonschema==4.23.0
jsonschema-specifications==2024.10.1
jupyter_client @ file:///home/conda/feedstock_root/build_artifacts/jupyter_client_1726610684920/work
jupyter_core @ file:///home/conda/feedstock_root/build_artifacts/jupyter_core_1727163409502/work
lark==1.2.2
llvmlite==0.43.0
lm-format-enforcer==0.10.9
MarkupSafe==3.0.2
matplotlib-inline @ file:///home/conda/feedstock_root/build_artifacts/matplotlib-inline_1713250518406/work
mistral_common==1.5.0
more-itertools==10.3.0
mpmath==1.3.0
msgspec==0.18.6
multidict==6.1.0
multiprocess==0.70.16
nest_asyncio @ file:///home/conda/feedstock_root/build_artifacts/nest-asyncio_1705850609492/work
networkx==3.4.2
ninja==1.11.1.1
numba==0.60.0
numpy==1.26.4
omegaconf==2.3.0
openai==1.54.5
opencv-python-headless==4.10.0.84
outlines==0.0.46
packaging @ file:///home/conda/feedstock_root/build_artifacts/packaging_1731802491770/work
pandas==2.2.3
parso @ file:///home/conda/feedstock_root/build_artifacts/parso_1712320355065/work
partial-json-parser==0.2.1.1.post4
pexpect @ file:///home/conda/feedstock_root/build_artifacts/pexpect_1706113125309/work
pickleshare @ file:///home/conda/feedstock_root/build_artifacts/pickleshare_1602536217715/work
pillow==10.4.0
platformdirs @ file:///home/conda/feedstock_root/build_artifacts/platformdirs_1726613481435/work
prometheus-fastapi-instrumentator==7.0.0
prometheus_client==0.21.0
prompt_toolkit @ file:///home/conda/feedstock_root/build_artifacts/prompt-toolkit_1727341649933/work
propcache==0.2.0
protobuf==5.28.3
psutil @ file:///home/conda/feedstock_root/build_artifacts/psutil_1729847057810/work
ptyprocess @ file:///home/conda/feedstock_root/build_artifacts/ptyprocess_1609419310487/work/dist/ptyprocess-0.7.0-py2.py3-none-any.whl
pure_eval @ file:///home/conda/feedstock_root/build_artifacts/pure_eval_1721585709575/work
py-cpuinfo==9.0.0
pyairports==2.1.1
pyarrow==18.0.0
pycountry==24.6.1
pydantic==2.9.2
pydantic_core==2.23.4
pydot==3.0.2
Pygments @ file:///home/conda/feedstock_root/build_artifacts/pygments_1714846767233/work
pyparsing==3.2.0
python-dateutil @ file:///home/conda/feedstock_root/build_artifacts/python-dateutil_1731919281354/work
python-dotenv==1.0.1
pytz==2024.2
PyYAML==6.0.2
pyzmq @ file:///home/conda/feedstock_root/build_artifacts/pyzmq_1728642254015/work
RapidFuzz==3.10.1
referencing==0.35.1
regex==2024.11.6
requests==2.32.3
rpds-py==0.21.0
safetensors==0.4.5
sentencepiece==0.2.0
setuptools==75.5.0
setuptools-scm==8.1.0
six @ file:///home/conda/feedstock_root/build_artifacts/six_1620240208055/work
sniffio==1.3.1
stack-data @ file:///home/conda/feedstock_root/build_artifacts/stack_data_1669632077133/work
starlette==0.41.3
submitit==1.5.2
sympy==1.13.1
tiktoken==0.7.0
tokenizers==0.20.3
tomli==2.0.1
torch==2.5.1+cpu
torchvision==0.20.1+cpu
tornado @ file:///home/conda/feedstock_root/build_artifacts/tornado_1724956131631/work
tqdm==4.67.0
traitlets @ file:///home/conda/feedstock_root/build_artifacts/traitlets_1713535121073/work
transformers==4.46.3
typeguard==4.3.0
typing_extensions @ file:///home/conda/feedstock_root/build_artifacts/typing_extensions_1717802530399/work
tzdata==2024.2
urllib3==2.2.3
uvicorn==0.32.0
uvloop==0.21.0
vllm==0.6.4.post2.dev67+g63f1fde2.cpu
watchfiles==0.24.0
wcwidth @ file:///home/conda/feedstock_root/build_artifacts/wcwidth_1704731205417/work
websockets==14.1
wheel==0.45.0
xxhash==3.5.0
yarl==1.17.2
zipp @ file:///home/conda/feedstock_root/build_artifacts/zipp_1731262100163/work
```
Context for the issue:
No response
I'm not an expert, but the docs (https://dottxt-ai.github.io/outlines/latest/reference/models/vllm/) explicitly state:

> This also works with generators built with `generate.regex`, `generate.json`, `generate.cfg`, `generate.format` and `generate.choice`.
Facing the same issue. Any resolution expected on this soon?
This is now available in Outlines v1. Here's the documentation for the model (renamed `VLLMOffline`, as we also have `VLLM` for the online server mode).