Describe the issue as clearly as possible:
I encountered an issue when attempting to use the `generate.cfg` function with a vLLM model. The code throws a `NotImplementedError`, indicating that the CFG logits processor is not available for the `VLLM` class.
Steps/code to reproduce the bug:
```python
from vllm import LLM, SamplingParams

from outlines import generate, models

llm = LLM(
    "neuralmagic/Llama-3.2-1B-Instruct-quantized.w8a8",
    enable_prefix_caching=True,
    block_size=64,
    max_num_batched_tokens=15000,
    gpu_memory_utilization=0.96,
    max_model_len=15000,
    use_v2_block_manager=True,
)

arithmetic_grammar = """
    ?start: expression

    ?expression: term (("+" | "-") term)*

    ?term: factor (("*" | "/") factor)*

    ?factor: NUMBER
           | "-" factor
           | "(" expression ")"

    %import common.NUMBER
"""

model = models.VLLM(llm)
generator = generate.cfg(model, arithmetic_grammar)

sampling_params = SamplingParams(temperature=0.3, top_p=0.2, max_tokens=20)
sequence = generator(
    "Alice had 4 apples and Bob ate 2. Write an expression for Alice's apples:",
    sampling_params=sampling_params,
)
```
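For context, the Lark grammar above describes simple arithmetic expressions. As an illustration only (independent of Outlines and vLLM, which compile the grammar themselves), a minimal hand-written recursive-descent recognizer in pure Python shows which strings the grammar accepts:

```python
import re

# Minimal recognizer mirroring the Lark grammar above — purely illustrative;
# Outlines does not use this code.
TOKEN = re.compile(r"\s*(\d+(?:\.\d+)?|[-+*/()])")

def tokenize(s):
    tokens, pos = [], 0
    while pos < len(s):
        m = TOKEN.match(s, pos)
        if not m:
            raise ValueError(f"bad character at position {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

def accepts(s):
    tokens = tokenize(s)
    i = 0

    def peek():
        return tokens[i] if i < len(tokens) else None

    def eat(expected=None):
        nonlocal i
        t = peek()
        if t is None or (expected is not None and t != expected):
            raise ValueError(f"unexpected token {t!r}")
        i += 1

    def expression():  # term (("+" | "-") term)*
        term()
        while peek() in ("+", "-"):
            eat()
            term()

    def term():        # factor (("*" | "/") factor)*
        factor()
        while peek() in ("*", "/"):
            eat()
            factor()

    def factor():      # NUMBER | "-" factor | "(" expression ")"
        t = peek()
        if t == "-":
            eat()
            factor()
        elif t == "(":
            eat("(")
            expression()
            eat(")")
        elif t is not None and re.fullmatch(r"\d+(?:\.\d+)?", t):
            eat()
        else:
            raise ValueError(f"expected NUMBER, got {t!r}")

    try:
        expression()
        return i == len(tokens)
    except ValueError:
        return False

print(accepts("4 - 2"))        # valid answer to the prompt above
print(accepts("(4 - 2) * 3"))
print(accepts("4 -"))          # dangling operator, rejected
```

So a constrained generator built from this grammar should only ever emit strings like `4 - 2`, never a dangling `4 -`.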
Expected result:
I expected the code to generate a sequence based on the defined grammar using the `VLLM` model.
Error message:
```
Exception has occurred: NotImplementedError
The CFG Logits processor is not available for <class 'outlines.models.vllm.VLLM'>.
  File "/home/lepagnol/Documents/These/format-constrained-for-slu/vllm_test.py", line 30, in <module>
    generator = generate.cfg(model, arithmetic_grammar)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
NotImplementedError: The CFG Logits processor is not available for <class 'outlines.models.vllm.VLLM'>.
```
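This kind of error typically comes from type-based dispatch: a generator constructor is registered per model class, and any class without a registered CFG implementation falls through to a `NotImplementedError`. A minimal sketch of that pattern using `functools.singledispatch` (the class and function names below are illustrative, not Outlines' actual internals):

```python
from functools import singledispatch

# Hypothetical model wrapper classes, standing in for the real ones.
class Transformers: ...
class VLLM: ...

@singledispatch
def cfg(model, grammar):
    # Fallback for model classes with no registered CFG logits processor,
    # mirroring the error seen in the traceback above.
    raise NotImplementedError(
        f"The CFG Logits processor is not available for {type(model)}."
    )

@cfg.register
def _(model: Transformers, grammar):
    # Only this class has a registered implementation in this sketch.
    return f"generator for {type(model).__name__}"

print(cfg(Transformers(), "?start: expression"))  # dispatches to the registered impl
try:
    cfg(VLLM(), "?start: expression")
except NotImplementedError as e:
    print(e)  # no implementation registered for VLLM, so the fallback raises
```

In other words, the error means no CFG implementation was registered for the `VLLM` wrapper in this Outlines version, not that the grammar itself is invalid.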
Outlines/Python version information:
```
aiohappyeyeballs==2.4.3
aiohttp==3.11.6
aiosignal==1.3.1
annotated-types==0.7.0
antlr4-python3-runtime==4.9.3
anyio==4.6.2.post1
asttokens @ file:///home/conda/feedstock_root/build_artifacts/asttokens_1698341106958/work
attrs==24.2.0
autocommand==2.2.2
backports.tarfile==1.2.0
certifi==2024.8.30
charset-normalizer==3.4.0
click==8.1.7
cloudpickle==3.1.0
cmake==3.31.0.1
comm @ file:///home/conda/feedstock_root/build_artifacts/comm_1710320294760/work
compressed-tensors==0.8.0
datasets==3.1.0
debugpy @ file:///home/conda/feedstock_root/build_artifacts/debugpy_1731044888992/work
decorator @ file:///home/conda/feedstock_root/build_artifacts/decorator_1641555617451/work
dill==0.3.8
diskcache==5.6.3
distro==1.9.0
einops==0.8.0
exceptiongroup @ file:///home/conda/feedstock_root/build_artifacts/exceptiongroup_1720869315914/work
executing @ file:///home/conda/feedstock_root/build_artifacts/executing_1725214404607/work
fastapi==0.115.5
filelock==3.16.1
frozenlist==1.5.0
fsspec==2024.9.0
gguf==0.10.0
h11==0.14.0
httpcore==1.0.7
httptools==0.6.4
httpx==0.27.2
huggingface-hub==0.26.2
hydra-core==1.3.2
hydra-submitit-launcher==1.2.0
idna==3.10
importlib_metadata @ file:///home/conda/feedstock_root/build_artifacts/importlib-metadata_1726082825846/work
inflect==7.3.1
interegular==0.3.3
ipykernel @ file:///home/conda/feedstock_root/build_artifacts/ipykernel_1719845459717/work
ipython @ file:///home/conda/feedstock_root/build_artifacts/ipython_1729866374957/work
jaraco.collections==5.1.0
jaraco.context==5.3.0
jaraco.functools==4.0.1
jaraco.text==3.12.1
jedi @ file:///home/conda/feedstock_root/build_artifacts/jedi_1731317204262/work
Jinja2==3.1.4
jiter==0.7.1
jiwer==3.0.5
jsonschema==4.23.0
jsonschema-specifications==2024.10.1
jupyter_client @ file:///home/conda/feedstock_root/build_artifacts/jupyter_client_1726610684920/work
jupyter_core @ file:///home/conda/feedstock_root/build_artifacts/jupyter_core_1727163409502/work
lark==1.2.2
llvmlite==0.43.0
lm-format-enforcer==0.10.9
MarkupSafe==3.0.2
matplotlib-inline @ file:///home/conda/feedstock_root/build_artifacts/matplotlib-inline_1713250518406/work
mistral_common==1.5.0
more-itertools==10.3.0
mpmath==1.3.0
msgspec==0.18.6
multidict==6.1.0
multiprocess==0.70.16
nest_asyncio @ file:///home/conda/feedstock_root/build_artifacts/nest-asyncio_1705850609492/work
networkx==3.4.2
ninja==1.11.1.1
numba==0.60.0
numpy==1.26.4
omegaconf==2.3.0
openai==1.54.5
opencv-python-headless==4.10.0.84
outlines==0.0.46
packaging @ file:///home/conda/feedstock_root/build_artifacts/packaging_1731802491770/work
pandas==2.2.3
parso @ file:///home/conda/feedstock_root/build_artifacts/parso_1712320355065/work
partial-json-parser==0.2.1.1.post4
pexpect @ file:///home/conda/feedstock_root/build_artifacts/pexpect_1706113125309/work
pickleshare @ file:///home/conda/feedstock_root/build_artifacts/pickleshare_1602536217715/work
pillow==10.4.0
platformdirs @ file:///home/conda/feedstock_root/build_artifacts/platformdirs_1726613481435/work
prometheus-fastapi-instrumentator==7.0.0
prometheus_client==0.21.0
prompt_toolkit @ file:///home/conda/feedstock_root/build_artifacts/prompt-toolkit_1727341649933/work
propcache==0.2.0
protobuf==5.28.3
psutil @ file:///home/conda/feedstock_root/build_artifacts/psutil_1729847057810/work
ptyprocess @ file:///home/conda/feedstock_root/build_artifacts/ptyprocess_1609419310487/work/dist/ptyprocess-0.7.0-py2.py3-none-any.whl
pure_eval @ file:///home/conda/feedstock_root/build_artifacts/pure_eval_1721585709575/work
py-cpuinfo==9.0.0
pyairports==2.1.1
pyarrow==18.0.0
pycountry==24.6.1
pydantic==2.9.2
pydantic_core==2.23.4
pydot==3.0.2
Pygments @ file:///home/conda/feedstock_root/build_artifacts/pygments_1714846767233/work
pyparsing==3.2.0
python-dateutil @ file:///home/conda/feedstock_root/build_artifacts/python-dateutil_1731919281354/work
python-dotenv==1.0.1
pytz==2024.2
PyYAML==6.0.2
pyzmq @ file:///home/conda/feedstock_root/build_artifacts/pyzmq_1728642254015/work
RapidFuzz==3.10.1
referencing==0.35.1
regex==2024.11.6
requests==2.32.3
rpds-py==0.21.0
safetensors==0.4.5
sentencepiece==0.2.0
setuptools==75.5.0
setuptools-scm==8.1.0
six @ file:///home/conda/feedstock_root/build_artifacts/six_1620240208055/work
sniffio==1.3.1
stack-data @ file:///home/conda/feedstock_root/build_artifacts/stack_data_1669632077133/work
starlette==0.41.3
submitit==1.5.2
sympy==1.13.1
tiktoken==0.7.0
tokenizers==0.20.3
tomli==2.0.1
torch==2.5.1+cpu
torchvision==0.20.1+cpu
tornado @ file:///home/conda/feedstock_root/build_artifacts/tornado_1724956131631/work
tqdm==4.67.0
traitlets @ file:///home/conda/feedstock_root/build_artifacts/traitlets_1713535121073/work
transformers==4.46.3
typeguard==4.3.0
typing_extensions @ file:///home/conda/feedstock_root/build_artifacts/typing_extensions_1717802530399/work
tzdata==2024.2
urllib3==2.2.3
uvicorn==0.32.0
uvloop==0.21.0
vllm==0.6.4.post2.dev67+g63f1fde2.cpu
watchfiles==0.24.0
wcwidth @ file:///home/conda/feedstock_root/build_artifacts/wcwidth_1704731205417/work
websockets==14.1
wheel==0.45.0
xxhash==3.5.0
yarl==1.17.2
zipp @ file:///home/conda/feedstock_root/build_artifacts/zipp_1731262100163/work
```
Context for the issue:
No response
I'm not an expert, but the docs (https://dottxt-ai.github.io/outlines/latest/reference/models/vllm/) explicitly state:

> This also works with generators built with `generate.regex`, `generate.json`, `generate.cfg`, `generate.format` and `generate.choice`.
Facing the same issue. Any resolution expected on this soon?
This is now available in Outlines v1. Here's the documentation for the model (renamed `VLLMOffline`, as we also have `VLLM` for the online server mode).