Error: `NotImplementedError` for CFG Logits Processor in VLLM Model

Open PierreLepagnol opened this issue 1 year ago • 3 comments

Describe the issue as clearly as possible:

I encountered an issue when attempting to use the generate.cfg function with a VLLM model. The code throws a NotImplementedError, indicating that the CFG Logits processor is not available for the VLLM class.

Steps/code to reproduce the bug:

from vllm import LLM, SamplingParams

llm = LLM(
    "neuralmagic/Llama-3.2-1B-Instruct-quantized.w8a8",
    enable_prefix_caching=True,
    block_size=64,
    max_num_batched_tokens=15000,
    gpu_memory_utilization=0.96,
    max_model_len=15000,
    use_v2_block_manager=True,
)

arithmetic_grammar = """
    ?start: expression

    ?expression: term (("+" | "-") term)*

    ?term: factor (("*" | "/") factor)*

    ?factor: NUMBER
           | "-" factor
           | "(" expression ")"

    %import common.NUMBER
"""

from outlines import generate, models

model = models.VLLM(llm)
generator = generate.cfg(model, arithmetic_grammar)
sampling_params = SamplingParams(temperature=0.3, top_p=0.2, max_tokens=20)

sequence = generator(
    "Alice had 4 apples and Bob ate 2. Write an expression for Alice's apples:",
    sampling_params=sampling_params,
)

Expected result:

I expected the code to generate a sequence based on the defined grammar using the `VLLM` model.
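To make the expected output concrete, here is a minimal model-free sketch of a recognizer for the arithmetic grammar above. It is a hand-written recursive-descent check, not what Outlines or Lark does internally. The `conforms` helper and the `NUMBER` regex are illustrative assumptions; the regex only approximates Lark's `common.NUMBER`.

```python
import re

# Assumption: approximates Lark's common.NUMBER (integer or decimal,
# with an optional exponent). Not taken from Lark's source verbatim.
NUMBER = re.compile(r"(\d+\.\d*|\.\d+|\d+)([eE][+-]?\d+)?")


def conforms(text: str) -> bool:
    """Return True if `text` is a complete sentence of the arithmetic grammar."""
    pos = 0

    def peek() -> str:
        # Next character, or "" at end of input.
        return text[pos] if pos < len(text) else ""

    def factor() -> bool:
        # factor: NUMBER | "-" factor | "(" expression ")"
        nonlocal pos
        if peek() == "-":
            pos += 1
            return factor()
        if peek() == "(":
            pos += 1
            if not expression() or peek() != ")":
                return False
            pos += 1
            return True
        match = NUMBER.match(text, pos)
        if match is None:
            return False
        pos = match.end()
        return True

    def term() -> bool:
        # term: factor (("*" | "/") factor)*
        nonlocal pos
        if not factor():
            return False
        while peek() in ("*", "/"):
            pos += 1
            if not factor():
                return False
        return True

    def expression() -> bool:
        # expression: term (("+" | "-") term)*
        nonlocal pos
        if not term():
            return False
        while peek() in ("+", "-"):
            pos += 1
            if not term():
                return False
        return True

    # The whole input must be consumed for the string to conform.
    return expression() and pos == len(text)
```

Under this sketch, `conforms("4-2")` is true while `conforms("4 - 2")` is false, because the grammar as written does not ignore whitespace.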

Error message:

Exception has occurred: NotImplementedError
The CFG Logits processor is not available for <class 'outlines.models.vllm.VLLM'>.
  File "/home/lepagnol/Documents/These/format-constrained-for-slu/vllm_test.py", line 30, in <module>
    generator = generate.cfg(model, arithmetic_grammar)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
NotImplementedError: The CFG Logits processor is not available for <class 'outlines.models.vllm.VLLM'>.

Outlines/Python version information:

Version information

```
aiohappyeyeballs==2.4.3
aiohttp==3.11.6
aiosignal==1.3.1
annotated-types==0.7.0
antlr4-python3-runtime==4.9.3
anyio==4.6.2.post1
asttokens @ file:///home/conda/feedstock_root/build_artifacts/asttokens_1698341106958/work
attrs==24.2.0
autocommand==2.2.2
backports.tarfile==1.2.0
certifi==2024.8.30
charset-normalizer==3.4.0
click==8.1.7
cloudpickle==3.1.0
cmake==3.31.0.1
comm @ file:///home/conda/feedstock_root/build_artifacts/comm_1710320294760/work
compressed-tensors==0.8.0
datasets==3.1.0
debugpy @ file:///home/conda/feedstock_root/build_artifacts/debugpy_1731044888992/work
decorator @ file:///home/conda/feedstock_root/build_artifacts/decorator_1641555617451/work
dill==0.3.8
diskcache==5.6.3
distro==1.9.0
einops==0.8.0
exceptiongroup @ file:///home/conda/feedstock_root/build_artifacts/exceptiongroup_1720869315914/work
executing @ file:///home/conda/feedstock_root/build_artifacts/executing_1725214404607/work
fastapi==0.115.5
filelock==3.16.1
frozenlist==1.5.0
fsspec==2024.9.0
gguf==0.10.0
h11==0.14.0
httpcore==1.0.7
httptools==0.6.4
httpx==0.27.2
huggingface-hub==0.26.2
hydra-core==1.3.2
hydra-submitit-launcher==1.2.0
idna==3.10
importlib_metadata @ file:///home/conda/feedstock_root/build_artifacts/importlib-metadata_1726082825846/work
inflect==7.3.1
interegular==0.3.3
ipykernel @ file:///home/conda/feedstock_root/build_artifacts/ipykernel_1719845459717/work
ipython @ file:///home/conda/feedstock_root/build_artifacts/ipython_1729866374957/work
jaraco.collections==5.1.0
jaraco.context==5.3.0
jaraco.functools==4.0.1
jaraco.text==3.12.1
jedi @ file:///home/conda/feedstock_root/build_artifacts/jedi_1731317204262/work
Jinja2==3.1.4
jiter==0.7.1
jiwer==3.0.5
jsonschema==4.23.0
jsonschema-specifications==2024.10.1
jupyter_client @ file:///home/conda/feedstock_root/build_artifacts/jupyter_client_1726610684920/work
jupyter_core @ file:///home/conda/feedstock_root/build_artifacts/jupyter_core_1727163409502/work
lark==1.2.2
llvmlite==0.43.0
lm-format-enforcer==0.10.9
MarkupSafe==3.0.2
matplotlib-inline @ file:///home/conda/feedstock_root/build_artifacts/matplotlib-inline_1713250518406/work
mistral_common==1.5.0
more-itertools==10.3.0
mpmath==1.3.0
msgspec==0.18.6
multidict==6.1.0
multiprocess==0.70.16
nest_asyncio @ file:///home/conda/feedstock_root/build_artifacts/nest-asyncio_1705850609492/work
networkx==3.4.2
ninja==1.11.1.1
numba==0.60.0
numpy==1.26.4
omegaconf==2.3.0
openai==1.54.5
opencv-python-headless==4.10.0.84
outlines==0.0.46
packaging @ file:///home/conda/feedstock_root/build_artifacts/packaging_1731802491770/work
pandas==2.2.3
parso @ file:///home/conda/feedstock_root/build_artifacts/parso_1712320355065/work
partial-json-parser==0.2.1.1.post4
pexpect @ file:///home/conda/feedstock_root/build_artifacts/pexpect_1706113125309/work
pickleshare @ file:///home/conda/feedstock_root/build_artifacts/pickleshare_1602536217715/work
pillow==10.4.0
platformdirs @ file:///home/conda/feedstock_root/build_artifacts/platformdirs_1726613481435/work
prometheus-fastapi-instrumentator==7.0.0
prometheus_client==0.21.0
prompt_toolkit @ file:///home/conda/feedstock_root/build_artifacts/prompt-toolkit_1727341649933/work
propcache==0.2.0
protobuf==5.28.3
psutil @ file:///home/conda/feedstock_root/build_artifacts/psutil_1729847057810/work
ptyprocess @ file:///home/conda/feedstock_root/build_artifacts/ptyprocess_1609419310487/work/dist/ptyprocess-0.7.0-py2.py3-none-any.whl
pure_eval @ file:///home/conda/feedstock_root/build_artifacts/pure_eval_1721585709575/work
py-cpuinfo==9.0.0
pyairports==2.1.1
pyarrow==18.0.0
pycountry==24.6.1
pydantic==2.9.2
pydantic_core==2.23.4
pydot==3.0.2
Pygments @ file:///home/conda/feedstock_root/build_artifacts/pygments_1714846767233/work
pyparsing==3.2.0
python-dateutil @ file:///home/conda/feedstock_root/build_artifacts/python-dateutil_1731919281354/work
python-dotenv==1.0.1
pytz==2024.2
PyYAML==6.0.2
pyzmq @ file:///home/conda/feedstock_root/build_artifacts/pyzmq_1728642254015/work
RapidFuzz==3.10.1
referencing==0.35.1
regex==2024.11.6
requests==2.32.3
rpds-py==0.21.0
safetensors==0.4.5
sentencepiece==0.2.0
setuptools==75.5.0
setuptools-scm==8.1.0
six @ file:///home/conda/feedstock_root/build_artifacts/six_1620240208055/work
sniffio==1.3.1
stack-data @ file:///home/conda/feedstock_root/build_artifacts/stack_data_1669632077133/work
starlette==0.41.3
submitit==1.5.2
sympy==1.13.1
tiktoken==0.7.0
tokenizers==0.20.3
tomli==2.0.1
torch==2.5.1+cpu
torchvision==0.20.1+cpu
tornado @ file:///home/conda/feedstock_root/build_artifacts/tornado_1724956131631/work
tqdm==4.67.0
traitlets @ file:///home/conda/feedstock_root/build_artifacts/traitlets_1713535121073/work
transformers==4.46.3
typeguard==4.3.0
typing_extensions @ file:///home/conda/feedstock_root/build_artifacts/typing_extensions_1717802530399/work
tzdata==2024.2
urllib3==2.2.3
uvicorn==0.32.0
uvloop==0.21.0
vllm==0.6.4.post2.dev67+g63f1fde2.cpu
watchfiles==0.24.0
wcwidth @ file:///home/conda/feedstock_root/build_artifacts/wcwidth_1704731205417/work
websockets==14.1
wheel==0.45.0
xxhash==3.5.0
yarl==1.17.2
zipp @ file:///home/conda/feedstock_root/build_artifacts/zipp_1731262100163/work
```

Context for the issue:

No response

PierreLepagnol avatar Nov 21 '24 11:11 PierreLepagnol

I'm not an expert, but the documentation (https://dottxt-ai.github.io/outlines/latest/reference/models/vllm/) explicitly states:

This also works with generators built with generate.regex, generate.json, generate.cfg, generate.format and generate.choice.

PierreLepagnol avatar Nov 21 '24 14:11 PierreLepagnol

Getting the same error

Tonybodo avatar Dec 10 '24 11:12 Tonybodo

Facing the same issue. Any resolution expected on this soon?

TanmayParekh avatar Apr 10 '25 18:04 TanmayParekh

This is now available in Outlines v1. Here's the documentation for the model (renamed VLLMOffline, as we also have VLLM for the online server mode).

RobinPicard avatar Jun 20 '25 10:06 RobinPicard
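
For readers landing on this thread, the following is a rough sketch of how the original reproduction might translate to Outlines v1. It assumes the v1 entry points are `outlines.from_vllm_offline` and `outlines.types.CFG`, and that `sampling_params` is forwarded to vLLM; check the v1 documentation before relying on any of this. The wrapper name `generate_with_cfg` is hypothetical, and the imports sit inside the function so the snippet can be loaded on a machine without vLLM installed.

```python
def generate_with_cfg(model_name: str, grammar: str, prompt: str):
    """Hypothetical helper: CFG-constrained generation with an offline vLLM model."""
    # Imported lazily; both packages must be installed to actually call this.
    import outlines
    from outlines.types import CFG
    from vllm import LLM, SamplingParams

    # Assumption: Outlines v1 wraps a vllm.LLM instance via from_vllm_offline.
    model = outlines.from_vllm_offline(LLM(model_name))

    # Same sampling settings as in the original report.
    sampling_params = SamplingParams(temperature=0.3, top_p=0.2, max_tokens=20)

    # Assumption: the output type is passed as the second positional argument
    # and extra keyword arguments are handed to vLLM's generate call.
    return model(prompt, CFG(grammar), sampling_params=sampling_params)
```

Calling it with the model name, `arithmetic_grammar`, and prompt from the report should exercise the same path that raised `NotImplementedError` on `outlines==0.0.46`.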