
vLLM v1 alpha bugs

lmmx opened this issue on Jan 27, 2025

Describe the issue as clearly as possible:

vLLM v1 just dropped, and I tried it with some existing Outlines code; it looks like `adapt_tokenizer` broke.

Steps/code to reproduce the bug:

Lightly adapted repro (swap out the model as desired):


```python
import json

from pydantic import BaseModel
from vllm import LLM, SamplingParams


class FooModel(BaseModel):
    answer: int


def main(
    messages=["Hello world"],
    guide=FooModel,
    model_size="7b",
    cot_prefill="<think>\n\n</think>\n",
    temperature=0.6,  # sampling defaults added so the repro is self-contained
    top_p=0.95,
    max_new_tokens=256,
):
    from outlines.models.vllm import adapt_tokenizer
    from outlines.processors import JSONLogitsProcessor
    from transformers import AutoTokenizer

    model_name = f"casperhansen/deepseek-r1-distill-qwen-{model_size}-awq"
    llm = LLM(model_name, enable_prefix_caching=True)
    tokenizer = llm.get_tokenizer()

    # Build the prompt with CoT markers
    msg_list = [{"role": "user", "content": msg} for msg in messages]
    prompt = (
        tokenizer.apply_chat_template(
            msg_list, tokenize=False, add_generation_prompt=True
        )
        + cot_prefill
    )

    # Configure the guided-decoding logits processor
    json_schema = json.dumps(guide.model_json_schema())
    model_name = llm.llm_engine.model_config.model
    outlines_tokenizer = adapt_tokenizer(AutoTokenizer.from_pretrained(model_name))
    guided_processor = JSONLogitsProcessor(
        schema=json_schema, tokenizer=outlines_tokenizer, whitespace_pattern=r" ?"
    )
    sampling_params = SamplingParams(
        temperature=temperature,
        top_p=top_p,
        max_tokens=max_new_tokens,
        # attach the processor: this is what gets pickled (and fails) on v1
        logits_processors=[guided_processor],
    )
    # Generate output
    output = llm.generate(prompt, sampling_params, use_tqdm=False)
    return output


main()
```

Expected result:

(working generation!)

Error message:

File "/home/louis/lab/r1/src/r1/silent_thought_vllm.py", line 115, in think                                   
    output = llm.generate(prompt, sampling_params, use_tqdm=False)                                                                                    
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^             
  File "/home/louis/lab/r1/.venv/lib/python3.12/site-packages/vllm/utils.py", line 1021, in inner                                                              
    return fn(*args, **kwargs)                                                 
           ^^^^^^^^^^^^^^^^^^^                                                 
  File "/home/louis/lab/r1/.venv/lib/python3.12/site-packages/vllm/entrypoints/llm.py", line 454, in generate                                                  
    self._validate_and_add_requests(                                           
  File "/home/louis/lab/r1/.venv/lib/python3.12/site-packages/vllm/entrypoints/llm.py", line 1175, in _validate_and_add_requests                               
    self._add_request(                                                         
  File "/home/louis/lab/r1/.venv/lib/python3.12/site-packages/vllm/entrypoints/llm.py", line 1193, in _add_request                                             
    self.llm_engine.add_request(                                               
  File "/home/louis/lab/r1/.venv/lib/python3.12/site-packages/vllm/v1/engine/llm_engine.py", line 163, in add_request                                          
    self.engine_core.add_request(engine_core_req)                              
  File "/home/louis/lab/r1/.venv/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 215, in add_request                                         
    self._send_input(EngineCoreRequestType.ADD, request)                       
  File "/home/louis/lab/r1/.venv/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 211, in _send_input                                         
    msg = (request_type.value, self.encoder.encode(request))                   
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^                    
  File "/home/louis/lab/r1/.venv/lib/python3.12/site-packages/vllm/v1/serial_utils.py", line 7, in encode                                                      
    return pickle.dumps(obj)                                                   
           ^^^^^^^^^^^^^^^^^                                                   
AttributeError: Can't get local object 'adapt_tokenizer.<locals>.convert_token_to_string'
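For context on the failure: vLLM v1 runs the engine core in a separate process, so each request, including any logits processors attached to `SamplingParams`, gets pickled before being sent over. `adapt_tokenizer` defines `convert_token_to_string` as a nested function, and pickle refuses to serialize local (nested) functions. A minimal sketch of that restriction, independent of Outlines and vLLM:

```python
import pickle

def adapt():
    def convert():  # nested function: its qualname is 'adapt.<locals>.convert'
        pass
    return convert

try:
    pickle.dumps(adapt())
except AttributeError as err:
    # e.g. "Can't pickle local object 'adapt.<locals>.convert'"
    print(err)
```

The usual fix on the Outlines side would be to make the function importable by pickle: define it at module level, or use a top-level class with a `__call__` method.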

Outlines/Python version information:

```
0.1.11
Python 3.12.8 | packaged by Anaconda, Inc. | (main, Dec 11 2024, 16:31:09) [GCC 11.2.0]
accelerate==1.3.0 aiohappyeyeballs==2.4.4 aiohttp==3.11.11 aiohttp-cors==0.7.0 aiosignal==1.3.2 airportsdata==20241001 annotated-types==0.7.0 anyio==4.8.0 argh==0.31.3 astor==0.8.1 attrs==24.3.0 bitsandbytes==0.45.0 blake3==1.0.2 cachetools==5.5.0 certifi==2024.12.14 charset-normalizer==3.4.1 click==8.1.8 cloudpickle==3.1.1 colorful==0.5.6 compressed-tensors==0.8.1 datasets==3.2.0 depyf==0.18.0 dill==0.3.8 diskcache==5.6.3 distlib==0.3.9 distro==1.9.0 einops==0.8.0 fastapi==0.115.6 filelock==3.16.1 frozenlist==1.5.0 fsspec==2024.9.0 gguf==0.10.0 google-api-core==2.24.0 google-auth==2.37.0 googleapis-common-protos==1.66.0 grpcio==1.69.0 h11==0.14.0 httpcore==1.0.7 httptools==0.6.4 httpx==0.28.1 huggingface-hub==0.27.1 idna==3.10 importlib-metadata==8.6.1 iniconfig==2.0.0 interegular==0.3.3 jinja2==3.1.5 jiter==0.8.2 jsonschema==4.23.0 jsonschema-specifications==2024.10.1 lark==1.2.2 linkify-it-py==2.0.3 lm-format-enforcer==0.10.9 markdown-it-py==3.0.0 markupsafe==3.0.2 mdit-py-plugins==0.4.2 mdurl==0.1.2 memray==1.15.0 mistral-common==1.5.1 mpmath==1.3.0 msgpack==1.1.0 msgspec==0.19.0 multidict==6.1.0 multiprocess==0.70.16 nest-asyncio==1.6.0 networkx==3.4.2 numpy==1.26.4 nvidia-cublas-cu12==12.4.5.8 nvidia-cuda-cupti-cu12==12.4.127 nvidia-cuda-nvrtc-cu12==12.4.127 nvidia-cuda-runtime-cu12==12.4.127 nvidia-cudnn-cu12==9.1.0.70 nvidia-cufft-cu12==11.2.1.3 nvidia-curand-cu12==10.3.5.147 nvidia-cusolver-cu12==11.6.1.9 nvidia-cusparse-cu12==12.3.1.170 nvidia-ml-py==12.560.30 nvidia-nccl-cu12==2.21.5 nvidia-nvjitlink-cu12==12.4.127 nvidia-nvtx-cu12==12.4.127 openai==1.59.9 opencensus==0.11.4 opencensus-context==0.1.3 opencv-python-headless==4.11.0.86 outlines==0.1.11 outlines-core==0.1.26 packaging==24.2 pandas==2.2.3 partial-json-parser==0.2.1.1.post5 pillow==10.4.0 platformdirs==4.3.6 pluggy==1.5.0 prometheus-client==0.21.1 prometheus-fastapi-instrumentator==7.0.2 propcache==0.2.1 proto-plus==1.25.0 protobuf==5.29.3 psutil==6.1.1 py-cpuinfo==9.0.0 py-spy==0.4.0 pyarrow==19.0.0 pyasn1==0.6.1 pyasn1-modules==0.4.1 pybind11==2.13.6 pycountry==24.6.1 pydantic==2.10.5 pydantic-core==2.27.2 pygments==2.19.1 pytest==8.3.4 python-dateutil==2.9.0.post0 python-dotenv==1.0.1 pytz==2024.2 pyyaml==6.0.2 pyzmq==26.2.0 -e file:///home/louis/lab/r1 ray==2.40.0 referencing==0.36.1 regex==2024.11.6 requests==2.32.3 rich==13.9.4 rpds-py==0.22.3 rsa==4.9 safetensors==0.5.2 sentencepiece==0.2.0 setuptools==75.8.0 six==1.17.0 smart-open==7.1.0 sniffio==1.3.1 starlette==0.41.3 sympy==1.13.1 textual==1.0.0 tiktoken==0.7.0 tokenizers==0.21.0 torch==2.5.1 torchvision==0.20.1 tqdm==4.67.1 transformers==4.48.0 triton==3.1.0 typing-extensions==4.12.2 tzdata==2025.1 uc-micro-py==1.0.3 urllib3==2.3.0 uvicorn==0.34.0 uvloop==0.21.0 virtualenv==20.29.1 vllm==0.6.6.post1 watchfiles==1.0.4 websockets==14.2 wrapt==1.17.2 xformers==0.0.28.post3 xgrammar==0.1.10 xxhash==3.5.0 yarl==1.18.3 zipp==3.21.0
```

Context for the issue:

It just got announced in alpha, so I thought I should report :-)

lmmx commented on Jan 27, 2025

That’s great news! TBH, I’ve struggled so far to reproduce a working outlines test environment from scratch, mainly due to issues with vllm... if upgrading our dependency resolves this as a side effect, that would be nice! https://github.com/dottxt-ai/outlines/pull/1389#issuecomment-2618457406

yvan-sraka commented on Jan 28, 2025

From the announcement, looks like we'll have to wait a bit:

> V1 currently lacks support for log probs, prompt log probs sampling parameters, pipeline parallelism, structured decoding, speculative decoding, prometheus metrics, and LoRA. We are actively working to close this feature gap and add brand-new optimizations to the V1 engine.
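In the meantime, the alpha is opt-in, so a workaround is to stay on the V0 engine. Assuming your vLLM build gates V1 behind the `VLLM_USE_V1` environment variable, as the alpha announcement describes, something like:

```python
import os

# assumption: this vLLM build selects the engine via VLLM_USE_V1,
# per the alpha announcement; "0" (or leaving it unset) keeps the V0 path,
# which does not pickle logits processors across processes
os.environ["VLLM_USE_V1"] = "0"

from vllm import LLM, SamplingParams  # import after setting the flag
```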

zkalson commented on Mar 13, 2025

Closing the issue as Outlines v1 has been released.
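For anyone landing here later: the v1 API replaces the manual `adapt_tokenizer` / `JSONLogitsProcessor` wiring above with model wrappers. A rough sketch of the v1-style equivalent; `outlines.from_vllm_offline` is my reading of the v1 offline-vLLM constructor, so verify the exact name and call signature against the current docs:

```python
import outlines
from pydantic import BaseModel
from vllm import LLM


class FooModel(BaseModel):
    answer: int


# hypothetical wiring based on the Outlines v1 docs; verify the constructor
model = outlines.from_vllm_offline(LLM("casperhansen/deepseek-r1-distill-qwen-7b-awq"))
result = model("Reply with JSON: what is 2 + 2?", FooModel)
print(result)  # a JSON string conforming to FooModel's schema
```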

RobinPicard commented on Jun 20, 2025