vLLM v1 alpha: adapt_tokenizer breaks (local convert_token_to_string can't be pickled)
Describe the issue as clearly as possible:
vLLM v1 just dropped, and when I tried it with some existing Outlines code it looks like adapt_tokenizer broke: the request can't be pickled for the new engine-core process because adapt_tokenizer attaches a locally defined convert_token_to_string function.
Steps/code to reproduce the bug:
Lightly adapted repro (swap out the model as desired):
import json

from pydantic import BaseModel
from vllm import LLM, SamplingParams


class FooModel(BaseModel):
    answer: int


def main(
    messages=["Hello world"],
    guide=FooModel,
    model_size="7b",
    cot_prefill="<think>\n\n</think>\n",
    temperature=0.6,  # placeholder sampling defaults for the repro
    top_p=0.95,
    max_new_tokens=256,
):
    from outlines.models.vllm import adapt_tokenizer
    from outlines.processors import JSONLogitsProcessor
    from transformers import AutoTokenizer

    model_name = f"casperhansen/deepseek-r1-distill-qwen-{model_size}-awq"
    llm = LLM(model_name, enable_prefix_caching=True)
    tokenizer = llm.get_tokenizer()

    # Build the prompt with CoT markers
    msg_list = [{"role": "user", "content": msg} for msg in messages]
    prompt = (
        tokenizer.apply_chat_template(
            msg_list, tokenize=False, add_generation_prompt=True
        )
        + cot_prefill
    )

    # Configure the guided-decoding logits processor
    json_schema = json.dumps(guide.model_json_schema())
    model_name = llm.llm_engine.model_config.model
    outlines_tokenizer = adapt_tokenizer(AutoTokenizer.from_pretrained(model_name))
    guided_processor = JSONLogitsProcessor(
        schema=json_schema, tokenizer=outlines_tokenizer, whitespace_pattern=r" ?"
    )
    sampling_params = SamplingParams(
        temperature=temperature,
        top_p=top_p,
        max_tokens=max_new_tokens,
        # Attaching the processor is what ends up pickled for the V1 engine core
        logits_processors=[guided_processor],
    )

    # Generate output
    output = llm.generate(prompt, sampling_params, use_tqdm=False)
    return output


main()
Expected result:
(working generation!)
Error message:
File "/home/louis/lab/r1/src/r1/silent_thought_vllm.py", line 115, in think
output = llm.generate(prompt, sampling_params, use_tqdm=False)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/louis/lab/r1/.venv/lib/python3.12/site-packages/vllm/utils.py", line 1021, in inner
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/home/louis/lab/r1/.venv/lib/python3.12/site-packages/vllm/entrypoints/llm.py", line 454, in generate
self._validate_and_add_requests(
File "/home/louis/lab/r1/.venv/lib/python3.12/site-packages/vllm/entrypoints/llm.py", line 1175, in _validate_and_add_requests
self._add_request(
File "/home/louis/lab/r1/.venv/lib/python3.12/site-packages/vllm/entrypoints/llm.py", line 1193, in _add_request
self.llm_engine.add_request(
File "/home/louis/lab/r1/.venv/lib/python3.12/site-packages/vllm/v1/engine/llm_engine.py", line 163, in add_request
self.engine_core.add_request(engine_core_req)
File "/home/louis/lab/r1/.venv/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 215, in add_request
self._send_input(EngineCoreRequestType.ADD, request)
File "/home/louis/lab/r1/.venv/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 211, in _send_input
msg = (request_type.value, self.encoder.encode(request))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/louis/lab/r1/.venv/lib/python3.12/site-packages/vllm/v1/serial_utils.py", line 7, in encode
return pickle.dumps(obj)
^^^^^^^^^^^^^^^^^
AttributeError: Can't get local object 'adapt_tokenizer.<locals>.convert_token_to_string'
Outlines/Python version information:
Context for the issue:
It just got announced in alpha, thought I should report :-)
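For what it's worth, the failure seems to come from vLLM v1 pickling each request (including any attached logits processors) to ship it to the separate engine-core process, while adapt_tokenizer attaches a function defined inside another function; pickle can only serialize functions it can resolve via an importable qualified name. A minimal sketch of that failure mode, with hypothetical names standing in for the Outlines internals:

import pickle


def adapt_tokenizer_like(tokenizer):
    # Mirrors the pattern the traceback points at: a helper defined inside
    # the function and attached to the tokenizer object.
    def convert_token_to_string(token):
        return token

    tokenizer.convert_token_to_string = convert_token_to_string
    return tokenizer


class DummyTokenizer:
    pass


adapted = adapt_tokenizer_like(DummyTokenizer())

# Pickling the adapted object must also pickle the nested function, which
# fails because '<locals>' functions have no importable qualified name
# (the exact error wording varies across Python versions).
try:
    pickle.dumps(adapted)
except (AttributeError, pickle.PicklingError) as exc:
    print(exc)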
That’s great news! TBH, I’ve struggled so far to reproduce a working outlines test environment from scratch, mainly due to issues with vllm... if upgrading our dependency resolves this as a side effect, that would be nice! https://github.com/dottxt-ai/outlines/pull/1389#issuecomment-2618457406
From the announcement, looks like we'll have to wait a bit:
V1 currently lacks support for log probs, prompt log probs sampling parameters, pipeline parallelism, structured decoding, speculative decoding, prometheus metrics, and LoRA. We are actively working to close this feature gap and add brand-new optimizations to the V1 engine.
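For whenever V1 does grow structured decoding / per-request logits processor support: the pickling part of this should go away if the converter is something pickle can resolve by name, e.g. a module-level callable class instead of a nested function. A hypothetical sketch of that pattern (not the Outlines API, just the idea; the "▁" space-marker handling is an assumption based on SentencePiece-style tokenizers):

import pickle


class ConvertTokenToString:
    # Picklable stand-in for a nested convert_token_to_string closure.
    def __init__(self, space_marker="▁"):
        self.space_marker = space_marker

    def __call__(self, token):
        # Instances pickle fine because the class lives at module level;
        # only the instance attributes need serializing.
        if token.startswith(self.space_marker):
            return " " + token[len(self.space_marker):]
        return token


converter = ConvertTokenToString()
restored = pickle.loads(pickle.dumps(converter))  # round-trips cleanly
assert restored("▁hello") == " hello"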
Closing the issue as Outlines v1 has been released.