[v1] Support allowed_token_ids in v1 Sampler
Follows the implementation in vllm/entrypoints/openai/logits_processors.py.
The idea is straightforward: add a `[batch_size x vocab_size]` boolean mask tensor and keep a per-request list of bools that determines whether to apply the in-place masked fill to the logits, as sketched below.
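A minimal sketch of the masking step described above. The function name, argument names, and the `True`-means-suppressed mask convention are assumptions for illustration, not vLLM's actual API:

```python
import torch

def apply_allowed_token_ids(
    logits: torch.Tensor,                   # [batch_size, vocab_size]
    allowed_token_ids_mask: torch.Tensor,   # [batch_size, vocab_size], bool;
                                            # True marks tokens to suppress
    has_allowed_token_ids: list[bool],      # one flag per request in the batch
) -> torch.Tensor:
    # Skip the masked fill entirely when no request in the batch
    # restricts its tokens, so the common path pays no extra cost.
    if any(has_allowed_token_ids):
        logits.masked_fill_(allowed_token_ids_mask, float("-inf"))
    return logits

# Example: request 0 only allows tokens {1, 3}; request 1 is unrestricted.
batch_size, vocab_size = 2, 8
mask = torch.zeros(batch_size, vocab_size, dtype=torch.bool)
mask[0] = True           # suppress everything for request 0 ...
mask[0, [1, 3]] = False  # ... except its allowed token ids
logits = torch.randn(batch_size, vocab_size)
apply_allowed_token_ids(logits, mask, [True, False])
```

Rows for unrestricted requests stay all-`False`, so the single batched `masked_fill_` leaves their logits untouched.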
- [x] add test
- [x] move some of the validation from `_get_allowed_token_ids_logits_processor` into `SamplingParams` (see the sketch after this list)
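A minimal sketch of the kind of per-request validation that could move into `SamplingParams`, written here as a standalone function; the exact checks and the function name are assumptions for illustration:

```python
from typing import Optional

def verify_allowed_token_ids(allowed_token_ids: Optional[list[int]]) -> None:
    # Hypothetical checks that need only the sampling params themselves.
    if allowed_token_ids is None:
        return  # feature not requested for this sampling run
    if not allowed_token_ids:
        raise ValueError("allowed_token_ids must not be empty")
    if any(t < 0 for t in allowed_token_ids):
        raise ValueError("allowed_token_ids must be non-negative")
    # Range checks against vocab_size need the model/tokenizer config,
    # so they would still happen outside SamplingParams.
```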
Tested with:
- `pytest tests/v1/sample/test_sampler.py`
- `pytest tests/v1/worker/test_gpu_input_batch.py`