[v1] Support allowed_token_ids in v1 Sampler
Follows the implementation in vllm/entrypoints/openai/logits_processors.py.
The idea is straightforward: add a `[batch_size x vocab_size]` boolean mask tensor and keep a per-request list of bools that determines whether to apply the in-place masked fill to the logits, as sketched below.
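A minimal sketch of the masking step described above. The function name, argument names, and the `True`-means-suppressed mask convention are assumptions for illustration, not vLLM's actual API:

```python
import torch

def apply_allowed_token_ids(
    logits: torch.Tensor,                   # [batch_size, vocab_size]
    allowed_token_ids_mask: torch.Tensor,   # [batch_size, vocab_size], bool;
                                            # True marks tokens to suppress
    has_allowed_token_ids: list[bool],      # one flag per request in the batch
) -> torch.Tensor:
    # Skip the masked fill entirely when no request in the batch
    # restricts its tokens, so the common path pays no extra cost.
    if any(has_allowed_token_ids):
        logits.masked_fill_(allowed_token_ids_mask, float("-inf"))
    return logits

# Example: request 0 only allows tokens {1, 3}; request 1 is unrestricted.
batch_size, vocab_size = 2, 8
mask = torch.zeros(batch_size, vocab_size, dtype=torch.bool)
mask[0] = True           # suppress everything for request 0 ...
mask[0, [1, 3]] = False  # ... except its allowed token ids
logits = torch.randn(batch_size, vocab_size)
apply_allowed_token_ids(logits, mask, [True, False])
```

Rows for unrestricted requests stay all-`False`, so the single batched `masked_fill_` leaves their logits untouched.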
- [x] add test
- [x] move some of the validation from `_get_allowed_token_ids_logits_processor` into `SamplingParams` (see the sketch after this list)
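A minimal sketch of the kind of per-request validation that could move into `SamplingParams`, written here as a standalone function; the exact checks and the function name are assumptions for illustration:

```python
from typing import Optional

def verify_allowed_token_ids(allowed_token_ids: Optional[list[int]]) -> None:
    # Hypothetical checks that need only the sampling params themselves.
    if allowed_token_ids is None:
        return  # feature not requested for this sampling run
    if not allowed_token_ids:
        raise ValueError("allowed_token_ids must not be empty")
    if any(t < 0 for t in allowed_token_ids):
        raise ValueError("allowed_token_ids must be non-negative")
    # Range checks against vocab_size need the model/tokenizer config,
    # so they would still happen outside SamplingParams.
```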
Tested with:
- `pytest tests/v1/sample/test_sampler.py`
- `pytest tests/v1/worker/test_gpu_input_batch.py`