Specify specific token_id in regex or grammar

Open g-eoj opened this issue 7 months ago • 1 comment

For special tokens (such as </think> for reasoning models), is there a way to explicitly request a match on the special token's ID, rather than hoping the combination of the matcher, tokenizer, etc. ends up on the right ID? An example of a potential issue is the </think> string in a regex getting broken up into multiple token IDs, which will have a different meaning to the model.
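To make the concern concrete, here is a minimal sketch of how a string-level match can be satisfied by more than one token sequence. The vocabulary, token IDs, and the `tokenizations` helper are all hypothetical and made up for illustration; they do not come from any real tokenizer or from this library.

```python
# Toy vocabulary: IDs and pieces are invented for illustration only.
TOY_VOCAB = {
    0: "<",
    1: "/",
    2: "</",
    3: "think",
    4: ">",
    5: "</think>",  # stands in for the special token
}
SPECIAL_ID = 5  # hypothetical ID of the special token


def tokenizations(text, vocab):
    """Return every sequence of token IDs whose pieces concatenate to `text`."""
    if not text:
        return [[]]
    results = []
    for token_id, piece in vocab.items():
        if text.startswith(piece):
            # Consume this piece, then tokenize the remainder recursively.
            for rest in tokenizations(text[len(piece):], vocab):
                results.append([token_id] + rest)
    return results


candidates = tokenizations("</think>", TOY_VOCAB)
for ids in candidates:
    kind = "special token" if ids == [SPECIAL_ID] else "plain text pieces"
    print(ids, kind)
```

A constraint written against the string accepts every one of these sequences, while only the single-ID sequence is the special token from the model's point of view. That is why a way to name the token ID directly would help.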

May 08 '25 20:05 g-eoj

Thanks for the feature request! This is planned in later versions. Please stay tuned!

Jun 30 '25 06:06 Ubospica