mlc-llm
[Question] Any way to get the raw token output from the model?
I read through the documentation and it seems the model is compiled with a specific chat template.
My chat template is more dynamic. Is there any way to get the raw token outputs after inference so I can implement my own logic?
For example, the "roles" in my chat template change dynamically on each message. Additionally, there is specific logic, such as guidance, that I would like to apply to the token output.
I think this is what you're looking for:
https://github.com/mlc-ai/notebooks/blob/main/mlc-llm/tutorial_raw_text_generation.ipynb
We are moving toward a fully OpenAI-compatible API, which will hopefully allow some customization in the system. You can use the LM chat template, which is mostly raw.
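For reference, here is a minimal sketch of raw text generation along the lines of that tutorial, assuming the `ChatModule`/`ChatConfig`/`ConvConfig` API it uses is available; module and argument names may differ across mlc-llm versions, and the model id and stop string below are placeholders.

```python
# Minimal sketch of raw text generation with the "LM" conversation template,
# following the linked tutorial. Names/signatures may vary by mlc-llm version.
from mlc_chat import ChatModule, ChatConfig, ConvConfig
from mlc_chat.callback import StreamToStdout

# "LM" is a bare-bones template that adds essentially no wrapping,
# so the prompt is passed to the model close to raw text.
conv_config = ConvConfig(stop_str="\n\n")  # hypothetical stop string
chat_config = ChatConfig(conv_config=conv_config, conv_template="LM")

cm = ChatModule(
    model="Llama-2-7b-chat-hf-q4f16_1",  # placeholder compiled model id
    chat_config=chat_config,
)

# With a raw template you can assemble your own role markers and any
# per-message guidance logic in the prompt string before each call.
prompt = "<<MY_ROLE>> What is the capital of France?\n<<ASSISTANT>>"
output = cm.generate(
    prompt=prompt,
    progress_callback=StreamToStdout(callback_interval=2),
)
print(output)
```

Because the template does no role formatting for you, the dynamic-role behavior described in the question can live entirely in how you build `prompt` between calls.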