[Bug] Wrong tokens returned with Mistral models
Checklist
- [X] 1. I have searched related issues but cannot get the expected help.
- [X] 2. The bug has not been fixed in the latest version.
- [X] 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
- [X] 4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose Otherwise, it will be closed.
- [X] 5. Please use English, otherwise it will be closed.
Describe the bug
Thanks for your great work.
I found that the tokens returned by sglang are wrong for the Mistral series:
The figure illustrates two different tokenization results: the tokens on the left side of the "||" symbol are obtained with tokenizer.tokenize(text), while those on the right are returned by sglang. Crucially, the tokens returned by sglang cannot be decoded back into the original string.
Why I want to decode the tokens: I want to use string-level stop tokens, but sglang does not support this feature.
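For context, here is a minimal sketch of how such a mismatch can arise with SentencePiece tokenizers like Mistral's when each token id is decoded in isolation. The model name and prompt are placeholders, and this is only a guess at the mechanism, not a claim about sglang's actual code path:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")
text = "Hello, how are you?"

ids = tokenizer.encode(text, add_special_tokens=False)
print(tokenizer.tokenize(text))              # e.g. ['▁Hello', ',', '▁how', ...]
print([tokenizer.decode([i]) for i in ids])  # e.g. ['Hello', ',', 'how', ...]

# Joining the per-id decodes loses the leading-space markers ('▁'),
# so the original string cannot be recovered from these tokens.
```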
Reproduction
- Launch sglang with mistral-7b-instruct.
- Set `sampling_params = {'temperature': 0, 'echo': True, 'top_p': 1.0, 'best_of': 1, 'max_tokens': 1, 'n': 1, 'logprobs': 1}`.
- Read `output[0].choices[0].logprobs.token` (a runnable sketch follows this list).
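A minimal reproduction sketch, assuming sglang is serving its OpenAI-compatible completions API on the default port 30000; the base URL, model name, and prompt are my assumptions:

```python
import openai

# Assumes a server launched with something like:
#   python -m sglang.launch_server --model-path mistralai/Mistral-7B-Instruct-v0.2
# Adjust base_url/model if your setup differs.
client = openai.OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")

text = "Hello, how are you?"
output = client.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.2",
    prompt=text,
    temperature=0,
    echo=True,      # also return logprobs/tokens for the prompt
    top_p=1.0,
    best_of=1,
    max_tokens=1,
    n=1,
    logprobs=1,
)

# These tokens differ from tokenizer.tokenize(text) and do not decode
# back into the original string. (Field name per the OpenAI completions
# schema; the issue text refers to it as `.token`.)
print(output.choices[0].logprobs.tokens)
```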
Environment
sglang 0.2.12
Have you tried the latest version, 0.2.13? Also, why not use `python3 -m sglang.check_env` to collect the env info?
The check_env output is as follows:

```
Python: 3.9.17 (main, Jul 5 2023, 20:41:20) [GCC 11.2.0]
CUDA available: True
GPU 0,1,2,3,4,5,6,7: NVIDIA A100-SXM4-80GB
GPU 0,1,2,3,4,5,6,7 Compute Capability: 8.0
CUDA_HOME: /usr/local/cuda-11.8
NVCC: Cuda compilation tools, release 11.8, V11.8.89
CUDA Driver Version: 525.105.17
PyTorch: 2.3.1+cu118
flashinfer: 0.1.5+cu118torch2.3
triton: 2.3.1
transformers: 4.43.3
requests: 2.32.3
tqdm: 4.66.4
numpy: 1.26.3
aiohttp: 3.8.5
fastapi: 0.110.0
hf_transfer: Module Not Found
huggingface_hub: 0.24.3
interegular: 0.3.3
packaging: 24.1
PIL: 10.2.0
psutil: 6.0.0
pydantic: 2.5.0
uvicorn: 0.23.2
uvloop: 0.19.0
zmq: 24.0.1
vllm: 0.5.3
multipart: 0.0.6
openai: 1.14.2
anthropic: Module Not Found
litellm: Module Not Found

NVIDIA Topology:
        GPU0  GPU1  GPU2  GPU3  GPU4  GPU5  GPU6  GPU7  NIC0  CPU Affinity    NUMA Affinity
GPU0     X    NV12  NV12  NV12  NV12  NV12  NV12  NV12  NODE  0-31,64-95      0
GPU1    NV12   X    NV12  NV12  NV12  NV12  NV12  NV12  NODE  0-31,64-95      0
GPU2    NV12  NV12   X    NV12  NV12  NV12  NV12  NV12  PXB   0-31,64-95      0
GPU3    NV12  NV12  NV12   X    NV12  NV12  NV12  NV12  PXB   0-31,64-95      0
GPU4    NV12  NV12  NV12  NV12   X    NV12  NV12  NV12  SYS   32-63,96-127    1
GPU5    NV12  NV12  NV12  NV12  NV12   X    NV12  NV12  SYS   32-63,96-127    1
GPU6    NV12  NV12  NV12  NV12  NV12  NV12   X    NV12  SYS   32-63,96-127    1
GPU7    NV12  NV12  NV12  NV12  NV12  NV12  NV12   X    SYS   32-63,96-127    1
NIC0    NODE  NODE  PXB   PXB   SYS   SYS   SYS   SYS    X

Legend:
  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

NIC Legend:
  NIC0: mlx5_0

ulimit soft: 1048576
```
Yes, I have tried the latest version now, but the error still exists.
Why is there no version information for sglang in your check_env?
I installed sglang from source. The version is 0.2.13.
@StevenZHB Can you submit a fix for this? It should be easy. The relevant files are https://github.com/sgl-project/sglang/blob/main/python/sglang/srt/managers/tokenizer_manager.py and https://github.com/sgl-project/sglang/blob/main/python/sglang/srt/managers/detokenizer_manager.py
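For what it's worth, one possible fix direction is to report token strings via convert_ids_to_tokens, which preserves SentencePiece markers and therefore round-trips through convert_tokens_to_string. This is a sketch under my assumptions, not sglang's actual code; the helper name is hypothetical:

```python
from transformers import AutoTokenizer

def token_strings_for_logprobs(tokenizer, token_ids):
    # Hypothetical helper: convert_ids_to_tokens keeps markers such as '▁',
    # unlike decoding each id in isolation, so the pieces round-trip.
    return tokenizer.convert_ids_to_tokens(token_ids)

if __name__ == "__main__":
    tok = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")
    text = "Hello, how are you?"
    ids = tok.encode(text, add_special_tokens=False)
    tokens = token_strings_for_logprobs(tok, ids)
    assert tok.convert_tokens_to_string(tokens) == text
```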