Hello, when I use MP-mode segmentation, English words that are not in the dictionary get split into single letters. How should I modify things so that out-of-dictionary English words are not broken up into individual letters? Example input sentence: 你好,are you ok? MP-mode segmentation result: 你好 , a r e y o u o k ? Thanks in advance for your help :)
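A common workaround, if the segmenter itself cannot be patched, is to shield runs of ASCII letters from the MP pass. A minimal Python sketch; `mp_cut` here is a hypothetical stand-in for whatever MP-mode cut function the library actually exposes:

```python
import re

# Workaround sketch: protect runs of ASCII letters/digits so the MP segmenter
# only ever sees the non-English spans. `mp_cut` is a hypothetical stand-in
# for the library's MP-mode cut function.
ENGLISH_RUN = re.compile(r"[A-Za-z0-9]+")

def cut_keep_english(text, mp_cut):
    tokens, pos = [], 0
    for m in ENGLISH_RUN.finditer(text):
        if m.start() > pos:                      # segment the non-English span normally
            tokens.extend(mp_cut(text[pos:m.start()]))
        tokens.append(m.group())                 # keep the English word whole
        pos = m.end()
    if pos < len(text):
        tokens.extend(mp_cut(text[pos:]))
    return tokens

# With a trivial character-level cutter standing in for MP mode:
print(cut_keep_english("你好,are you ok?", mp_cut=list))
# ['你', '好', ',', 'are', ' ', 'you', ' ', 'ok', '?']
```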
@mrwyattii Using the latest main branch; the test model is llamav2-7b. When I test single-sentence inference with tp=4 it takes 267.98 s, but with tp=1 it takes 7 s to...
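For reference, a timing comparison like the one described would look roughly like the sketch below; the checkpoint path is a placeholder, and the `init_inference` arguments follow DeepSpeed's inference examples (launch with `deepspeed --num_gpus 4 repro.py`, or `--num_gpus 1` for the tp=1 case):

```python
import time
import torch
import deepspeed
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "meta-llama/Llama-2-7b-hf"  # placeholder checkpoint path
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.float16)

# mp_size sets the tensor-parallel degree; compare mp_size=1 vs mp_size=4.
engine = deepspeed.init_inference(
    model, mp_size=4, dtype=torch.float16, replace_with_kernel_inject=True
)

inputs = tokenizer("Hello, how are you?", return_tensors="pt").to("cuda")
torch.cuda.synchronize()
start = time.time()
outputs = engine.module.generate(**inputs, max_new_tokens=128)
torch.cuda.synchronize()
print(f"latency: {time.time() - start:.2f}s")
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```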
Great work, thanks for sharing!!! I used the FastChat code together with the apibench/huggingface_train.json data to retrain the llamav2-7b model and got a new model, but the inference result...
As you know, FlashAttention-3 promises ~1.5x improvements. Is there any plan to support it? Thanks! https://github.com/Dao-AILab/flash-attention/commit/7ef24848cf2f855077cef88fe122775b727dcd74
### System Info GPU: NVIDIA A100 Driver Version: 545.23.08 CUDA: 12.3 versions: https://github.com/NVIDIA/TensorRT-LLM.git (5fa9436) (latest version) https://github.com/triton-inference-server/tensorrtllm_backend ([a6aa8eb](https://github.com/triton-inference-server/tensorrtllm_backend/commit/a6aa8eb6ce9371521df166c480e10262cd9c0cf4)) ### Who can help? _No response_ ### Information...
### Your current environment ```text nvidia A100 GPU vllm 0.6.0 ``` ### How would you like to use vllm I want to run inference with an AutoModelForSequenceClassification model. I don't know...
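vLLM's coverage of classification heads has varied by version, so as a baseline it may help to first confirm the model runs under plain Transformers; a minimal sketch, where the checkpoint name is just a placeholder:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "distilbert-base-uncased-finetuned-sst-2-english"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name).eval().to("cuda")

texts = ["this movie was great", "this movie was terrible"]
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt").to("cuda")

with torch.no_grad():
    logits = model(**batch).logits  # shape: [batch_size, num_labels]

probs = logits.softmax(dim=-1)
labels = [model.config.id2label[i] for i in probs.argmax(-1).tolist()]
print(probs, labels)
```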
### Your current environment The output of `python collect_env.py` ```text PyTorch version: 2.5.1+cu124 Is debug build: False CUDA used to build PyTorch: 12.4 ROCM used to build PyTorch: N/A OS:...
## 🐛 Bug A service started from Meta-Llama-3.1-70B-Instruct fp8 crashes under high concurrency. ## To Reproduce ### convert model see this issue: #2982 ### start service...
When I use the **ultra_chat 200k** data (without regenerating the assistant responses from the target model) to train the llama3.1-8b-instruct model, the **training accuracy is only around 35%** and the **loss...
### Checklist - [x] 1. I have searched related issues but cannot get the expected help. - [x] 2. The bug has not been fixed in the latest version. -...