DefTruth

Results: 256 comments by DefTruth

@yucornetto hi~ would you like to review this PR?

Maybe related to https://github.com/vllm-project/vllm/pull/5207

@youkaichao Close; it seems the latest vLLM (up to #5410) has fixed this problem. (TPOT 45ms v0.4.2 -> 39ms v0.5, eager mode)

```bash
[I][2024-06-11 16:31:36][ 1/20][ 1/20 5%] session:0 turn:0 req:0...
```

> In tensorrt_llm_backend, when we launch several servers by MPI with world_size > 1, only rank 0 (the main process) will receive/return requests. Other ranks will skip this step and...
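The rank-0 dispatch pattern described in the quote can be sketched roughly as follows (a hypothetical pure-Python simulation with threads standing in for MPI ranks; `WORLD_SIZE`, `worker`, and the queues are illustrative names, not tensorrt_llm_backend APIs):

```python
import queue
import threading

WORLD_SIZE = 3
# One queue per rank; stands in for an MPI broadcast channel.
bcast_queues = [queue.Queue() for _ in range(WORLD_SIZE)]
results = {}

def worker(rank):
    if rank == 0:
        # Only the main process (rank 0) receives the client request...
        request = {"prompt": "hello"}
        # ...then broadcasts it to every rank, itself included.
        for q in bcast_queues:
            q.put(request)
    # All ranks (0 included) pick up the broadcast request and run their
    # shard of the model forward; a string length stands in for that here.
    req = bcast_queues[rank].get()
    results[rank] = len(req["prompt"])

threads = [threading.Thread(target=worker, args=(r,)) for r in range(WORLD_SIZE)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(results.items()))
```

Only rank 0 touches the request socket; all other ranks block on the broadcast, which is why they "skip this step" in the quote.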

Same problem here; it occasionally hangs at this point:

```bash
File "/root/.local/lib/python3.10/site-packages/accelerate/utils/operations.py", line 155, in send_to_device
    return tensor.to(device, non_blocking=non_blocking)
```

Tracing shows that accelerate's send_to_device function never returns.

@lvhan028 This feels like a serious bug. With VL models I frequently hit this intermittent hang: no error is raised, the call simply hangs and never returns. It looks like a collective-communication deadlock between accelerate and lmdeploy, because requests are issued asynchronously and the ViT inference actually overlaps with the LLM inference in a pipeline. Traced log:

```bash
-- Stack for thread 23201439544896 ---
File "/usr/lib/python3.10/threading.py", line 973, in _bootstrap
    self._bootstrap_inner()
File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
File "/usr/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args,...
```
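For diagnosing this kind of silent hang, dumping every thread's Python stack from inside the process is often enough to locate the blocked call; a minimal sketch using the standard-library `faulthandler` (the `worker` function here is a hypothetical stand-in for the blocked `send_to_device` call):

```python
import faulthandler
import tempfile
import threading
import time

def worker():
    # Stand-in for a call that blocks indefinitely (e.g. send_to_device).
    time.sleep(5)

t = threading.Thread(target=worker, daemon=True)
t.start()
time.sleep(0.1)  # let the thread reach its blocking call

# Dump the stack of every live thread to a file and inspect it.
with tempfile.TemporaryFile(mode="w+") as f:
    faulthandler.dump_traceback(file=f, all_threads=True)
    f.seek(0)
    dump = f.read()

print("worker" in dump)  # the blocked thread's frames are visible in the dump
```

`faulthandler.dump_traceback_later(timeout, ...)` can also be registered up front as a watchdog, so a dump is produced automatically whenever the process stalls past the timeout.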

Could the vision part use gloo (CPU) communication instead? That way it would not conflict with the NCCL backend and would not hang.
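The gloo idea can be sketched with `torch.distributed` (a single-process sketch with `world_size=1` so it runs standalone; in a real deployment each TP rank would join with its own rank and a larger world size, typically via `dist.new_group(backend="gloo")` alongside the default NCCL group):

```python
import os
import torch
import torch.distributed as dist

# Hypothetical single-process setup; real code would get these from the launcher.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29531")
dist.init_process_group(backend="gloo", rank=0, world_size=1)

# Vision features stay on CPU, so this collective never touches NCCL and
# cannot interleave with (or deadlock against) the LLM's GPU collectives.
vision_feats = torch.ones(4)
dist.all_reduce(vision_feats)  # identity with world_size=1
print(vision_feats.tolist())

dist.destroy_process_group()
```

Keeping the vision collectives on a separate CPU backend means the two pipelines no longer share a NCCL communicator, which removes the ordering constraint that can deadlock overlapping ViT/LLM inference.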

Yes, that is currently the only viable implementation. But it still has a problem when passing input embeddings: once you enable a penalty such as repetition_penalty, transformers only considers the output ids when penalizing, whereas trtllm runs inference from input ids and applies the penalty over input ids + output ids together. As a result, with the penalty enabled on both sides, the outputs cannot be aligned with trtllm's.
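The divergence is easy to see with a small numeric sketch (plain Python; `apply_repetition_penalty` mirrors the usual convention of dividing positive logits by the penalty and multiplying negative ones, but it is an illustrative helper, not either library's actual API):

```python
def apply_repetition_penalty(logits, penalized_ids, penalty=1.2):
    # Usual convention: positive logits are divided by the penalty,
    # negative logits are multiplied by it.
    out = list(logits)
    for i in set(penalized_ids):
        out[i] = out[i] / penalty if out[i] > 0 else out[i] * penalty
    return out

logits = [2.0, 1.0, 0.5, -1.0]
input_ids = [0, 1]   # tokens from the prompt
output_ids = [2]     # tokens generated so far

# transformers-style: only previously generated ids are penalized.
hf_style = apply_repetition_penalty(logits, output_ids)
# trtllm-style: prompt ids and generated ids are penalized together.
trt_style = apply_repetition_penalty(logits, input_ids + output_ids)

print(hf_style == trt_style)  # the two conventions yield different logits
```

Token 2 is penalized identically under both conventions, but tokens 0 and 1 (the prompt ids) are only penalized on the trtllm side, so the sampled distributions, and hence the outputs, drift apart.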