q yao

Results 318 comments of q yao

> I'm curious to know if there's any plan to bring TurboMind support to smaller models like InternVL2-1B @lvhan028 @lzhangzz

> File "/opt/conda/lib/python3.8/site-packages/triton/compiler/backends/cuda.py", line 173, in make_llir ret = translate_triton_gpu_to_llvmir(src, capability, tma_infos, runtime.TARGET.NVVM)

The Triton kernel compilation failed on your device. What is your Triton version?
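To check the installed version, a one-liner (assuming Triton is installed in the same environment):

```python
import triton

print(triton.__version__)
```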

Thanks for the report. I will fix it soon.

lmdeploy 0.7.3+ is an old version; please upgrade and try again. Also, you have not explicitly set the PyTorch engine backend, so some models might be dispatched to TurboMind.

Pass `--backend pytorch` to force the PyTorch engine. TurboMind has better performance, though.
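For reference, a minimal sketch of selecting the PyTorch engine explicitly through the Python API (the model path below is just a placeholder); the CLI equivalent is the `--backend pytorch` flag mentioned above:

```python
# Minimal sketch, assuming lmdeploy is installed; the model path is a placeholder.
from lmdeploy import pipeline, PytorchEngineConfig

# Passing a PytorchEngineConfig forces the PyTorch engine instead of TurboMind.
pipe = pipeline("internlm/internlm2_5-7b-chat", backend_config=PytorchEngineConfig())
print(pipe(["Hello"]))
```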

This block_size is the block size used in paged attention; it is a configuration parameter of the engine: https://github.com/InternLM/lmdeploy/blob/5f0647f1181312975f05d16eeb166d5a69afb6ef/lmdeploy/messages.py#L342. It is normally required to be a power of two. If it were not, many modules/kernels such as fill_kv_cache / paged_attention would be affected, with no performance benefit (more boundary checks would be needed inside the kernels, and the tensor-core usage in attention would become more complicated). So there is an implicit assumption about block size here; perhaps an assertion should be added to the engine startup checks?
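A minimal sketch of the suggested startup assertion (the function name and wiring are illustrative, not the actual lmdeploy code):

```python
def check_block_size(block_size: int) -> None:
    """Illustrative startup check: paged-attention kernels such as
    fill_kv_cache / paged_attention assume a power-of-two block size."""
    assert block_size > 0 and (block_size & (block_size - 1)) == 0, (
        f"block_size must be a positive power of two, got {block_size}"
    )

check_block_size(64)    # passes
# check_block_size(48)  # would raise AssertionError
```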

> Does this mean that only modules that are packaged and can be imported by `importlib` are supported? In many cases, users only write a model in a script...

`ray` stores its logs on the file system. You can clean up your disk or ignore the warning.
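If disk space is the concern, here is a hedged sketch for removing stale Ray session logs, assuming Ray's default temp directory `/tmp/ray` (adjust the path if you set a custom `--temp-dir`) and that no Ray cluster is currently running:

```python
import shutil
from pathlib import Path

ray_tmp = Path("/tmp/ray")                # Ray's default temp/log directory
latest = ray_tmp / "session_latest"
live = latest.resolve() if latest.exists() else None

for session in ray_tmp.glob("session_*"):
    # Keep the symlink and the live session it points to; remove older sessions.
    if session.is_symlink() or (live is not None and session.resolve() == live):
        continue
    shutil.rmtree(session, ignore_errors=True)
```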

> Starting the server with the command from the lmdeploy tutorial:
> lmdeploy serve api_server root/workspace/personal_data/LLM_models/Qwen3-8B --adapters mylora=/root/workspace/personal_data/lora_model
> fails with: lmdeploy serve api_server: error: the following arguments are required: model_path

I can start the server with a similar command. Did you miss a `/` in the path? On the client side, you can use the `model=mylora` field to select which adapter is activated.
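A hedged sketch of picking the adapter from the client, assuming the server was launched with `--adapters mylora=...`, exposes the OpenAI-compatible API, and listens on the default port 23333 (adjust `base_url` for your deployment):

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:23333/v1", api_key="none")
resp = client.chat.completions.create(
    model="mylora",  # the adapter name registered via --adapters selects that LoRA
    messages=[{"role": "user", "content": "Hello"}],
)
print(resp.choices[0].message.content)
```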

> error:

Try building the wheel on a device with network access.