q yao
Update the TensorRT version, or just remove the bicubic resize mode in `torch2trt_dynamic/converters/grid_sample.py`.
I cannot test SOLO on my device; it hits OOM.
https://github.com/InternLM/lmdeploy/blob/967df47f574056740cb45b52338563373730c144/lmdeploy/pytorch/kernels/cuda/flashattention.py#L498 Try manually tuning these arguments.
@cuikaiGitHub 0.8.0 is quite an old version; try switching to our latest release. If the latest release still does not work, manually tuning the values above might help. num_stages is an int...
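To show where those launch arguments live, here is a minimal, self-contained sketch using a toy Triton kernel rather than lmdeploy's actual flash attention; the block size, `num_warps`, and `num_stages` values are illustrative only:

```python
import torch
import triton
import triton.language as tl


@triton.jit
def copy_kernel(src_ptr, dst_ptr, n_elements, BLOCK: tl.constexpr):
    # each program instance copies one BLOCK-sized slice
    offsets = tl.program_id(0) * BLOCK + tl.arange(0, BLOCK)
    mask = offsets < n_elements
    vals = tl.load(src_ptr + offsets, mask=mask)
    tl.store(dst_ptr + offsets, vals, mask=mask)


src = torch.arange(1024, device='cuda', dtype=torch.float32)
dst = torch.empty_like(src)
grid = (triton.cdiv(src.numel(), 256), )
# num_warps and num_stages are plain ints passed at launch time; if a
# kernel runs out of shared memory, lowering num_stages (e.g. 3 -> 2 -> 1)
# or shrinking the block size is the usual first fix
copy_kernel[grid](src, dst, src.numel(), BLOCK=256, num_warps=4, num_stages=2)
```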
Since kv int4 requires triton>=2.3.0, it would be cool if we added a check in the engine. https://github.com/InternLM/lmdeploy/blob/main/lmdeploy/pytorch/check_env/__init__.py
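Something along these lines could work (a sketch only; the function name and error message are hypothetical, not the actual lmdeploy check):

```python
# hypothetical check, sketched for lmdeploy/pytorch/check_env/__init__.py
from packaging import version


def check_triton_kv_int4():
    """Fail fast if the installed triton is too old for kv int4 quant."""
    import triton
    if version.parse(triton.__version__) < version.parse('2.3.0'):
        raise RuntimeError('kv int4 quantization requires triton>=2.3.0, '
                           f'found triton=={triton.__version__}.')
```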
`quant_policy` might not be a good argument name, since users might misunderstand it as online quant (gemm).
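For reference, `quant_policy` only selects kv cache quantization; weight quantization (e.g. AWQ) is a separate offline step. A sketch assuming the values in the recent lmdeploy docs, with a placeholder model path:

```python
from lmdeploy import pipeline, TurbomindEngineConfig

# quant_policy: 0 = off, 4 = kv int4, 8 = kv int8 (per recent lmdeploy docs)
pipe = pipeline('internlm/internlm2-chat-7b',  # placeholder model
                backend_config=TurbomindEngineConfig(quant_policy=8))
```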
We do not support all PyTorch ops yet.
[This](https://github.com/open-mmlab/mmdeploy/blob/master/docs/en/07-developer-guide/support_new_model.md) is the tutorial on how to support new models. We have placed all rewriters for MMRotate in https://github.com/open-mmlab/mmdeploy/tree/master/mmdeploy/codebase/mmrotate/models . Try adding a rewriter to support the model you want.
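For a concrete shape of what a rewriter looks like, here is a sketch following the pattern in that tutorial; the target (`torch.Tensor.repeat` on the TensorRT backend) is the doc's example, not something specific to your model:

```python
from mmdeploy.core import FUNCTION_REWRITER


@FUNCTION_REWRITER.register_rewriter(
    func_name='torch.Tensor.repeat', backend='tensorrt')
def repeat_static(ctx, input, *size):
    # mostly defer to the original implementation, but work around a
    # backend limitation for 1-d tensors by adding/removing a batch dim
    origin_func = ctx.origin_func
    if input.dim() == 1 and len(size) == 1:
        return origin_func(input.unsqueeze(0), *([1] + list(size))).squeeze(0)
    return origin_func(input, *size)
```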
You can try this:
```python
import asyncio
from typing import List

import openai


async def chat_single(client: openai.AsyncOpenAI, prompt: str,
                      model_name: str):
    """chat single async"""
    response = await client.chat.completions.create(
        model=model_name,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content


async def chat_batch(client: openai.AsyncOpenAI, prompts: List[str],
                     model_name: str):
    """fan all prompts out concurrently and gather the replies"""
    tasks = [chat_single(client, prompt, model_name) for prompt in prompts]
    return await asyncio.gather(*tasks)
```
Also, the distilled R1 models should all be supported by TurboMind, and AWQ performance is also better with TurboMind. If you are after performance, TurboMind is the recommended choice.
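For example, a sketch assuming the standard lmdeploy pipeline API; the model path is a placeholder, and `model_format='awq'` assumes an AWQ checkpoint:

```python
from lmdeploy import pipeline, TurbomindEngineConfig

pipe = pipeline('deepseek-ai/DeepSeek-R1-Distill-Qwen-7B-AWQ',  # placeholder
                backend_config=TurbomindEngineConfig(model_format='awq'))
print(pipe(['Hello, who are you?']))
```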