### Model Series
Qwen3

### What are the models used?
qwen3-4b

### What is the scenario where the problem happened?
performance: qwen3-4b vs qwen2.5-7b-instruct

### Is this a known issue?...
### Required prerequisites
- [x] I have read the documentation.
- [x] I have searched the [Issue Tracker](https://github.com/PKU-Alignment/align-anything/issues) and [Discussions](https://github.com/PKU-Alignment/align-anything/discussions) and confirmed that this hasn't already been reported. (+1 or comment...
**Describe the bug**
xp1d is not working properly!

**Configuration Information**
- NVIDIA device: 4090 x 24GB
- vLLM 0.9.0.1
- LMCache 0.3.1.dev12

**Test Command**
```
# p1
UCX_TLS=cuda_ipc,cuda_copy,tcp \
LMCACHE_CONFIG_FILE=/3rdparty/LMCache/examples/disagg_prefill/xp1d/configs/lmcache-prefiller-config.yaml \
VLLM_ENABLE_V1_MULTIPROCESSING=1 \
VLLM_WORKER_MULTIPROC_METHOD=spawn \
CUDA_VISIBLE_DEVICES=3...
```
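For context, an xp1d run also needs the decoder side launched alongside the prefiller. The sketch below is hypothetical: it simply mirrors the p1 command, assuming the example directory also contains a companion lmcache-decoder-config.yaml; the GPU index, model, and port are placeholders, not values from this report.

```
# 1d (decoder) -- hypothetical mirror of the p1 command above
UCX_TLS=cuda_ipc,cuda_copy,tcp \
LMCACHE_CONFIG_FILE=/3rdparty/LMCache/examples/disagg_prefill/xp1d/configs/lmcache-decoder-config.yaml \
VLLM_ENABLE_V1_MULTIPROCESSING=1 \
VLLM_WORKER_MULTIPROC_METHOD=spawn \
CUDA_VISIBLE_DEVICES=<decoder_gpu> \
vllm serve <model> --port <decoder_port>
```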
I am currently testing the RDMA write bandwidth of an InfiniBand NIC. I used two methods to test it.

The first:
```
# server
watch ib_write_bw -d mlx5_0 -q 1...
```
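For reference, `ib_write_bw` from the perftest suite runs as a client/server pair; a minimal sketch of the matching client invocation follows, where the message size, duration, and server address are placeholders rather than values from this report.

```
# client: connect to the server and run a timed RDMA write test,
# reporting bandwidth in Gb/s
ib_write_bw -d mlx5_0 -q 1 -s 65536 -D 10 --report_gbits <server_ip>
```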
We are currently testing the performance of large language models, running benchmarks at different concurrency levels/QPS, as follows:
```
============ Serving Benchmark Result ============
Backend:                    sglang
Traffic request rate:       inf...
```
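That header matches the output of sglang's serving benchmark script; a hedged sketch of the kind of invocation that produces it is shown below (the prompt count is a placeholder, and `--request-rate inf` submits every request immediately to measure peak concurrency).

```
# sweep request rates to benchmark different QPS levels;
# "inf" fires all requests at once
python3 -m sglang.bench_serving --backend sglang \
    --num-prompts 1000 --request-rate inf
```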
**Describe:**
Currently I'm getting the following error when using disk caching:
```
[2025-09-24 07:10:25,904] LMCache INFO: Reqid: chatcmpl-7b3868e5f218403781044c51450ded2c, Total tokens 2774, LMCache hit tokens: 2560, need to load: 2560 (vllm_v1_adapter.py:739:lmcache.integration.vllm.vllm_v1_adapter)...
```
**Problem description:**
I trained a RepVGG classification model with the Paddle framework, converted it to ONNX, and then tried to convert it to Caffe, but the Caffe conversion failed with an unsupported-op error.
By contrast, a ResNet18 classification model trained with the mmcls (torch) framework converts to ONNX and then to Caffe without problems.
Comparing the ONNX topology graphs of the two classification models, I found that both contain Gemm operations, as shown below:
(screenshots of the two ONNX graphs)
Since the Caffe framework is quite old, I would like to modify the Gemm operation in the paddle-repvgg model so that the Caffe conversion succeeds.
Are there any good suggestions or approaches? Thanks!
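One common workaround, sketched below under stated assumptions rather than as a verified fix, is to decompose each Gemm into MatMul + Add before running the Caffe converter, folding the `transB` attribute into the weight initializer. The filenames are placeholders, and it assumes the usual FC export pattern (`alpha = beta = 1`, `transA = 0`, bias present).

```
python - <<'PY'
# Decompose Gemm -> MatMul + Add so the converter never sees a Gemm node.
import onnx
from onnx import helper, numpy_helper

model = onnx.load("repvgg.onnx")          # placeholder path
graph = model.graph
inits = {t.name: t for t in graph.initializer}

new_nodes = []
for node in graph.node:
    if node.op_type != "Gemm":
        new_nodes.append(node)
        continue
    attrs = {a.name: helper.get_attribute_value(a) for a in node.attribute}
    a_in, w_in, b_in = node.input[0], node.input[1], node.input[2]
    # Fold transB into the weight initializer so a plain MatMul suffices.
    if attrs.get("transB", 0) and w_in in inits:
        w = numpy_helper.to_array(inits[w_in]).T.copy()
        inits[w_in].CopyFrom(numpy_helper.from_array(w, w_in))
    mm_out = node.output[0] + "_mm"
    new_nodes.append(helper.make_node("MatMul", [a_in, w_in], [mm_out]))
    new_nodes.append(helper.make_node("Add", [mm_out, b_in], [node.output[0]]))

del graph.node[:]
graph.node.extend(new_nodes)
onnx.save(model, "repvgg_no_gemm.onnx")   # placeholder path
PY
```

Note that some onnx2caffe tools support Gemm (mapping it to InnerProduct) but not MatMul, in which case the opposite direction, keeping Gemm but normalizing its attributes, is what's needed; check which ops your converter actually supports first.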
### Your current environment
```
Package                           Version       Editable project location
--------------------------------- ------------- -------------------------------------------------
accelerate                        1.12.0
aiofile                           3.9.0
aiofiles                          24.1.0
aiohappyeyeballs                  2.6.1
aiohttp                           3.12.15
aiosignal                         1.4.0
annotated-types                   0.7.0
antlr4-python3-runtime            4.9.3
anyio...
```
### 🚀 The feature, motivation and pitch
Are there any plans to support CPU offloading for the KV cache? Currently, we've observed that the multimodal KV cache consumes significant resources. For example,...
### 🚀 The feature, motivation and pitch
Support inference with the cpatonn/Qwen3-Omni-30B-A3B-Instruct-AWQ-4bit model.

### Alternatives
_No response_

### Additional context
_No response_

### Before submitting a new issue...
- [x]...