Jiarong Xing comments

Results: 29 comments of Jiarong Xing

Error on gpt-oss with vLLM

Hi @mcr-ksh, we haven't implemented kvcached for more than one KV cache type yet. If this is needed, we would be happy to support it. @ivanium @shanyu-sys

Output is corrupted after around 2000 tokens

Hi @cduk, thanks for reporting this problem. Let us dig into it more. Stay tuned!

Dynamic Memory Management incurs error when running 4 instances on 2 * A100 GPUs

Hi @Ltryxcy, thanks for using kvcached. This seems to be a bug. @ivanium do you have any ideas?

> ERROR: /tmp/pip-install-y6ja0vyp/kvcached_32ecdf44025a4bd5b524cceff3832c20/csrc/ftensor.cpp:75: Page 112 is already mapped.
> ERROR: /tmp/pip-install-y6ja0vyp/kvcached_32ecdf44025a4bd5b524cceff3832c20/csrc/ftensor.cpp:75: Page 745...

Running Llama3.1 + Llama3.3 70bs on 8 x A100s

Hi @Nujjy, thanks for using kvcached! I did a simple calculation, and it seems that this behavior could be expected. Both models are 70B, so for each of them, when...

KVCached support for gpt-oss-20b attention type not supported in SGLang

Hi @deepak-vij, the attention type needed for gpt-oss isn't supported yet. We are working on it. Stay tuned!

[FEAT] Any plans to support TensorRT-LLM?

Hi @troycheng, we definitely want to support as many inference engines as possible, including TRTLLM-Serve. We are currently a bit short-handed for supporting different features. Would you be...

[FEAT] Any plans to support TensorRT-LLM?

Thank you very much! We'd be happy to provide any help needed. Just let us know.

Two Qwen3-32B-FP8 instances on an H20 96G GPU using vLLM fails to process requests

@shpgy-shpgy Thanks for providing the detailed `nvidia-smi` information. Yeah, as I can see from the screenshots, the GPU is almost running out of memory. When one instance is processing requests,...

[TODO] Ollama integration

> Hi [@jiarong0907](https://github.com/jiarong0907), I would like to take on the Ollama integration issue, and I think it’ll be a great way for me to get more familiar with the codebase....

Using kvcached + sglang + qwen-fp8 directly causes an out-of-bounds error. [bug]

B200 is a bit different from the H series. We will get back to you soon.