Jiarong Xing

Results 29 comments of Jiarong Xing

Hi @mcr-ksh, we haven't implemented kvcached for more than one KV cache type. If this is needed, we would be happy to support it. @ivanium @shanyu-sys

Hi @cduk, thanks for reporting this problem. Let us dig into it more. Stay tuned!

Hi @Ltryxcy, thanks for using kvcached. This seems to be a bug. @ivanium do you have any ideas?

> ERROR: /tmp/pip-install-y6ja0vyp/kvcached_32ecdf44025a4bd5b524cceff3832c20/csrc/ftensor.cpp:75: Page 112 is already mapped.
> ERROR: /tmp/pip-install-y6ja0vyp/kvcached_32ecdf44025a4bd5b524cceff3832c20/csrc/ftensor.cpp:75: Page 745...

Hi @Nujjy, thanks for using kvcached! I did a simple calculation, and it seems that this behavior could be expected. Both models are 70B, so for each of them, when...

Hi @deepak-vij, the attention type needed for gpt-oss isn't supported yet. We are working on it. Stay tuned!

Hi @troycheng, we definitely want to support as many inference engines as possible, including TRTLLM-Serve. We are currently a bit short-handed for supporting different features. Would you be...

Thank you very much! We'd be happy to provide any help you need. Just let us know.

@shpgy-shpgy Thanks for providing the detailed `nvidia-smi` information. Yeah, as I can see from the screenshots, the GPU is almost running out of memory. When one instance is processing requests,...

> Hi [@jiarong0907](https://github.com/jiarong0907), I would like to take on the Ollama integration issue, and I think it’ll be a great way for me to get more familiar with the codebase....

B200 is a bit different from the H series. Will get back to you soon.