libin817927
### 🐛 Describe the bug

It still hasn't started successfully after 10 minutes. The log is as follows: [2025-05-28 09:41:06.824] [infini] [info] ServerConfig: service_port=12345, manage_port=8088, log_level='info', dev_name='mlx5_1', ib_port=1, link_type='Ethernet', prealloc_size=45,...
### 🐛 Describe the bug

I am trying to deploy llama3.1-8b on an L4 GPU, but the pods on ports 8080 and 8000 have not started successfully. The pod logs are as follows: INFO...
https://github.com/vllm-project/aibrix/blob/main/samples/kvcache/l1cache/vllm.yaml It seems that models need to be downloaded from storage to the local pods. Can we instead download models directly from Hugging Face into the pods, the way we deploy large models...
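For reference, vLLM itself can pull weights straight from the Hugging Face Hub when given a repo ID as `--model`, so a pod spec that skips the intermediate storage step might look roughly like the sketch below. This is a minimal sketch, not taken from the linked sample: the image tag, model ID, and the `hf-token` secret name are all illustrative assumptions.

```yaml
# Hedged sketch: serve a model pulled directly from the Hugging Face Hub.
# Image tag, model ID, and the hf-token secret are illustrative assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: llama3-vllm
spec:
  replicas: 1
  selector:
    matchLabels: {app: llama3-vllm}
  template:
    metadata:
      labels: {app: llama3-vllm}
    spec:
      containers:
        - name: vllm
          image: vllm/vllm-openai:latest
          args: ["--model", "meta-llama/Llama-3.1-8B-Instruct", "--port", "8000"]
          env:
            - name: HF_TOKEN   # required for gated models such as Llama 3.1
              valueFrom:
                secretKeyRef: {name: hf-token, key: token}
          ports:
            - containerPort: 8000
```

The trade-off is that each pod downloads the weights on startup, so cold-start time and egress traffic grow with replica count, which is presumably why the sample stages models in shared storage first.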
Can you provide a detailed technical report on the architecture, especially the implementation design of the routing strategy, KV cache offloading, and autoscaling? In fact, it is still a bit...
How do I deploy a Qwen3 or Llama3 model cluster with AIBrix? I'm still at a loss even after following the instructions at https://aibrix.readthedocs.io/latest/getting_started/installation/installation.html. I'm looking forward to having a...
Does it support speculative decoding with a draft model, rather than only ngram-based speculation? If it is supported, how should the YAML be configured, and is there any corresponding documentation?
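Since AIBrix launches a vLLM container, draft-model speculative decoding would likely be enabled by passing vLLM's own engine flags through the container args. As a hedged sketch only: older vLLM releases exposed `--speculative-model` and `--num-speculative-tokens`, while newer ones take a JSON `--speculative-config`; check the flags for your vLLM version. The model IDs and token count below are illustrative assumptions, not a confirmed AIBrix configuration.

```yaml
# Hedged sketch: container args enabling draft-model speculative decoding
# in vLLM. Flag names vary across vLLM versions; model IDs and the token
# count are illustrative assumptions.
containers:
  - name: vllm
    image: vllm/vllm-openai:latest
    args:
      - "--model"
      - "meta-llama/Llama-3.1-8B-Instruct"
      - "--speculative-model"              # a small draft model, not ngram
      - "meta-llama/Llama-3.2-1B-Instruct"
      - "--num-speculative-tokens"
      - "5"
```

The draft and target models generally need compatible tokenizers, and the draft model's weights must fit on the same GPU alongside the target model.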