libin817927
### 🐛 Describe the bug

It still hasn't started successfully after 10 minutes. The log is as follows: [2025-05-28 09:41:06.824] [infini] [info] ServerConfig: service_port=12345, manage_port=8088, log_level='info', dev_name='mlx5_1', ib_port=1, link_type='Ethernet', prealloc_size=45,...
### 🐛 Describe the bug

I am trying to deploy llama3.1-8b on an L4 GPU, but the pods on ports 8080 and 8000 have not started successfully. The pod logs are as follows: INFO...
https://github.com/vllm-project/aibrix/blob/main/samples/kvcache/l1cache/vllm.yaml It seems that models need to be downloaded from storage to the local pods. Can we instead download models directly from Hugging Face into the pods, the way we deploy large models...
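For reference, vLLM itself can pull weights straight from the Hugging Face Hub when given a repo ID as `--model`, so a pod spec that skips the intermediate storage step might look roughly like the sketch below. This is a minimal sketch, not taken from the linked sample: the image tag, model ID, and the `hf-token` secret name are all illustrative assumptions.

```yaml
# Hedged sketch: serve a model pulled directly from the Hugging Face Hub.
# Image tag, model ID, and the hf-token secret are illustrative assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: llama3-vllm
spec:
  replicas: 1
  selector:
    matchLabels: {app: llama3-vllm}
  template:
    metadata:
      labels: {app: llama3-vllm}
    spec:
      containers:
        - name: vllm
          image: vllm/vllm-openai:latest
          args: ["--model", "meta-llama/Llama-3.1-8B-Instruct", "--port", "8000"]
          env:
            - name: HF_TOKEN   # required for gated models such as Llama 3.1
              valueFrom:
                secretKeyRef: {name: hf-token, key: token}
          ports:
            - containerPort: 8000
```

The trade-off is that each pod downloads the weights on startup, so cold-start time and egress traffic grow with replica count, which is presumably why the sample stages models in shared storage first.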
Can you provide a detailed technical report on the architecture, especially the implementation design of the routing strategy, KV cache offloading, and autoscaling? In fact, it is still a bit...
How do I deploy a Qwen3 or Llama3 model cluster with AIBrix? I'm still at a loss even after following the instructions at https://aibrix.readthedocs.io/latest/getting_started/installation/installation.html. I'm looking forward to having a...
Does it support speculative decoding with a draft model, rather than only ngram-based speculation? If it is supported, how should the YAML be configured, and is there any corresponding documentation?
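Since AIBrix launches a vLLM container, draft-model speculative decoding would likely be enabled by passing vLLM's own engine flags through the container args. As a hedged sketch only: older vLLM releases exposed `--speculative-model` and `--num-speculative-tokens`, while newer ones take a JSON `--speculative-config`; check the flags for your vLLM version. The model IDs and token count below are illustrative assumptions, not a confirmed AIBrix configuration.

```yaml
# Hedged sketch: container args enabling draft-model speculative decoding
# in vLLM. Flag names vary across vLLM versions; model IDs and the token
# count are illustrative assumptions.
containers:
  - name: vllm
    image: vllm/vllm-openai:latest
    args:
      - "--model"
      - "meta-llama/Llama-3.1-8B-Instruct"
      - "--speculative-model"              # a small draft model, not ngram
      - "meta-llama/Llama-3.2-1B-Instruct"
      - "--num-speculative-tokens"
      - "5"
```

The draft and target models generally need compatible tokenizers, and the draft model's weights must fit on the same GPU alongside the target model.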