LLM Scaling and Load Balancing
I haven't seen the talks yet, but it's an interesting topic. Have you seen Paddler before? It's an AI app builder and load balancer for hosting LLMs locally. It uses llama.cpp under the hood and proxies requests based on processing slots. It also has HA, service discovery, and recovery via an agent that supervises instances and changes models and parameters in real time. Nowadays I see it as a base for building conversational apps, and it's very promising on the infra side because it can scale from zero nodes thanks to request buffering. I'm an avid contributor there, so feel free to reach out in the community, or ask me directly if you have questions about the project or want to experiment with it. I would be happy to help.
@Propfend Thanks for the recommendation—it’s interesting. I don’t host any LLM infrastructure in-house yet, so I haven’t had a chance to try things like this.