Engine prefix should be provided by the user and we should not amend `vllm:`
🐛 Describe the bug
```
W0930 18:13:41.031286 1 fetcher.go:99] Failed to fetch metric vllm:gpu_cache_usage_perc from pod default/mock-llama2-7b-7cc98b7f5f-764t4: metric vllm:gpu_cache_usage_perc not found in central registry. Returning zero value.
```
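The failure mode looks like the fetcher rewriting the user-supplied metric name before the lookup. Below is a minimal sketch of that implicit-prefix behavior, purely as an illustration of the report; `enginePrefix` and `resolveMetricName` are hypothetical names, not the actual aibrix fetcher code:

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical illustration of the reported behavior: the metric name the
// user types is silently rewritten with an engine prefix before the
// registry lookup.
const enginePrefix = "vllm:"

func resolveMetricName(userMetric string) string {
	if strings.HasPrefix(userMetric, enginePrefix) {
		return userMetric
	}
	// The silent rewrite: "gpu_cache_usage_perc" becomes
	// "vllm:gpu_cache_usage_perc", whether or not the engine actually
	// exports its metrics under that prefix.
	return enginePrefix + userMetric
}

func main() {
	fmt.Println(resolveMetricName("gpu_cache_usage_perc")) // vllm:gpu_cache_usage_perc
}
```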
Steps to Reproduce
```yaml
metricsSources:
  - metricSourceType: pod
    protocolType: http
    port: "8000"
    path: metrics
    targetMetric: "avg_prompt_throughput_toks_per_s" # change it to `vllm:avg_prompt_throughput_toks_per_s`
    targetValue: "60"
scalingStrategy: "KPA"
```
Expected behavior
It should work.
Environment
nightly
We should not implicitly append `vllm:`; this is really confusing. We should ask the user to type the full metric name to avoid the conversion.
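To make the proposal concrete, here is a sketch of the suggested behavior, assuming a flat name-keyed registry (the `lookupMetric` helper is hypothetical, not the aibrix API): look the name up exactly as the user typed it and fail fast instead of rewriting it.

```go
package main

import "fmt"

// lookupMetric sketches the proposed behavior: no implicit prefixing, so
// "vllm:gpu_cache_usage_perc" must be spelled out in targetMetric if that
// is the name the engine actually exports.
func lookupMetric(registry map[string]float64, name string) (float64, error) {
	v, ok := registry[name]
	if !ok {
		return 0, fmt.Errorf("metric %q not found in registry; provide the full exported name, including any engine prefix such as %q", name, "vllm:")
	}
	return v, nil
}

func main() {
	registry := map[string]float64{"vllm:gpu_cache_usage_perc": 0.0}

	if _, err := lookupMetric(registry, "gpu_cache_usage_perc"); err != nil {
		fmt.Println(err) // fails fast instead of silently rewriting the name
	}
	if v, err := lookupMetric(registry, "vllm:gpu_cache_usage_perc"); err == nil {
		fmt.Println(v) // 0
	}
}
```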
I got it working with the YAML below. I'm using the mock vLLM app, so I don't need to set the `vllm:` prefix, and it works as expected. Could you please explain in more detail? 🤔
```
➜ ~ curl http://127.0.0.1:8000/metrics | grep gpu
# HELP vllm:gpu_cache_usage_perc GPU KV-cache usage. 1 means 100 percent usage.
# TYPE vllm:gpu_cache_usage_perc gauge
vllm:gpu_cache_usage_perc{model_name="llama2-7b"} 0.0
vllm:gpu_cache_usage_perc{model_name="text2sql-lora-2"} 0.0
```yaml
apiVersion: autoscaling.aibrix.ai/v1alpha1
kind: PodAutoscaler
metadata:
  name: mock-llama2-7b-hpa
  namespace: default
  labels:
    app.kubernetes.io/name: aibrix
    app.kubernetes.io/managed-by: kustomize
spec:
  scalingStrategy: APA
  minReplicas: 1
  maxReplicas: 2
  metricsSources:
    - metricSourceType: pod
      protocolType: http
      port: '8000'
      path: /metrics
      targetMetric: gpu_cache_usage_perc
      targetValue: '50'
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: mock-llama2-7b
```