When you start the vLLM API server, it periodically logs CPU and GPU utilization along with throughput stats, for example: `Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, ...`
I'm running into the same problem.
I can take that on as well.
Just like this one? https://github.com/sgl-project/sglang/blob/main/docs/references/production_metrics.md
@zhaochenyang20 Should this information be returned as JSON? How can I contribute?
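A minimal sketch of what "return this information as JSON" could look like: converting metrics in the Prometheus text exposition format (the format shown in the production_metrics doc linked above) into a JSON object. The metric names and the hand-rolled parser here are illustrative assumptions, not sglang's actual implementation; real code would use `prometheus_client.parser` instead.

```python
import json

# Illustrative sample in Prometheus text exposition format.
# These metric names are assumptions, not sglang's actual metric names.
SAMPLE = """\
# HELP sglang_prompt_tokens_total Number of prefill tokens processed.
# TYPE sglang_prompt_tokens_total counter
sglang_prompt_tokens_total{model_name="llama"} 7.0
# HELP sglang_num_running_reqs Number of running requests.
# TYPE sglang_num_running_reqs gauge
sglang_num_running_reqs{model_name="llama"} 0.0
"""

def metrics_to_dict(text: str) -> dict:
    """Parse Prometheus exposition text into a {metric: value} dict.

    Simplified: skips HELP/TYPE comment lines and keeps the label
    string as part of the key.
    """
    out = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, value = line.rsplit(" ", 1)
        out[name] = float(value)
    return out

print(json.dumps(metrics_to_dict(SAMPLE), indent=2))
```

This keeps the JSON shape trivially derivable from the existing metrics endpoint, so a JSON view would not need a separate collection path.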
Is this feature going anywhere? Can I be a part of it?
In `/python/sglang/srt/managers/tp_worker.py`, line 142, in `__init__`, the import `from python.sglang.srt.metrics.metrics_collector import SGLangMetricsCollector` fails. You should delete the `python.` prefix so it reads `from sglang.srt.metrics.metrics_collector import SGLangMetricsCollector`.