When you start the vLLM API server, it periodically logs CPU and GPU utilization along with throughput stats, for example: `Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, ...`
I'm running into the same problem.
I can take that on as well.
Just like this one? https://github.com/sgl-project/sglang/blob/main/docs/references/production_metrics.md
@zhaochenyang20 Should this information be returned as JSON? How can I contribute?
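A minimal sketch of what "return this information as JSON" could look like: converting metrics in the Prometheus text exposition format (the format shown in the production_metrics doc linked above) into a JSON object. The metric names and the hand-rolled parser here are illustrative assumptions, not sglang's actual implementation; real code would use `prometheus_client.parser` instead.

```python
import json

# Illustrative sample in Prometheus text exposition format.
# These metric names are assumptions, not sglang's actual metric names.
SAMPLE = """\
# HELP sglang_prompt_tokens_total Number of prefill tokens processed.
# TYPE sglang_prompt_tokens_total counter
sglang_prompt_tokens_total{model_name="llama"} 7.0
# HELP sglang_num_running_reqs Number of running requests.
# TYPE sglang_num_running_reqs gauge
sglang_num_running_reqs{model_name="llama"} 0.0
"""

def metrics_to_dict(text: str) -> dict:
    """Parse Prometheus exposition text into a {metric: value} dict.

    Simplified: skips HELP/TYPE comment lines and keeps the label
    string as part of the key.
    """
    out = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, value = line.rsplit(" ", 1)
        out[name] = float(value)
    return out

print(json.dumps(metrics_to_dict(SAMPLE), indent=2))
```

This keeps the JSON shape trivially derivable from the existing metrics endpoint, so a JSON view would not need a separate collection path.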
Is this feature going anywhere? Can I be a part of it?
In `/python/sglang/srt/managers/tp_worker.py`, line 142, in `__init__`, the import `from python.sglang.srt.metrics.metrics_collector import SGLangMetricsCollector` fails. You should delete the `python.` prefix so it reads `from sglang.srt.metrics.metrics_collector import SGLangMetricsCollector`.