blacker

7 comments of blacker

You can start the vLLM API server, which logs CPU and GPU utilization along with throughput stats, for example: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs,...
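As a minimal sketch, a log line in the format shown above can be parsed into a dict with a small regex (the line content here is illustrative, matching the example):

```python
import re

# A vLLM-style stats log line in the format quoted above (values illustrative)
line = ("Avg prompt throughput: 0.0 tokens/s, "
        "Avg generation throughput: 0.0 tokens/s, Running: 0 reqs")

# Extract each "label: value unit" pair into a dict of floats
pattern = re.compile(r"([\w ]+): ([\d.]+) ?(tokens/s|reqs)")
stats = {label.strip(): float(value) for label, value, _ in pattern.findall(line)}
print(stats)
```

This yields a dict like {"Avg prompt throughput": 0.0, ...}, which is easy to feed into a monitoring script.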

Just like this one? https://github.com/sgl-project/sglang/blob/main/docs/references/production_metrics.md

@zhaochenyang20 Should this information be returned as JSON? How can I contribute to it?
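As a rough sketch of what "returning this information as JSON" could look like: the production metrics linked above are served in Prometheus text format, which a small converter can turn into a JSON object. The metric names below are illustrative placeholders, not the exact sglang metric names.

```python
import json

# Hypothetical Prometheus text-format payload, as served at a /metrics endpoint
# (metric names are illustrative, not the exact sglang names)
prom_text = """\
sglang:num_running_reqs 0.0
sglang:gen_throughput 12.5
"""

def prom_to_json(text: str) -> str:
    """Convert simple Prometheus text-format lines into a JSON object."""
    metrics = {}
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comment lines
        name, value = line.rsplit(" ", 1)
        metrics[name] = float(value)
    return json.dumps(metrics)

print(prom_to_json(prom_text))
```

A JSON endpoint built this way could sit alongside the existing Prometheus one without changing the collector itself.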

Is this feature going anywhere? Can I be a part of it?

File "/python/sglang/srt/managers/tp_worker.py", line 142, in __init__
    from python.sglang.srt.metrics.metrics_collector import SGLangMetricsCollector

You should delete the "python." prefix from that import.
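A minimal illustration of why the fix works: imports must use the installed package name ("sglang"), not the repository directory layout ("python/sglang"). This sketch uses a stand-in resolvability check rather than importing sglang itself, since it may not be installed:

```python
import importlib.util

def resolvable(module_name: str) -> bool:
    """Return True if the import system can locate the module."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except ModuleNotFoundError:
        # A missing parent package (e.g. a nonexistent top-level "python"
        # package) raises rather than returning None
        return False

# "python.sglang..." only resolves if the repo root happens to be on
# sys.path as a package; the installed package is just "sglang"
print(resolvable("python.sglang.srt.metrics.metrics_collector"))
print(resolvable("json"))  # a stdlib module, shown for contrast
```

In a normal environment the first check fails and the second succeeds, which is exactly the situation the traceback describes.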