blacker
You can start the vLLM API server and the process shows CPU and GPU utilization, but the log only prints lines such as "Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, ...".
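For reference, a minimal sketch of one way to check whether the running server is actually accepting requests, assuming the vLLM OpenAI-compatible API server is listening on localhost:8000 (the port and the model id below are assumptions, not taken from the original report); once a request is in flight, the throughput and "Running: N reqs" figures in the log should rise above zero.

```python
# Minimal check against a locally running vLLM OpenAI-compatible server.
# Assumptions: the server listens on http://localhost:8000 and was started
# with the model id used below -- replace it with the model you actually loaded.
import requests

BASE_URL = "http://localhost:8000"

# /v1/models lists the model(s) the server has loaded.
models = requests.get(f"{BASE_URL}/v1/models", timeout=10).json()
print("Loaded models:", [m["id"] for m in models.get("data", [])])

# Send one completion request; while it runs, the vLLM log should show
# a non-zero "Avg generation throughput" and "Running: 1 reqs".
payload = {
    "model": "facebook/opt-125m",  # assumption: replace with your model id
    "prompt": "Hello, my name is",
    "max_tokens": 16,
}
resp = requests.post(f"{BASE_URL}/v1/completions", json=payload, timeout=60)
resp.raise_for_status()
print(resp.json()["choices"][0]["text"])
```

If the request above never returns and the log stays at 0.0 tokens/s, the server is likely not receiving or scheduling requests at all rather than generating slowly.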
I am running into the same problem.