llmperf
Concurrency level is not handled properly
Currently, the library makes it look as if it supports arbitrary concurrency levels, but in reality it doesn't:
- Each actor requires 1 CPU (i.e. the number of workers actually running concurrently is capped by the number of CPUs in your Ray cluster)
- We don't assert on the number of actors actually created (i.e. if Ray can't schedule all of them, the user will never know)
- Tasks that query the target endpoint are all submitted from a single loop, so throughput is bound by how fast that one loop can submit tasks to Ray
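
To illustrate the first two points, here is a minimal sketch of the kind of check that could fail loudly instead of silently running with fewer workers. It is plain Python, not llmperf code: the names `max_schedulable_actors` and `check_concurrency` are hypothetical, and the sketch assumes each actor reserves a fixed number of CPUs.

```python
import math


def max_schedulable_actors(cluster_cpus: float, cpus_per_actor: float = 1.0) -> int:
    """Upper bound on actors that can run at once for a given per-actor CPU reservation.

    Hypothetical helper for illustration; not part of llmperf or Ray.
    """
    if cpus_per_actor <= 0:
        raise ValueError("cpus_per_actor must be positive")
    return math.floor(cluster_cpus / cpus_per_actor)


def check_concurrency(requested: int, cluster_cpus: float, cpus_per_actor: float = 1.0) -> None:
    """Raise if the requested concurrency exceeds what the cluster can schedule."""
    limit = max_schedulable_actors(cluster_cpus, cpus_per_actor)
    if requested > limit:
        raise RuntimeError(
            f"requested {requested} concurrent workers, but only {limit} can be "
            f"scheduled with {cluster_cpus} CPUs at {cpus_per_actor} CPU per actor"
        )


if __name__ == "__main__":
    check_concurrency(requested=8, cluster_cpus=8)  # fits, no error
    try:
        check_concurrency(requested=64, cluster_cpus=8)
    except RuntimeError as exc:
        print(exc)
```

In a real run, the `cluster_cpus` value could presumably be read from `ray.cluster_resources().get("CPU", 0)` before the actors are created. This only addresses the CPU cap and the missing assertion; the single submission loop is a separate bottleneck.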