llmperf

Concurrency level is not handled properly

Open • alexeykudinkin opened this issue 1 year ago • 0 comments

Currently, the library gives the impression that it supports arbitrary concurrency levels, but in practice it doesn't:

  • Each actor requires 1 CPU (i.e., the number of workers that actually run concurrently is capped by the number of CPUs in your Ray cluster)
  • We don't assert the number of actors actually created (i.e., if Ray can't schedule all of them, the user never finds out)
  • Tasks that query the target endpoint are launched from a single loop, so throughput is bounded by how fast that one loop can submit tasks to Ray
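A minimal sketch of the check the first two points call for: compute how many actors can actually fit on the cluster and fail loudly when the requested concurrency exceeds that, instead of silently running with fewer workers. The helper names `max_schedulable_actors` and `check_concurrency` are hypothetical and are not part of llmperf's API. In a real Ray setup the CPU total could come from `ray.cluster_resources().get("CPU", 0)` (or `ray.available_resources()` if other work shares the cluster), and the per-actor figure is whatever `num_cpus` the actor is declared with in `@ray.remote(num_cpus=...)`; the sketch takes both as plain numbers so it runs without Ray. It does not address the third point (single-loop task submission).

```python
# Work in integer units of 1/10_000 CPU so fractional per-actor requests
# (e.g. 0.3) don't suffer from float rounding. This granularity is an
# assumption of the sketch, not something taken from llmperf.
_UNITS_PER_CPU = 10_000


def _to_units(cpus: float) -> int:
    """Convert a CPU quantity to integer sub-CPU units."""
    return round(cpus * _UNITS_PER_CPU)


def max_schedulable_actors(cluster_cpus: float, cpus_per_actor: float) -> int:
    """Hypothetical helper: how many actors needing `cpus_per_actor` CPUs
    fit into `cluster_cpus` CPUs."""
    per_actor = _to_units(cpus_per_actor)
    if per_actor <= 0:
        raise ValueError("cpus_per_actor must be positive")
    return _to_units(cluster_cpus) // per_actor


def check_concurrency(requested: int, cluster_cpus: float,
                      cpus_per_actor: float = 1.0) -> int:
    """Hypothetical guard: raise if the requested concurrency can't be
    scheduled, rather than letting the run degrade silently."""
    if requested < 1:
        raise ValueError("requested concurrency must be >= 1")
    capacity = max_schedulable_actors(cluster_cpus, cpus_per_actor)
    if requested > capacity:
        raise RuntimeError(
            f"requested concurrency {requested} exceeds the {capacity} actors "
            f"that fit on {cluster_cpus} CPUs at {cpus_per_actor} CPU each"
        )
    return requested


if __name__ == "__main__":
    # With 1 CPU per actor, an 8-CPU cluster caps concurrency at 8 ...
    print(max_schedulable_actors(8, 1.0))
    # ... while declaring actors with a fractional CPU raises the cap to 32.
    print(max_schedulable_actors(8, 0.25))
    try:
        check_concurrency(16, cluster_cpus=8)
    except RuntimeError as err:
        print(err)
```

Raising instead of clamping is a deliberate choice here: the second point is precisely that the shortfall goes unnoticed, so the sketch surfaces it to the user rather than quietly lowering the concurrency.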

alexeykudinkin • Jan 26 '24 02:01