llmperf



Concurrency level is not handled properly

Open alexeykudinkin opened this issue 1 year ago • 0 comments

Currently, the library makes it look as if it can support arbitrary concurrency levels, but in reality it doesn't:

Each actor requires 1 CPU (i.e. the number of actual concurrent workers is limited by the number of CPUs in your Ray cluster)
We don't assert the number of actors actually created (i.e. if Ray can't schedule all of them, the user will never know)
Tasks that query the target endpoint are all spun up from a single loop, so throughput is bound by how fast that one loop can submit tasks to Ray
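
To make the first two points concrete, here is a minimal sketch of the kind of check that could surface the mismatch. It is not llmperf code: `effective_concurrency` and `assert_concurrency` are hypothetical helper names, and the CPU count is passed in as a plain number so the example runs without Ray.

```python
import math


def effective_concurrency(requested: int, cluster_cpus: float,
                          cpus_per_actor: float = 1.0) -> int:
    """Upper bound on actors that can run at once (hypothetical helper).

    Assumes every actor reserves `cpus_per_actor` CPUs, so at most
    floor(cluster_cpus / cpus_per_actor) actors can be scheduled.
    """
    if requested < 0 or cluster_cpus < 0 or cpus_per_actor <= 0:
        raise ValueError("requested and cluster_cpus must be >= 0, cpus_per_actor > 0")
    schedulable = math.floor(cluster_cpus / cpus_per_actor)
    return min(requested, schedulable)


def assert_concurrency(requested: int, cluster_cpus: float,
                       cpus_per_actor: float = 1.0) -> None:
    """Fail loudly instead of silently running with fewer workers."""
    actual = effective_concurrency(requested, cluster_cpus, cpus_per_actor)
    if actual < requested:
        raise RuntimeError(
            f"requested {requested} concurrent workers, but only {actual} "
            f"can be scheduled on {cluster_cpus} CPUs "
            f"({cpus_per_actor} CPU per actor)"
        )


if __name__ == "__main__":
    # With 8 CPUs and 1 CPU per actor, asking for 64 workers yields at most 8.
    print(effective_concurrency(64, 8))
    try:
        assert_concurrency(64, 8)
    except RuntimeError as err:
        print(err)
```

In a real Ray setup the CPU total could presumably be read from `ray.cluster_resources().get("CPU", 0)`. That value is the cluster total rather than what is currently free, so a check like this gives an upper bound only.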

Jan 26 '24 02:01 alexeykudinkin