When will buffer queues be enabled?

Open bodybreaker opened this issue 1 year ago • 1 comments

I noticed that it states that requests can be queued when all llama.cpp instances are busy. Is the queuing done per llama.cpp server or per slot? I am currently trying to scale up from one to multiple llama.cpp servers, and the paddler_requests_buffered metric is always 0.

bodybreaker avatar Aug 19 '24 06:08 bodybreaker

@bodybreaker I will check if those metrics work correctly and get back to you.

mcharytoniuk avatar Aug 20 '24 10:08 mcharytoniuk

@bodybreaker I have just released a new stable version of Paddler (v1.0.0). The CLI framework has changed, and overall the project underwent a total rewrite.

I think your issue should be solved now. If it still persists, feel free to reopen it (please check the README first, though, as some flag names have changed).

mcharytoniuk avatar Nov 20 '24 20:11 mcharytoniuk