When will buffer queues be enabled?
I noticed the documentation states that requests can queue when all llama.cpp instances are busy. Is the queuing done per llama.cpp server or per slot? I am currently trying to scale up from one llama.cpp server to several, and the paddler_requests_buffered metric is always 0.
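For context, here is the minimal sketch I'm using to watch that counter while saturating the slots with requests. The management address (127.0.0.1:8085) and the /metrics path are just my local assumptions, so adjust them to however your balancer is configured:

```python
# Poll the balancer's metrics endpoint while sending load, to see whether
# paddler_requests_buffered ever rises above 0.
# NOTE: the URL below is an assumption about my local setup, not a
# documented default -- check your own Paddler configuration.
import time
import urllib.request

METRICS_URL = "http://127.0.0.1:8085/metrics"  # assumed management address


def read_buffered_count() -> int | None:
    """Fetch the metrics page and return the paddler_requests_buffered value, if present."""
    with urllib.request.urlopen(METRICS_URL, timeout=5) as resp:
        for line in resp.read().decode().splitlines():
            # Prometheus text format: "# HELP ..." / "# TYPE ..." lines are
            # comments, so matching on the bare metric name skips them.
            if line.startswith("paddler_requests_buffered"):
                return int(float(line.rsplit(" ", 1)[-1]))
    return None


if __name__ == "__main__":
    # Poll once a second while a load test keeps every slot busy.
    for _ in range(30):
        print("paddler_requests_buffered =", read_buffered_count())
        time.sleep(1)
```

Even with all slots busy, this prints 0 for me the whole time.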
@bodybreaker I will check if those metrics work correctly and get back to you.
@bodybreaker I have just released a new stable version of Paddler (v1.0.0) and changed the CLI framework; overall, the project underwent a total rewrite.
I think your issue should be solved now; if it still persists, feel free to reopen (please check the README first, though, as some flag names have changed).