inforly comments

Results: 8 comments of inforly

rateLimit not working as expected for long latency requests

Found that it doesn't work even for requests whose latency is only tens of milliseconds.

rateLimit not working as expected for long latency requests

> Hey @inforly,
>
> Thanks for the feedback, I closed the issue accordingly.

@nmengin why did you close this? Is this a problem? Could you give a solution to such...

rateLimit not working as expected for long latency requests

> Hello @inforly, there has been a misunderstanding I suppose.
>
> Could you provide a minimum reproducible test case? This would be very helpful.

@jspdown yes, for example, run...

rateLimit not working as expected for long latency requests

> Hello @inforly,
>
> I think @jspdown was referring to a reproduction case that would allow us to confirm the problem, not how to simulate a long latency service....

Request timeouts reported as ForwarderError.RequestCanceled instead of RequestTimedOut

> When using request timeouts (timeout configuration on the route), that timeout works by signaling the `HttpContext.RequestAborted` `CancellationToken`.
>
> YARP currently assumes that token getting cancelled is the result...

start multiple models

@ivanium @jiarong0907 thank you very much for the quick fix! I tried running the following script without using `--gpu-memory-utilization 0.4`, but it still failed with a CUDA out-of-memory error. script...

start multiple models

Thanks, @jiarong0907! It started successfully this time. I tried with `export KVCACHED_AUTOPATCH=1`, but there appears to be no performance gain. Could you please take a look? script ``` apt-get update &&...

start multiple models

@ivanium thanks a lot for the detailed explanation! Actually, without kvcached, we couldn't even start the 2 instances successfully each time! I followed your suggestion to do the test, here...