andreapiso comments

Results: 40 comments of andreapiso

[RFC][Serve] Multi Applications in serve 2.x API

@sihanwang41 what is the status of Multi Apps in Ray Serve? The documentation still shows only the `serve.run` method with a centralised YAML file which I am not sure whether it...

OpenAI timeout

@darinkishore what stack trace do you need? The program does not crash, it hangs because OpenAI does not respond until it prints that it's retrying with exponential backoff because api.openai.com...

OpenAI timeout

I tested it today with 1106 and the OpenAI API still gets stuck fairly often, which makes dspy take almost two hours to complete a generation of 8 candidate...

OpenAI timeout

In particular, I am starting to get: ``` Error for example in dev set: HTTP code 502 from API ( 502 Bad Gateway 502 Bad Gateway cloudflare ) ``` I...

OpenAI timeout

> Can you help me understand the issue? I run large experiments with OpenAI (the latest turbo model at all times) and, except when they note on https://status.openai.com/ that there...

OpenAI timeout

It happens consistently with a single thread too, which is actually the worst-case scenario, because at least with multiple requests in parallel other requests in the optimisation set can stil...

OpenAI timeout

I am definitely nowhere near hitting the rate limit; this happens even just sending 20-30 calls over the span of the optimisation process. ``` Basically, with a single thread, during...

added support for vLLM server and docs for local models

Any news on the state of this PR? Support for vLLM would be super.

Support PagedAttention

> PagedAttention seems to be nicer with respect to VRAM usage meaning it's better when you're low on VRAM. This mostly affects throughput at those regimes, not latency. > Correct,...

Support PagedAttention

Yes, tgi is what we are using today and it's the best we have so far :) we were trying vLLM after some articles were reporting very appealing numbers (23-24x...