andreapiso
@sihanwang41 what is the status of Multi Apps in Ray Serve? The documentation still shows only the `serve.run` method with a centralised YAML file, which I am not sure whether it...
@darinkishore what stack trace do you need? The program does not crash, it hangs because OpenAI does not respond until it prints that it's retrying with exponential backoff because api.openai.com...
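To make the behaviour described above concrete, here is a minimal sketch of a per-attempt timeout combined with exponential backoff, using only the Python standard library. It is not how dspy or the OpenAI client implement retries; `call_with_timeout_and_backoff`, its parameters, and the default values are hypothetical names chosen for illustration.

```python
import concurrent.futures
import time


def call_with_timeout_and_backoff(fn, *, timeout_s=1.0, max_attempts=4,
                                  base_delay_s=0.01, sleep=time.sleep):
    """Run fn() with a per-attempt timeout, retrying with exponential backoff.

    Hypothetical helper for illustration; not part of dspy or the OpenAI SDK.
    """
    last_error = None
    for attempt in range(max_attempts):
        # One worker thread per attempt so a stuck call cannot block the next one.
        pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
        future = pool.submit(fn)
        try:
            return future.result(timeout=timeout_s)
        except Exception as exc:  # includes the timeout raised by future.result
            last_error = exc
        finally:
            # Do not wait for the worker: a timed-out call keeps running in the
            # background (threads cannot be killed), we just stop waiting on it.
            pool.shutdown(wait=False)
        if attempt < max_attempts - 1:
            # Delay doubles each time: base, 2*base, 4*base, ...
            sleep(base_delay_s * 2 ** attempt)
    raise RuntimeError(f"gave up after {max_attempts} attempts") from last_error
```

Wrapping the request in something like this means a stalled call is abandoned after `timeout_s` and retried instead of blocking the whole run. In practice, a timeout set on the HTTP client itself is preferable where available, since it actually closes the connection rather than leaving a thread waiting.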
I tested it today with 1106 and the OpenAI API still gets stuck fairly often, which makes dspy take almost two hours to complete a generation of 8 candidate...
In particular, I am starting to get: ``` Error for example in dev set: HTTP code 502 from API ( 502 Bad Gateway 502 Bad Gateway cloudflare ) ``` I...
> Can you help me understand the issue? I run large experiments with OpenAI (the latest turbo model at all times) and, except when they note on https://status.openai.com/ that there...
It happens consistently with a single thread too, which is actually the worst-case scenario, because at least with multiple requests in parallel other requests in the optimisation set can stil...
I am definitely nowhere near hitting the rate limit; this happens even when sending just 20-30 calls over the span of the optimisation process. ``` Basically, with a single thread, during...
Any news on the state of this PR? Support for vLLM would be super.
> PagedAttention seems to be nicer with respect to VRAM usage meaning it's better when you're low on VRAM. This mostly affects throughput at those regimes, not latency. > Correct,...
Yes, TGI is what we are using today and it's the best we have so far :) We were trying vLLM after some articles reported very appealing numbers (23-24x...