kyle-v6x

Results 21 comments of kyle-v6x

After further testing, adding the `chunk_size_s` parameter to the inference works correctly for small inputs. Even if this is expected, it would be nice to throw some error or warning...

@anyscalesam Sure! My only concern is that [from the 2.10 release there is a new parameter which allows us to shed the load based on a maximum replica queue size](https://github.com/ray-project/ray/issues/42950)....

@edoakes Thanks for clarifying. Got a little too tunnel-visioned on our own use-case, but I see how `max_queued_requests` could be useful when you aren't directly controlling the timeout of requests...
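
For readers unfamiliar with the load-shedding idea in this thread, here is a minimal sketch of rejecting requests once a bounded queue is full. It is plain Python and not Ray Serve's implementation; `BoundedQueue`, `QueueFullError`, `demo`, and the limit of 2 are hypothetical names and values chosen only for illustration.

```python
from collections import deque


class QueueFullError(Exception):
    """Raised when a request is shed because the queue is at capacity."""


class BoundedQueue:
    """Hypothetical request queue that sheds load past a fixed size."""

    def __init__(self, max_queued_requests: int) -> None:
        self.max_queued_requests = max_queued_requests
        self._pending = deque()

    def submit(self, request: str) -> None:
        # Reject immediately instead of letting the backlog grow unbounded.
        if len(self._pending) >= self.max_queued_requests:
            raise QueueFullError(f"dropping {request!r}: queue is full")
        self._pending.append(request)

    def take(self) -> str:
        # Hand the oldest pending request to a worker.
        return self._pending.popleft()


def demo() -> list:
    """Submit three requests to a queue that only holds two."""
    queue = BoundedQueue(max_queued_requests=2)
    outcomes = []
    for name in ["a", "b", "c"]:
        try:
            queue.submit(name)
            outcomes.append("accepted")
        except QueueFullError:
            outcomes.append("shed")
    return outcomes
```

The caller sees the rejection right away and can retry or fail fast, which is the case where a queue limit helps even without a client-side timeout.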

One more note. We tried the following pattern, but I haven't dug into whether the returned reference really means that the task has been assigned to a replica. Before: ```...

@edoakes Got it. I'll start there when I return. Thanks for the input!

Finally got around to doing some testing. I tried the following: ``` response_handle = None try: response_handle = handle.remote() # This is a batched method in practice start = time.time()...

@armenr Chiming in here. We were using a manual cluster through Ray, which was able to stop nodes instead of terminating them. This meant that we could initialize the cluster...

@edoakes @zcin @GeneDer Any ETA on getting some eyes on this?

> @Stack-Attack Could you help me understand the PR better? My understanding is this PR does two things:
>
> 1. Records a history of ongoing request data, and averages...
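
As a rough illustration of the first point in the quote (keeping a history of ongoing-request counts and averaging it), here is a minimal sketch. It is not the PR's code; `OngoingRequestHistory`, `window_s`, and the sample values are hypothetical, and a plain mean over a fixed look-back window is an assumption about how the averaging might work.

```python
from collections import deque


class OngoingRequestHistory:
    """Hypothetical rolling history of ongoing-request counts."""

    def __init__(self, window_s: float) -> None:
        self.window_s = window_s
        self._samples = deque()  # (timestamp, ongoing_requests) pairs

    def record(self, timestamp: float, ongoing_requests: int) -> None:
        # Store the new sample, then drop samples older than the window.
        self._samples.append((timestamp, ongoing_requests))
        cutoff = timestamp - self.window_s
        while self._samples and self._samples[0][0] < cutoff:
            self._samples.popleft()

    def average(self) -> float:
        # Plain mean of the retained samples; 0.0 when nothing is recorded.
        if not self._samples:
            return 0.0
        return sum(count for _, count in self._samples) / len(self._samples)


history = OngoingRequestHistory(window_s=10.0)
history.record(0.0, 2)
history.record(5.0, 4)
history.record(12.0, 6)  # the sample at t=0.0 now falls outside the window
```

Averaging over a window smooths out short spikes, so a scaling decision based on `history.average()` reacts to sustained load rather than to a single burst.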

The intended goal here is to simplify auto-scaling configs for deployments where you aren't sure what the traffic pattern is yet. A common problem we've been having when going to...