kyle-v6x comments

Results 21 comments of kyle-v6x

Whisper Word-level Timestamps broken on some inputs

After further testing, adding the `chunk_size_s` parameter to the inference works correctly for small inputs. Even if this is expected, it would be nice to throw some error or warning...

[Serve] Add a timeout parameter for scheduling ray tasks to replicas

@anyscalesam Sure! My only concern is that [from the 2.10 release there is a new parameter which allows us to shed the load based on a maximum replica queue size](https://github.com/ray-project/ray/issues/42950)....

[Serve] Add a timeout parameter for scheduling ray tasks to replicas

@edoakes Thanks for clarifying. Got a little too tunnel-visioned on our own use-case, but I see how `max_queued_requests` could be useful when you aren't directly controlling the timeout of requests...
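
For readers comparing the two approaches, here is a minimal sketch of queue-size-based load shedding in plain Python. It is not Ray Serve's implementation: `BoundedDispatcher` and `QueueFullError` are hypothetical names used only for illustration, and only the parameter name `max_queued_requests` comes from the comment above.

```python
class QueueFullError(Exception):
    """Raised when a request is shed because the queue is at capacity."""


class BoundedDispatcher:
    """Hypothetical dispatcher that rejects work once the queue is full."""

    def __init__(self, max_queued_requests: int):
        self.max_queued_requests = max_queued_requests
        self.queued = 0  # number of requests currently waiting

    def submit(self, request):
        # Shed the request immediately instead of letting it wait.
        if self.queued >= self.max_queued_requests:
            raise QueueFullError(f"queue limit {self.max_queued_requests} reached")
        self.queued += 1
        return request

    def complete(self):
        # Called when a queued request has been handed to a replica.
        self.queued -= 1


dispatcher = BoundedDispatcher(max_queued_requests=2)
dispatcher.submit("a")
dispatcher.submit("b")
try:
    dispatcher.submit("c")
    shed = False
except QueueFullError:
    shed = True
```

The difference from a timeout is that the rejection depends on how many requests are waiting, not on how long any one of them has waited.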

[Serve] Add a timeout parameter for scheduling ray tasks to replicas

One more note. We tried the following pattern, but I haven't dug into whether the returned reference really means that the task has been assigned to a replica. Before: ```...

[Serve] Add a timeout parameter for scheduling ray tasks to replicas

@edoakes Got it. I'll start there when I return. Thanks for the input!

[Serve] Add a timeout parameter for scheduling ray tasks to replicas

Finally got around to doing some testing. I tried the following: ``` response_handle = None try: response_handle = handle.remote() # This is a batched method in practice start = time.time()...
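
The general idea of bounding how long a request may wait for assignment can be sketched with the standard library alone. This is an illustration under stated assumptions, not the code from the comment (which is truncated above) and not Ray's API: `assign_to_replica` and `schedule_with_timeout` are hypothetical stand-ins, and the sleep simulates time spent waiting for a free replica.

```python
import asyncio


async def assign_to_replica(delay_s: float) -> str:
    """Hypothetical stand-in: wait until some replica accepts the request."""
    await asyncio.sleep(delay_s)
    return "replica-0"


async def schedule_with_timeout(delay_s: float, timeout_s: float):
    """Return the replica name, or None if assignment took too long."""
    try:
        return await asyncio.wait_for(assign_to_replica(delay_s), timeout=timeout_s)
    except asyncio.TimeoutError:
        # wait_for cancels the pending assignment before raising.
        return None


fast = asyncio.run(schedule_with_timeout(delay_s=0.01, timeout_s=0.5))
slow = asyncio.run(schedule_with_timeout(delay_s=0.5, timeout_s=0.01))
```

Whether a returned handle or reference in Ray actually implies that the task has been assigned to a replica is the open question raised in the thread; this sketch does not answer it.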

Easiest/most straightforward way to "cache" some additional, custom Docker images into the AMI Build?

@armenr Chiming in here. We were using a manual cluster through Ray, which was able to stop nodes instead of terminating them. This meant that we could initialize the cluster...

Add new serve autoscaling parameter `scaling_function`

@edoakes @zcin @GeneDer Any ETA on getting some eyes on this?

Add new serve autoscaling parameter `scaling_function`

> @Stack-Attack Could you help me understand the PR better? My understanding is this PR does two things: > > 1. Records a history of ongoing request data, and averages...

Add new serve autoscaling parameter `scaling_function`

The intended goal here is to simplify auto-scaling configs for deployments where you aren't sure what the traffic pattern is yet. A common problem we've been having when going to...
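
As a rough illustration of the idea discussed in this thread (keeping a history of ongoing-request counts, averaging it, and deriving a replica count), here is a small sketch. The names `RequestHistory` and `desired_replicas`, the window size, and the ceiling-based formula are all assumptions made for this example; the actual semantics of `scaling_function` in the PR are not visible in the truncated comments.

```python
import math
from collections import deque


class RequestHistory:
    """Hypothetical fixed-size window of ongoing-request samples."""

    def __init__(self, window: int):
        self.samples = deque(maxlen=window)  # oldest samples drop off

    def record(self, ongoing_requests: int) -> None:
        self.samples.append(ongoing_requests)

    def average(self) -> float:
        if not self.samples:
            return 0.0
        return sum(self.samples) / len(self.samples)


def desired_replicas(avg_ongoing: float, target_per_replica: float,
                     min_replicas: int, max_replicas: int) -> int:
    """Assumed rule: enough replicas to hold the average load, clamped."""
    raw = math.ceil(avg_ongoing / target_per_replica)
    return max(min_replicas, min(max_replicas, raw))


history = RequestHistory(window=3)
for sample in (10, 20, 30, 40):
    history.record(sample)

replicas = desired_replicas(history.average(), target_per_replica=8,
                            min_replicas=1, max_replicas=10)
```

Averaging over a window smooths out short spikes, which matters when the traffic pattern is not yet known and a single noisy sample should not trigger scaling.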