firebook

Results: 8 comments of firebook

So when will you merge this request? @mneethiraj @ted12138

> If you are using the async stub API, you can work around this issue in v1.30 and earlier, by:
>
> ```java
> // If client is receiving many small...
> ```
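The quoted snippet is truncated, so the exact workaround is not shown here. As a rough illustration of what manual flow control with the grpc-java async stub looks like (not necessarily the quoted fix), here is a sketch that requests incoming messages in larger batches; the `FooService`, `Request`, and `Item` names are hypothetical generated types:

```java
import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;
import io.grpc.stub.ClientCallStreamObserver;
import io.grpc.stub.ClientResponseObserver;

// Sketch only: FooServiceGrpc, Request, and Item are hypothetical generated classes.
public class ManualFlowControlSketch {
    public static void main(String[] args) {
        ManagedChannel channel = ManagedChannelBuilder
                .forAddress("localhost", 50051)
                .usePlaintext()
                .build();
        FooServiceGrpc.FooServiceStub asyncStub = FooServiceGrpc.newStub(channel);

        asyncStub.streamItems(Request.getDefaultInstance(),
                new ClientResponseObserver<Request, Item>() {
                    private ClientCallStreamObserver<Request> call;

                    @Override
                    public void beforeStart(ClientCallStreamObserver<Request> requestStream) {
                        this.call = requestStream;
                        // Take over inbound flow control from the stub.
                        requestStream.disableAutoInboundFlowControl();
                        // Request a batch up front so many small messages are not
                        // pulled one at a time.
                        requestStream.request(16);
                    }

                    @Override
                    public void onNext(Item item) {
                        // ... process the item ...
                        call.request(1); // keep the pipeline topped up
                    }

                    @Override
                    public void onError(Throwable t) {
                        channel.shutdown();
                    }

                    @Override
                    public void onCompleted() {
                        channel.shutdown();
                    }
                });
    }
}
```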

> Even with a query sequence length of 1, if we can mark all tokens from the prefixes as persistent in the cache, it could bring some speed-up to inference.

+1

@cr7258 @johnlanni Envoy's design also branches on a separate token_bucket parameter, and I think that is actually easier to understand: a fixed time window and a token bucket are simply different things, so there is no need to force them into one option. When the user wants a fixed time window:

```
token_per_minute: 1000
```

When the user wants a token bucket:

```
token_bucket:
  max_tokens:
  tokens_per_fill:
  fill_interval:
```

Also, Higress itself is built on Envoy, so keeping the configuration experience close to Envoy's lowers the cost of adoption and makes users more willing to pick up Higress. In addition, Envoy's documentation is quite vague about which algorithm applies when RequestsPerTimeUnit and TokenBucket are configured together; RequestsPerTimeUnit may be implemented with an arbitrary algorithm, and that kind of logic is more confusing. It is best to keep the two independent. ...
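For contrast, a minimal sketch of the two algorithms in Java (not Higress or Envoy code, just an illustration of why they should stay separate): the fixed window resets its counter at each interval boundary, while the token bucket refills continuously and allows bursts up to max_tokens. Class and field names are made up for this sketch.

```java
import java.util.concurrent.TimeUnit;

// Fixed time window: at most `limit` requests per minute, counter resets each window.
class FixedWindowLimiter {
    private final long limit;                 // e.g. token_per_minute
    private final long windowNanos = TimeUnit.MINUTES.toNanos(1);
    private long windowStart = System.nanoTime();
    private long used = 0;

    FixedWindowLimiter(long limitPerMinute) {
        this.limit = limitPerMinute;
    }

    synchronized boolean tryAcquire(long tokens) {
        long now = System.nanoTime();
        if (now - windowStart >= windowNanos) { // new window: reset the counter
            windowStart = now;
            used = 0;
        }
        if (used + tokens > limit) return false;
        used += tokens;
        return true;
    }
}

// Token bucket: refills tokens_per_fill every fill_interval, bursts up to max_tokens.
class TokenBucketLimiter {
    private final long maxTokens;             // token_bucket.max_tokens
    private final long tokensPerFill;         // token_bucket.tokens_per_fill
    private final long fillIntervalNanos;     // token_bucket.fill_interval
    private double tokens;
    private long lastFill = System.nanoTime();

    TokenBucketLimiter(long maxTokens, long tokensPerFill, long fillIntervalMillis) {
        this.maxTokens = maxTokens;
        this.tokensPerFill = tokensPerFill;
        this.fillIntervalNanos = TimeUnit.MILLISECONDS.toNanos(fillIntervalMillis);
        this.tokens = maxTokens;              // start full, so an initial burst is allowed
    }

    synchronized boolean tryAcquire(long n) {
        long now = System.nanoTime();
        long fills = (now - lastFill) / fillIntervalNanos;
        if (fills > 0) {                      // refill in whole fill_interval steps
            tokens = Math.min(maxTokens, tokens + fills * tokensPerFill);
            lastFill += fills * fillIntervalNanos;
        }
        if (tokens < n) return false;
        tokens -= n;
        return true;
    }
}
```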

I think we can use Redis as the state storage to cache prefix-based data and track real-time request counts. This would be a relatively straightforward solution. Moreover, since the Redis...
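As a rough sketch of that idea (assuming a Jedis client; the key layout, the prefix hashing, and the 60-second window are all assumptions, not the plugin's actual design), per-prefix request counts can be tracked with INCR plus a TTL, and the prefix-based data cached alongside them:

```java
import redis.clients.jedis.Jedis;

// Illustrative only: key names and the 60s window are made up for this sketch.
public class PrefixCounterSketch {
    private final Jedis jedis = new Jedis("localhost", 6379);

    /** Increment the request count for a prefix and return the count in the current window. */
    public long recordRequest(String prefixHash) {
        String key = "prefix:count:" + prefixHash;
        long count = jedis.incr(key);   // atomic counter shared by all gateway instances
        if (count == 1) {
            jedis.expire(key, 60);      // first hit in the window: start a 60s TTL
        }
        return count;
    }

    /** Cache prefix-based data (e.g. a precomputed token count) with the same TTL. */
    public void cachePrefixData(String prefixHash, String data) {
        jedis.setex("prefix:data:" + prefixHash, 60, data);
    }

    public String getPrefixData(String prefixHash) {
        return jedis.get("prefix:data:" + prefixHash);
    }
}
```

Keeping the counter in Redis makes the increment atomic across gateway replicas, which is the main reason a shared state store is attractive here.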