Stateful information sync for ext-proc Instances

Open Jeffwan opened this issue 10 months ago • 4 comments

🚀 Feature Description and Motivation

Our external processor (ext-proc) in Envoy Gateway routes requests with prefix-cache awareness, which makes it stateful. To keep routing consistent and available across multiple instances, we need to design a state synchronization mechanism.

  • The current ext-proc builds hash blocks or a radix tree to track requests.
  • Multiple instances need to share this prefix-cache state and avoid inconsistencies between instances.
  • State sync should have minimal performance impact.
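
For readers unfamiliar with the hash-block approach, here is a minimal sketch of the kind of state involved. It is not the ext-proc code: `BLOCK_SIZE`, `block_hashes`, and `PrefixIndex` are hypothetical names, and the block size is an arbitrary illustrative value. Each full block of tokens is hashed together with the previous block's hash, so one hash identifies the whole prefix up to that block.

```python
import hashlib
from collections import defaultdict

BLOCK_SIZE = 4  # tokens per block; illustrative value, not taken from the project


def block_hashes(tokens, block_size=BLOCK_SIZE):
    """Chain-hash every full block so each hash stands for the prefix ending there."""
    hashes = []
    prev = b""
    for start in range(0, len(tokens) - block_size + 1, block_size):
        block = tokens[start:start + block_size]
        digest = hashlib.sha256(prev + repr(block).encode()).digest()
        hashes.append(digest.hex())
        prev = digest
    return hashes


class PrefixIndex:
    """In-process map from prefix-block hash to the pods that have served it."""

    def __init__(self):
        self._pods_by_hash = defaultdict(set)

    def record(self, pod, tokens):
        # Remember that `pod` has now seen every full block of this prompt.
        for h in block_hashes(tokens):
            self._pods_by_hash[h].add(pod)

    def match(self, tokens):
        """Return {pod: number of leading blocks that pod has already seen}."""
        matched = {}
        for depth, h in enumerate(block_hashes(tokens), start=1):
            pods = self._pods_by_hash.get(h, set())
            if not pods:
                break  # no pod has this prefix, so no longer prefix can match
            for pod in pods:
                matched[pod] = depth
        return matched
```

The `_pods_by_hash` map is exactly the state that lives in one instance's memory today and would have to be shared or synchronized once several instances run side by side.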

An external distributed store, leader election, or event-driven sync all seem a little heavy. We should consider a lightweight approach for the implementation.

Use Case

Unblock prefix-cache-aware request scheduling in the HA case.

Proposed Solution

No response

Jeffwan avatar Feb 27 '25 19:02 Jeffwan

I think we can use Redis as the state storage to cache prefix-based data and track real-time request counts. This would be a relatively straightforward solution. Moreover, since the Redis component has already been introduced in the current rate limiting implementation, this approach would not introduce any new dependencies to the system.

Additionally, if users prefer not to use Redis for storage, we can still maintain the in-process memory storage approach. This remains developer-friendly for testing or quick start scenarios.
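
A rough sketch of what this pluggable storage could look like, assuming a small interface that both backends implement. All names here (`RoutingStateStore`, `InMemoryStore`, `make_store`) are hypothetical and not part of the current code base. Only the in-memory backend is written out; a Redis backend would implement the same three methods, for example with Redis sets (`SADD`/`SMEMBERS`) for the prefix data and `INCRBY` for the request counters.

```python
from abc import ABC, abstractmethod
from collections import defaultdict


class RoutingStateStore(ABC):
    """Hypothetical interface for the state that ext-proc instances would share."""

    @abstractmethod
    def add_prefix(self, block_hash: str, pod: str) -> None:
        """Record that `pod` holds the prefix identified by `block_hash`."""

    @abstractmethod
    def pods_for(self, block_hash: str) -> set:
        """Return the pods known to hold this prefix block."""

    @abstractmethod
    def incr_requests(self, pod: str, delta: int = 1) -> int:
        """Adjust the in-flight request count for `pod` and return the new value."""


class InMemoryStore(RoutingStateStore):
    """Single-process backend, suitable for tests and quick starts."""

    def __init__(self):
        self._pods = defaultdict(set)
        self._counts = defaultdict(int)

    def add_prefix(self, block_hash, pod):
        self._pods[block_hash].add(pod)

    def pods_for(self, block_hash):
        return set(self._pods.get(block_hash, ()))

    def incr_requests(self, pod, delta=1):
        self._counts[pod] += delta
        return self._counts[pod]


def make_store(backend: str = "memory") -> RoutingStateStore:
    # A "redis" backend is deliberately left out of this sketch.
    if backend == "memory":
        return InMemoryStore()
    raise NotImplementedError(f"backend {backend!r} is not part of this sketch")
```

With this shape, the scheduler only depends on the interface, so switching between Redis and in-process memory becomes a configuration choice.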

firebook avatar Apr 17 '25 07:04 firebook

Any update?

firebook avatar Apr 25 '25 10:04 firebook

@Jeffwan @varungup90 any update?

firebook avatar May 15 '25 09:05 firebook

@firebook I haven't had the opportunity to work on it yet. It is on my high-priority list.

If you are interested, please take the lead in proposing an implementation design; I can help you along the way. Thank you.

varungup90 avatar May 15 '25 17:05 varungup90

@Jeffwan @varungup90 Any update? I guess we can't deploy multiple instances until the scheduling state is shared across them.

justadogistaken avatar Jun 23 '25 09:06 justadogistaken

Would it be OK to deploy multiple instances (3+) but let only one instance do the scheduling, relying on etcd or Redis to elect the master?
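
To make the idea concrete, here is a minimal sketch of lease-based master selection. `InMemoryLease` is a hypothetical stand-in for the shared lease: in a real deployment the lease would live in etcd or Redis (in Redis, for instance, acquisition could be built on `SET key value NX PX ttl`), which this sketch does not implement. Time is passed in explicitly so the behavior is easy to follow.

```python
class InMemoryLease:
    """Stand-in for a lease key with a TTL held in etcd or Redis (assumption)."""

    def __init__(self):
        self.holder = None
        self.expires_at = 0.0

    def try_acquire(self, candidate, ttl, now):
        """Acquire or renew the lease; return True if `candidate` is now master."""
        free = self.holder is None or now >= self.expires_at
        if free or self.holder == candidate:
            self.holder = candidate
            self.expires_at = now + ttl
            return True
        return False


lease = InMemoryLease()
became_master = lease.try_acquire("ext-proc-0", ttl=10, now=0)   # first caller wins
standby_tried = lease.try_acquire("ext-proc-1", ttl=10, now=5)   # lease still held
took_over = lease.try_acquire("ext-proc-1", ttl=10, now=12)      # lease has expired
```

Only the current holder schedules; the other instances stay on standby and take over once the lease expires. The trade-off is that a new master starts with empty prefix-cache state and scheduling throughput is bounded by a single instance.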

justadogistaken avatar Jun 23 '25 11:06 justadogistaken

Or could we design a two-level scheduler? The first level routes traffic to the sub-scheduler for the model service; the second-level router then schedules traffic by strategy (load/prefix cache).
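
One possible reading of this, as a sketch only: the first level is stateless and maps each model to a fixed sub-scheduler, so all prefix-cache state for a model lives in one place; the second level picks a pod by longest prefix match, breaking ties by load. The function names and the hash-modulo mapping are assumptions made for illustration.

```python
import hashlib


def pick_sub_scheduler(model, schedulers):
    """Level 1: stateless, deterministic mapping from model name to sub-scheduler."""
    digest = hashlib.sha256(model.encode()).digest()
    return schedulers[int.from_bytes(digest[:8], "big") % len(schedulers)]


def pick_pod(prefix_matches, load):
    """Level 2: prefer the longest prefix match, then the least-loaded pod.

    prefix_matches: {pod: matched prefix blocks}; load: {pod: in-flight requests}.
    """
    return min(load, key=lambda pod: (-prefix_matches.get(pod, 0), load[pod]))


schedulers = ["sub-0", "sub-1", "sub-2"]
chosen = pick_sub_scheduler("example-model", schedulers)
pod = pick_pod({"pod-a": 2, "pod-b": 1}, {"pod-a": 5, "pod-b": 1, "pod-c": 0})
```

Because the first level needs no shared state, it can be replicated freely. Note that a plain hash-modulo mapping reshuffles models whenever the set of sub-schedulers changes; consistent hashing would reduce that churn.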

justadogistaken avatar Jun 23 '25 11:06 justadogistaken