Title: How to make sure legacy WebSocket connections still work when listeners are updated without using Envoy hot restart?
Description:
We are using filesystem-based LDS for dynamic resource updates, and Envoy is also working as a WebSocket proxy. If an LDS change happens (IP, socket options, or TLS configs), the existing listener is drained and a new listener is created, but the legacy listener's WebSocket connections get broken during this update period. Is there any method or solution to smoothly switch the existing connections over to the new listener?
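For context, here is a minimal sketch of the kind of setup being described, assuming filesystem-based LDS and the HTTP connection manager's WebSocket upgrade support. The file paths, listener name, port, and the `ws_backend` cluster are illustrative assumptions, not the actual config:

```yaml
# bootstrap.yaml fragment (illustrative): listeners are read from a local file.
dynamic_resources:
  lds_config:
    path_config_source:
      path: /etc/envoy/lds.yaml   # rewriting this file triggers a listener update
---
# /etc/envoy/lds.yaml (illustrative): a WebSocket-capable listener.
resources:
- "@type": type.googleapis.com/envoy.config.listener.v3.Listener
  name: ws_listener
  address:
    socket_address: { address: 0.0.0.0, port_value: 10000 }
  filter_chains:
  - filters:
    - name: envoy.filters.network.http_connection_manager
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
        stat_prefix: ws
        upgrade_configs:
        - upgrade_type: websocket        # proxy WebSocket upgrades
        route_config:
          virtual_hosts:
          - name: all
            domains: ["*"]
            routes:
            - match: { prefix: "/" }
              route: { cluster: ws_backend }   # assumes a ws_backend cluster in static_resources
        http_filters:
        - name: envoy.filters.http.router
          typed_config:
            "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
```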
AFAIK, LDS will update in place for some filter chain changes, but otherwise we will drain the existing listener, which will drain the existing WebSocket connections as you've seen. AFAIK there's no mechanism to otherwise get around this.
@KBaichoo if the Envoy listener can't do this, how does Envoy handle legacy and new connections and traffic smoothly during a control plane configuration update?
See https://github.com/envoyproxy/envoy/blob/1abf5e106fd15d7636e306b02c08ca55ec4bbd27/source/common/listener_manager/listener_manager_impl.cc#L800 for how the in-place filter chain update works, and its callers for the conditions under which that holds true.
I don't think it's a good idea to expand those criteria to other fields such as IP, socket options, etc.
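To make that concrete, here is a rough annotated sketch of which kinds of listener changes the discussion is about. This is a sketch under my reading of the thread, not an authoritative list; the exact conditions are in the callers of the linked function:

```yaml
# Illustrative listener, annotated per the discussion above.
name: ws_listener
address:
  socket_address:
    address: 0.0.0.0
    # Changing the address/port replaces the listener and drains the old one.
    port_value: 10000
# Changing socket options likewise drains the old listener.
socket_options:
- level: 1        # SOL_SOCKET (Linux)
  name: 9         # SO_KEEPALIVE (Linux)
  int_value: 1
# Changes confined to the contents of filter_chains (e.g. a filter's
# typed_config) may qualify for the in-place filter chain update path
# instead of draining the whole listener.
filter_chains:
- filters:
  - name: envoy.filters.network.http_connection_manager
    # typed_config omitted for brevity
```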
See also https://www.envoyproxy.io/docs/envoy/latest/operations/cli#cmdoption-drain-time-s if you want to increase your drain timeout so that drained WS connections live longer.
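For example (illustrative path and value; the flag is the one documented at the link above):

```
envoy -c /etc/envoy/bootstrap.yaml --drain-time-s 3600
```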
@KBaichoo what will happen if drain-time is set to -1? It seems the old version of the listener will not be drained any more, the old connections will still be usable, and the new listener will also be bound to the workers.
So after all the legacy connections on the old listener's filter chain are closed, will the old version of the listener continue draining or not?
I think it'll set the value to uint32_t::max which will effectively disable draining.
Yes, we have done some tests. uint32_t::max or a big value here may work, but our concern is what will happen to these draining listener objects if we update listener resources many times. Is there any memory-leak risk, since these objects may never be destroyed given the draining timer is not triggered?
> Is there any memory-leak risk, since these objects may never be destroyed given the draining timer is not triggered?

I'd think so, since you're preventing cleanup. You should measure it for yourself to see if it's appropriate for your use case. It's a tradeoff between drain timeout and resource-leak delay. Maybe 1h? 3h? 6h? 12h? 24h? might be sufficient for your drain timeout vs. "never drain".
This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.
This issue has been automatically closed because it has not had activity in the last 37 days. If this issue is still valid, please ping a maintainer and ask them to label it as "help wanted" or "no stalebot". Thank you for your contributions.
What happens if multiple old filter chains are not drained? Will it increase the memory requirements? Is there a limit on the maximum number of filter chains managed by Envoy?
> What happens if multiple old filter chains are not drained?

Nothing happens, but some objects will not be released.

> Will it increase the memory requirements?

Yes.

> Is there a limit on the maximum number of filter chains managed by Envoy?

It seems there is no such limit; at least I can't find such a parameter.
We chose the following approach to enhance Envoy for our case (a sketch of the check logic is included after this list):

- We do not use the existing drain-time control logic; instead we added a new "drain-check-interval" option.
- If drain-check-interval is configured:
  1. We do not drain the listener and its filter chains right away.
  2. Based on the drain-check-interval value, we start a timer that periodically checks the existing connections of the to-be-drained listener and filter chains across all workers.
  3. Once all workers' connections are closed, we drain the listener and the filter chains.

This way, existing WebSocket connections are not impacted when a listener update and drain happen.
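As a standalone illustration only (this is not the actual patch, and the names `WorkerConnGauges`, `drainWhenIdle`, and `drainListener` are hypothetical; the real change would live inside Envoy's listener/drain management code), the check logic roughly amounts to:

```cpp
// Sketch of the "drain-check-interval" idea: instead of arming a fixed drain
// timer, poll per-worker connection counts and drain only once every worker
// reports zero connections on the old listener / filter chains.
#include <atomic>
#include <chrono>
#include <cstdio>
#include <thread>
#include <vector>

// One gauge per worker thread (hypothetical stand-in for Envoy's per-worker stats).
using WorkerConnGauges = std::vector<std::atomic<int>>;

// Stand-in for the code that actually tears down the old listener.
void drainListener() { std::puts("all workers idle -> draining old listener"); }

void drainWhenIdle(WorkerConnGauges& gauges, std::chrono::seconds interval) {
  for (;;) {
    bool all_idle = true;
    for (const auto& g : gauges) {
      if (g.load() > 0) { all_idle = false; break; }
    }
    if (all_idle) { drainListener(); return; }
    std::this_thread::sleep_for(interval);  // re-check on the next interval
  }
}

int main() {
  WorkerConnGauges gauges(4);   // e.g. 4 worker threads
  gauges[2] = 1;                // one worker still has a WebSocket connection open
  std::thread checker(drainWhenIdle, std::ref(gauges), std::chrono::seconds(1));
  std::this_thread::sleep_for(std::chrono::seconds(3));
  gauges[2] = 0;                // the last connection closes; the next check triggers the drain
  checker.join();
  return 0;
}
```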
@wufanqqfsc Thanks for the quick response!
Has this custom solution been ported back to the public Envoy repository? If so, could you please share the PR details?
Not yet. @KBaichoo, any comments on this solution? I can provide the patch if it's OK.
Also, one more question: we are updating the listener's TLS context using SDS. Will WebSocket connections be affected when the TLS context is updated via SDS? As per the Envoy documentation, a listener filter chain update drains the corresponding listener.
In our testing, we don't see connections getting terminated.
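For reference, here is a sketch of the kind of setup this question assumes: the filter chain's downstream TLS context pulls its certificate from SDS, so the secret itself can rotate without a new listener being pushed over LDS. The secret name and the file-based SDS source are illustrative:

```yaml
# Illustrative transport socket on the listener's filter chain: the TLS
# certificate comes from SDS rather than being inlined in the LDS resource.
transport_socket:
  name: envoy.transport_sockets.tls
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.DownstreamTlsContext
    common_tls_context:
      tls_certificate_sds_secret_configs:
      - name: ws_listener_cert                 # illustrative secret name
        sds_config:
          path_config_source:
            path: /etc/envoy/secrets/ws_listener_cert.yaml
```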