
Minimize KrakenD downtime in production

oskoi opened this issue Dec 27 '19 · 2 comments

Is your feature request related to a problem? Please describe. Given that KrakenD is stateless and every configuration change requires a restart, it would be nice to minimize downtime.

Describe the solution you'd like One solution is to use the endless HTTP server (https://github.com/fvbock/endless/) or something similar. It lets in-flight requests finish on the old version while all new requests go to the new one.

Additional context I already run a custom build of KrakenD with endless. If this proposal is problematic, it would be nice to at least add the ability to replace the HTTP server globally.
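For illustration, a minimal sketch of what the endless approach looks like in plain Go (the address and handler here are placeholders, not KrakenD's actual server wiring): `endless.ListenAndServe` is a drop-in replacement for `http.ListenAndServe` that, on SIGHUP, forks a child process with the re-read configuration, hands over the listening socket, and lets the parent drain in-flight requests.

```go
package main

import (
	"fmt"
	"net/http"

	"github.com/fvbock/endless"
)

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/__health", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "ok")
	})

	// Drop-in replacement for http.ListenAndServe: a SIGHUP triggers a
	// zero-downtime restart (fork, socket handover, graceful drain).
	if err := endless.ListenAndServe(":8080", mux); err != nil {
		fmt.Println("server stopped:", err)
	}
}
```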

oskoi avatar Dec 27 '19 17:12 oskoi

I think that's already supported by the router.Factory (or at least, partially). You can shut down the running router by cancelling the context passed to it. This triggers a graceful shutdown, so no running connections are lost and new ones aren't accepted (until a new service starts and replaces the old one). Regarding avoiding packet loss during a restart, there are articles and how-tos like: https://engineeringblog.yelp.com/2015/04/true-zero-downtime-haproxy-reloads.html

So you can plug in your own config-monitoring layer in charge of calling https://github.com/devopsfaith/krakend-ce/blob/master/cmd/krakend-ce/main.go#L56, cancelling the context after detecting a change, and then calling the cmd.Execute method again.
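A minimal sketch of that pattern, assuming a hypothetical `runService` function standing in for the code path that krakend-ce's main.go triggers (parsing the config and running the router, honouring context cancellation with a graceful shutdown). Here the reload trigger is SIGHUP; a file watcher would work the same way, and the config path is illustrative.

```go
package main

import (
	"context"
	"os"
	"os/signal"
	"syscall"
)

// runService is a hypothetical stand-in for the krakend-ce entry point:
// it would parse configPath, build the router and block until ctx is
// cancelled, at which point the router shuts down gracefully.
func runService(ctx context.Context, configPath string) {
	<-ctx.Done()
}

func main() {
	reload := make(chan os.Signal, 1)
	signal.Notify(reload, syscall.SIGHUP) // "config changed" trigger

	for {
		ctx, cancel := context.WithCancel(context.Background())
		done := make(chan struct{})
		go func() {
			runService(ctx, "krakend.json")
			close(done)
		}()

		<-reload // a change was detected
		cancel() // graceful shutdown: in-flight requests finish, new ones are rejected
		<-done   // wait for the old service before starting the replacement
	}
}
```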

If you want to implement such a component, keep in mind you're moving from a stateless environment to a stateful one, so you'll need to add a coordination layer between your instances and think very hard about what you are willing to sacrifice in the CAP space.

kpacha avatar Dec 27 '19 20:12 kpacha

Thanks for the article! You're right about the stateful environment, but the cost is minimal if we are talking about restarting a single instance rather than a cluster. In my case, I have a simple app that manages restarts and monitors the status of the krakend process. If something goes wrong, the manager app can restart it or stop itself, and further action is taken by systemd/supervisord/etc. Maybe this adds a bit more complexity for KrakenD CE, but I wanted to share the proposal.

Why doesn't krakend reuse a single HTTP server? When I was replacing the default HTTP server with endless, I ran into a problem: KrakenD uses one HTTP server, and krakend-opencensus creates another one for the Prometheus exporter. On the one hand this separates responsibilities and decouples the exporter from the main HTTP server, but on the other hand it adds complexity when I want to replace the server, since I have to keep track of every component that starts its own. What do you think about this? Maybe it's worth propagating a single HTTP server everywhere by default.
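Roughly the situation being described, as a sketch in plain net/http (the ports and handlers are illustrative assumptions, not KrakenD's real wiring): there are two independent `http.Server` instances, so swapping in endless for "the" server only covers one of them.

```go
package main

import (
	"log"
	"net/http"
)

func main() {
	// Main gateway server (roughly what the lura router runs).
	gateway := &http.Server{Addr: ":8080", Handler: http.NewServeMux()}

	// A second, independent server, like the one krakend-opencensus starts
	// for its exporter. Replacing "the" HTTP server means finding and
	// replacing every one of these, not just the gateway's.
	metrics := &http.Server{Addr: ":9091", Handler: http.NewServeMux()}

	go func() {
		log.Println(metrics.ListenAndServe())
	}()
	log.Fatal(gateway.ListenAndServe())
}
```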

oskoi avatar Dec 28 '19 14:12 oskoi

Hi, thank you for bringing this issue to our attention.

Many factors influence our product roadmaps and determine the features, fixes, and suggestions we implement. When deciding what to prioritize and work on, we combine your feedback and suggestions with insights from our development team, product analytics, research findings, and more.

This information, combined with our product vision, determines what we implement and its priority order. Unfortunately, we don't foresee this issue progressing any further in the short-medium term, and we are closing it.

While this issue is now closed, we continue monitoring requests for our future roadmap, including this one.

If you have additional information you would like to provide, please share.


This is an automated comment. Responding to the bot or mentioning it won't have any effect

github-actions[bot] avatar Jun 22 '23 07:06 github-actions[bot]

This issue was marked as resolved a long time ago and has now been automatically locked since there has been no recent activity. You can still open a new issue and reference this link.

github-actions[bot] avatar Sep 21 '23 00:09 github-actions[bot]