Specify multiple nodes

coffeencoke opened this issue on Dec 07 '11 • 14 comments

Is there a way to specify multiple urls or hosts?

If our master node goes down, we want the ability to failover to another node. If this does not exist we will probably be adding this functionality, so any direction from you would be great.

So far, from browsing tire, I do not see anything that loads configuration from a YAML file. Is this correct?

coffeencoke avatar Dec 07 '11 21:12 coffeencoke

Hi, sorry for the delay.

tl;dr: yes, it's probably a useful feature for tire-contrib and EC2 will eventually force us to implement it

This is something which has been debated over and over, mainly on IRC with @clintongormley and others. It has even been said Tire «misses the point of elasticsearch» because it does not handle high-availability in the client :)

Now, I have a couple of things to say to this. Let me explain step by step.

First, let me say that I think that handling high-availability/failover in the client “completely misses the point of elasticsearch” :) One of the best things about elasticsearch is that it talks HTTP. Handling availability in HTTP stacks is what we, web developers, do all day.

Putting some kind of proxy (HAProxy, Varnish, a custom Ruby proxy with em-proxy) in front of an HTTP-based backend is easy, right? Most proxies can load backend URLs from external sources, can handle backend health, etc., right? That's what proxies are for, and people smarter than me have written them.

So, what you should do, generally speaking, is invest your intellectual energy into building a robust stack, where you put a proxy in front of the elasticsearch cluster which automatically does round-robin and health checks. Then you can talk to elasticsearch from whatever client, even curl, and still have failover handled.

Obviously, it could be argued that you've just introduced a single point of failure in your stack — but there's no free lunch.

Second, I've written one or two proxies in my life (see e.g. https://github.com/igrigorik/em-proxy/blob/master/examples/balancing.rb), and while it's CS101-level programming, there are a couple of interesting questions:

  • How do you handle health checks? Do you introduce some background worker? How is the worker being run?
  • How do you pass backend URLs to the client? Do you introduce some kind of database?
  • How do you handle dead backends from the client? Do you pass the info somewhere?

These are all questions you have to go over before you start writing code.
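For what it's worth, here is roughly what the em-proxy route linked above looks like — a toy, TCP-level round-robin sketch with made-up node addresses (10.0.0.1/10.0.0.2) and an illustrative backend name, no health checks, and none of the bookkeeping those questions are about:

```ruby
require 'em-proxy'

# Hypothetical elasticsearch nodes to balance across
$nodes = [
  { :host => "10.0.0.1", :port => 9200 },
  { :host => "10.0.0.2", :port => 9200 }
]

Proxy.start(:host => "0.0.0.0", :port => 9200) do |conn|
  # Naive round-robin: each new connection goes to the next node in the list
  node = $nodes.first
  $nodes.rotate!

  conn.server :es, :host => node[:host], :port => node[:port]

  conn.on_data     { |data| data }          # pass the request through untouched
  conn.on_response { |backend, resp| resp } # pass the response back untouched
  conn.on_finish   { |backend| unbind if backend == :es }
end
```

Even this toy immediately runs into the questions above: nothing here notices a dead backend, and the node list is hard-coded.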

Third, let's review what's possible now, as we speak:

Thanks to @dylanahsmith's patch at karmi/tire@94e7b89, it's now possible to rescue a failure and perform the search at another node. You're able to wrap the search method in something like MyClass#my_robust_search, and try each of your nodes in succession. You'd most certainly load such info from something like Redis, and you'd most certainly have some kind of background worker which checks the Cluster Nodes Info API and stores the info.
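To make that concrete, a minimal sketch of such a wrapper — the module name, the node list, and the rescued exception classes are all placeholders (the exact exception you'll see depends on the HTTP client Tire is configured with), and in a real setup the list would come from Redis or a background worker as described above:

```ruby
require 'tire'

module MyRobustSearch
  # Placeholder node list; in practice this would be refreshed by a worker
  # polling the Cluster Nodes Info API.
  NODES = ['http://es1.example.com:9200', 'http://es2.example.com:9200']

  def self.search(index, &block)
    last_error = nil
    NODES.each do |node|
      begin
        Tire.configure { url node }
        results = Tire.search(index, &block).results # triggers the HTTP request
        return results
      rescue Errno::ECONNREFUSED, SocketError => e
        last_error = e # this node is unreachable, try the next one
      end
    end
    raise last_error
  end
end

# MyRobustSearch.search('articles') { query { string 'title:tire' } }
```

Note that re-configuring Tire on every attempt mutates global state, which is part of why this is a workaround rather than a proper solution.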

It is obviously an ugly solution, and the library should support you much better here. The most trivial start of this journey would be to store the URLs from Tire.configuration { url 'http://es1.example.com', 'http://es2.example.com' } as an Array, and try them in succession in Tire::Search#perform, effectively doing what I just described in the paragraph above directly in the library.

Fourth, all this said, three things make a feature like this worth exploring and implementing:

  1. elasticsearch by itself has no single point of failure, and it could be argued that clients should take advantage of that.
  2. elasticsearch makes it very easy for clients to check the cluster state, and again, clients should take advantage of that.
  3. But, above all, in an environment such as Amazon EC2, where nodes suddenly become unavailable for short periods of time, or die on you outright, you're forced to deal with failure even in modest applications, and rolling a robust proxy setup may be a pointless exercise for you.

Still with me? So, here's how I think the feature should be approached and done, and I'll probably want it myself sooner or later.

Everything should be designed as a tire-contrib extension. There's no reason to pollute the core library with a thing like this, and Tire should be extensible like this anyway. Wherever the code in core would not support an extension like this, it must be changed.

Initially, when you pass a URL (or URLs) to Tire in the configure block (please load your YAML yourself and pass an Array, thank you! :), Tire performs the initial cluster state check, and retrieves & stores URLs for healthy nodes in a variable.

Then, when you perform a search, nodes are queried in succession, in a round-robin fashion.

When a failure occurs because a node is not available, Tire will a) kick the node out of its set of healthy URLs, b) launch a background process which will retrieve fresh cluster state, and c) continue with the next node in its set. If all nodes in the set fail in succession, it will give up and raise some SorryNobodyTalksToMe exception.

Notice how, in this implementation, we don't have to perform background health checks — we perform them during “normal” searches. We kick out dead nodes when they fail to respond to a search (or other operation).
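A bare-bones sketch of that node set, just to pin the idea down — the class name and the method names (next_url, mark_dead, perform) are illustrative, not an actual Tire API, and the "launch a background refresh" part is reduced to a comment:

```ruby
class NodePool
  class SorryNobodyTalksToMe < StandardError; end

  def initialize(urls)
    @urls = urls.dup # healthy nodes, seeded from the configured URL(s)
  end

  # Round-robin: take the first URL and move it to the back of the list.
  def next_url
    raise SorryNobodyTalksToMe, 'no live nodes left' if @urls.empty?
    url = @urls.shift
    @urls.push(url)
    url
  end

  # Called when a request against `url` fails with a connection error.
  # This is where we would also kick off a background refresh of the
  # cluster state to pick up replacement nodes.
  def mark_dead(url)
    @urls.delete(url)
  end

  # Try the block against successive nodes, evicting the ones that fail,
  # until one succeeds or the pool is exhausted.
  def perform
    @urls.size.times do
      url = next_url
      begin
        return yield(url)
      rescue Errno::ECONNREFUSED, SocketError
        mark_dead(url)
      end
    end
    raise SorryNobodyTalksToMe, 'all nodes failed'
  end
end

# pool = NodePool.new(['http://es1.example.com:9200', 'http://es2.example.com:9200'])
# pool.perform { |url| ... issue the search against `url` ... }
```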

Now, the tricky part is of course the inter-process communication between the main code (your application) and the background process. I've done my share of process programming (see eg. https://gist.github.com/486161), but I still trip over from time to time when wearing sandals. I hope it will be doable without storing the nodes info in some external resource (file on disk, Redis, ...), but I'm definitely not sure.

Obviously, to make it work nicely, we must first a) implement the cluster API in the core library, b) change the concept of url in Tire to be an Array.

To sum up: done like this, it would be hidden from the regular library user, as you'd just seed Tire with the initial URL(s) and the library takes it from there. For developers, what's happening would be transparent, since we can just log all the events in the main Tire log.


Update: New in ES master, external multicast discovery, elasticsearch/elasticsearch#1532.

karmi avatar Dec 11 '11 11:12 karmi

Let's break the logic down a bit, at least in a very simplistic manner (and putting elasticsearch issue 1532 aside). Tire should be initialized with a list of seed URLs and round-robin around them. There should be an option to "sniff" the rest of the cluster using the Nodes Info API: go to elasticsearch, get all the nodes, and use them as the URLs to round-robin against.

One simple option is to keep a list of live nodes that the API always round-robins through. A scheduled background job would go over the seed nodes and replace the live-nodes list. When sniffing is disabled, that means a simple "ping" (HEAD on /) on each seed URL, building the new live-nodes list and swapping it in once done. When sniffing is enabled, the seed nodes should be used to call the Nodes Info API, get all the nodes in the cluster, and use their http_address to build the list of live nodes. This would allow Tire to become aware of new nodes being added to the cluster dynamically.
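A rough sketch of those two refresh strategies — a plain ping (HEAD on /) against the seed URLs versus sniffing via the Nodes Info API. The function names are hypothetical, and the /_cluster/nodes endpoint and http_address format ("inet[/10.2.3.4:9200]") are the 0.x-era ones this thread is about, so treat the parsing as illustrative:

```ruby
require 'net/http'
require 'json'
require 'uri'

# Ping mode: keep only the seed URLs that answer a HEAD request on /.
def ping_live_nodes(seed_urls)
  seed_urls.select do |url|
    uri = URI.parse(url)
    begin
      Net::HTTP.start(uri.host, uri.port) { |http| http.head('/').code.to_i == 200 }
    rescue StandardError
      false
    end
  end
end

# Sniff mode: ask the first responding seed for the whole cluster and build
# the live-nodes list from each node's http_address.
def sniff_live_nodes(seed_urls)
  seed_urls.each do |url|
    uri = URI.parse(url)
    begin
      body  = Net::HTTP.get(uri.host, '/_cluster/nodes', uri.port)
      nodes = JSON.parse(body)['nodes'].values
      return nodes.map { |node| node['http_address'].to_s[/[\d\.]+:\d+/] }.
                   compact.map { |address| "http://#{address}" }
    rescue StandardError
      next # this seed is down or gave us garbage, try the next one
    end
  end
  []
end
```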

Make sense?

kimchy avatar Dec 11 '11 20:12 kimchy

Yes, definitely makes sense, that's the vision I have, and close to what I have described above, thanks for chiming in, @kimchy.

I'm still unsure about the precise mechanics -- I'd like to leave scheduled background jobs out, but of course, they're an option at hand: some Rake task which the user can schedule, etc.

One problem with the solution I've outlined – starting a poller on node failure – is that it handles failover well, but cannot easily pick up new nodes unless one of the existing ones fails.

karmi avatar Dec 11 '11 22:12 karmi

@karmi I don't see the need for a background process to check which nodes are up - it's really fast, so I'd just do it synchronously. Makes things much less complex.

In the Perl API, I accept 3 parameters:

  • an array containing the default_servers list (defaults to localhost)
  • refresh_after - an integer indicating the number of requests to perform before doing a cluster refresh
  • no_refresh - a boolean indicating that we shouldn't sniff, but just round-robin through the existing nodes (useful when, e.g., the client is outside the internal network or behind a proxy, and so uses different IPs than the cluster does)

A request does the following:

  • if the refresh_in counter is zero (as it is on the first request), then:
      • if no_refresh is false:
          • we try each server in turn (or in parallel for the async backends) to get a list of the live nodes
          • if none of the nodes respond, we throw an error
          • if a node responds successfully, we extract the list of live nodes, store them in servers in random order, and set refresh_in to refresh_after (to start the countdown)
      • if no_refresh is true: set the servers list to the default_servers list
  • the first node in the servers list is used to perform the current request
  • if the current request returns an error:
      • if the error is not a connection error, we rethrow it
      • otherwise, if no_refresh is true: remove the current node from the servers list
          • if we still have potentially "good" servers in the list, use the next server in the list to retry the request; otherwise repopulate from the default_servers list and retry each node once, until either one succeeds or all fail -> error
      • otherwise, if no_refresh is false: try to sniff the live server list from the union of default_servers and the previously sniffed servers
          • on success: rerun the request
          • on failure: throw an error
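In Ruby, the same request flow might look roughly like this — the class name and every method here mirror the Perl parameters above rather than any real client, the sniffing step is injected as a block so transport details stay out of the picture, and the connection-error check is a stand-in:

```ruby
class ClusterClient
  def initialize(default_servers: ['http://localhost:9200'],
                 refresh_after: 10_000, no_refresh: false,
                 sniffer: ->(seeds) { seeds })
    @default_servers = default_servers
    @refresh_after   = refresh_after
    @no_refresh      = no_refresh
    @sniffer         = sniffer        # ->(seed_urls) { live_urls }
    @servers         = default_servers.dup
    @refresh_in      = 0              # forces a refresh on the first request
  end

  # Performs `block.call(server_url)` against the first server in the list,
  # retrying on connection errors as described above.
  def request(&block)
    refresh_servers if @refresh_in.zero?
    @refresh_in -= 1 unless @no_refresh

    attempts = 0
    begin
      result = block.call(@servers.first)
      @servers.rotate!                # round-robin for the next request
      result
    rescue => error
      raise error unless connection_error?(error)
      attempts += 1
      raise error if attempts > @default_servers.size + @servers.size

      if @no_refresh
        @servers.shift                # drop the dead node
        @servers = @default_servers.dup if @servers.empty?
      else
        # re-sniff from the union of seeds and previously known servers
        @servers = @sniffer.call(@default_servers | @servers)
        raise error if @servers.empty?
      end
      retry
    end
  end

  private

  def refresh_servers
    return if @no_refresh             # just keep round-robining the seeds
    @servers = @sniffer.call(@default_servers | @servers).shuffle
    raise 'no live nodes responded to the sniff request' if @servers.empty?
    @refresh_in = @refresh_after      # start the countdown
  end

  def connection_error?(error)
    error.is_a?(Errno::ECONNREFUSED) || error.is_a?(SocketError)
  end
end

# client = ClusterClient.new(default_servers: ['http://es1.example.com:9200'],
#                            sniffer: ->(seeds) { sniff_live_nodes(seeds) })
# client.request { |server| ... perform the search against `server` ... }
```

The retry guard is deliberately crude; a real client would track exactly which nodes it has already tried, as the Perl version does.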

A couple of things I plan on changing:

  1. make the timeout value for the sniff request much shorter than the timeout for other requests (to avoid, e.g., lengthy timeouts when a switch is down or there are firewall issues)
  2. make the refresh_after parameter dynamic, e.g. if the cluster changes (a server fails), it is likely that it will come back up pretty quickly, so re-sniff more frequently; once the cluster looks stable again, refresh less frequently

clintongormley avatar Dec 12 '11 09:12 clintongormley

@clintongormley Hi, sorry for the delay with the response, Clint! What you're saying makes perfect sense, and I was thinking that keeping track of some counter and issuing cluster node checks within regular operations could work very well. It would certainly be more robust than background polling and message passing, at the cost of a possibly very small overhead when checking with ES. Thanks!!

karmi avatar Dec 16 '11 10:12 karmi

Any updates on this? Just wondering because the most recent comments were made around two months ago.

Do you recommend going ahead with the rescue solution (94e7b89) or a proxy for now?

amfeng avatar Feb 27 '12 23:02 amfeng

No updates yet -- I guess putting a proxy in front of ES makes sense for now if you're interested in high availability.

karmi avatar Feb 28 '12 07:02 karmi

One option, instead of creating a specific proxy, is running an elasticsearch instance configured with node.client set to true on the same machine as the client; it will join the cluster, but just as a client node. Then, Tire can be configured to talk to localhost:9200.
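In that setup, the only thing on the Tire side is pointing it at the local client-only node (started with node.client set to true, as described above); the index name below is just an example:

```ruby
require 'tire'

# The client-only elasticsearch node runs on this machine, joins the cluster,
# and takes care of routing requests to the data nodes.
Tire.configure do
  url 'http://localhost:9200'
end

# Tire.search('articles') { query { string 'title:failover' } }
```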

kimchy avatar Feb 28 '12 10:02 kimchy

@kimchy @karmi That's what we are doing, and it works pretty well, avoiding all the hassle of setting up HAProxy and whatnot. How long it will last I have no clear idea, but it seems it will be a long time.

mereghost avatar Aug 15 '12 22:08 mereghost

Any updates on this?

mkdynamic avatar Jul 19 '13 05:07 mkdynamic

Would it be feasible to use DNS to handle failover? Route 53 has health checks and failover support.

mkdynamic avatar Jul 19 '13 10:07 mkdynamic

@mkdynamic The best solution would be to use a real proxy, such as HAProxy, Nginx, etc. You can use Nginx as a round-robin proxy with keepalive pretty easily; see e.g. https://gist.github.com/karmi/0a2b0e0df83813a4045f for a config example.

karmi avatar Jul 19 '13 13:07 karmi

Thanks, that's helpful.

My main concern with a proxy is that it introduces a single point of failure; there are obviously potential performance and cost downsides too. Is client support for multiple nodes on the roadmap for Tire, or are you convinced a proxy is a better avenue?

mkdynamic avatar Jul 19 '13 18:07 mkdynamic

I wouldn't describe an Nginx-based proxy as having "obviously potential performance and cost downsides". But yes, every proxy would be a "single point of failure", though I'd hesitate to call that a practical problem.

There is definitely a plan for robust, extensible multi-node support directly in Tire or its successors.

karmi avatar Jul 20 '13 07:07 karmi