
Blacklisting questions/concerns with master node.

Open • rmontgomery429 opened this issue 9 years ago • 15 comments

Does the master node get blacklisted? If so, why? We're running an "all writes to master, all reads to a replica" setup, and it seems that the master should never be blacklisted. Does setting blacklist_duration for the master node to 0 prevent blacklisting?

Also, is there a way to turn off blacklisting completely? If so, what is the recommended way to do this?

rmontgomery429 • Dec 28 '15

:+1:

todddickerson • Dec 28 '15

In connection_wrapper, I see this:

    def _makara_blacklist!
      @connection.disconnect! if @connection
      @connection = nil
      @blacklisted_until = Time.now.to_i + @config[:blacklist_duration]
    end

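That suggests that if you set blacklist_duration to zero or a negative value, it would likely work out: @blacklisted_until would be the current time or earlier, so presumably the node would not stay blacklisted. Note that the connection is still disconnected and cleared either way.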
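To make that arithmetic concrete, here is a small self-contained sketch. FakeConnectionWrapper is a hypothetical stand-in, not Makara's ConnectionWrapper, and the blacklisted? rule (the node stays blacklisted only while the deadline is strictly in the future) is an assumption about how the deadline is read, not Makara's code.

```ruby
# FakeConnectionWrapper is a hypothetical stand-in, not Makara's ConnectionWrapper.
# It models only the blacklisted_until arithmetic from _makara_blacklist!.
class FakeConnectionWrapper
  attr_reader :connection, :blacklisted_until

  def initialize(config, clock: -> { Time.now.to_i })
    @config = config
    @clock = clock
    @connection = :open # placeholder for a real adapter connection
    @blacklisted_until = nil
  end

  # Same arithmetic as the quoted method; the connection is dropped regardless.
  def blacklist!
    @connection = nil
    @blacklisted_until = @clock.call + @config[:blacklist_duration]
  end

  # Assumption: a node counts as blacklisted only while the deadline is in the future.
  def blacklisted?
    !@blacklisted_until.nil? && @blacklisted_until > @clock.call
  end
end

now = 1_000
clock = -> { now } # frozen clock so the example is deterministic

five = FakeConnectionWrapper.new({ blacklist_duration: 5 }, clock: clock)
five.blacklist!
puts five.blacklisted?        # true: deadline is 1_005

zero = FakeConnectionWrapper.new({ blacklist_duration: 0 }, clock: clock)
zero.blacklist!
puts zero.blacklisted?        # false: deadline equals the current time
p zero.connection             # nil: the connection was still dropped
```

Under these assumptions a zero duration stops the node from staying blacklisted, but it does not stop the disconnect itself.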

bleonard • Jan 06 '16

If I'm reading it right, if we never raised BlacklistConnection, it would behave like a "normal" error (you'd see the original one).

Maybe gracefully in https://github.com/taskrabbit/makara/blob/master/lib/makara/error_handler.rb should not be so graceful if blacklist_duration <= 0 or some other setting is set.
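A minimal sketch of that idea, assuming a handler that knows the node's blacklist_duration. SketchErrorHandler and SketchBlacklistError are hypothetical names, not Makara's ErrorHandler or its error classes: when the duration is zero or negative, the original error is re-raised untouched instead of being turned into a blacklisting error.

```ruby
# Hypothetical names throughout; this is not Makara's ErrorHandler.
class SketchBlacklistError < StandardError; end

class SketchErrorHandler
  def initialize(blacklist_duration)
    @blacklist_duration = blacklist_duration
  end

  # Runs the block. With blacklisting effectively off (duration <= 0) the
  # original error is re-raised as-is instead of being converted.
  def handle
    yield
  rescue StandardError => e
    raise if @blacklist_duration <= 0 # bare raise re-raises the rescued error
    raise SketchBlacklistError, "blacklisting node after #{e.class}"
  end
end

# Helper for the demo: report which error class escaped the handler.
def error_class_from(handler)
  handler.handle { raise IOError, "simulated connection failure" }
rescue StandardError => e
  e.class
end

puts error_class_from(SketchErrorHandler.new(0))  # IOError
puts error_class_from(SketchErrorHandler.new(30)) # SketchBlacklistError
```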

bleonard • Jan 06 '16

Yeah, one thing we noticed is that our database was returning errors, but we never saw the underlying error, which made it hard to debug. So I'm not sure gracefully is helpful in that regard. Perhaps the blacklisting error could carry a nested exception or a message like "Blacklisting due to: 'blah blah blah'", which might be more helpful.
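Plain Ruby already supports the nested-exception part of this suggestion: raising a new error inside a rescue records the rescued error as Exception#cause. The sketch below shows that mechanism with hypothetical names (NodeBlacklisted, run_query, query_with_blacklisting); it is not how Makara builds its errors.

```ruby
# Hypothetical error type used only for this illustration.
class NodeBlacklisted < StandardError; end

# Stand-in for a query that fails at the database level.
def run_query
  raise IOError, "simulated connection failure"
end

# Raising inside a rescue makes Ruby record the rescued error as #cause,
# so the message can name the trigger and the original stays attached.
def query_with_blacklisting
  run_query
rescue IOError => e
  raise NodeBlacklisted, "Blacklisting due to: '#{e.message}'"
end

begin
  query_with_blacklisting
rescue NodeBlacklisted => err
  puts err.message     # Blacklisting due to: 'simulated connection failure'
  puts err.cause.class # IOError
end
```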

We also set the value to 0 but still saw blacklisting happening, although that might have been gracefully stepping in and masking the underlying issue. So maybe the node wasn't actually being blacklisted, but it looked like it was.

For the moment we've removed makara in an attempt to debug the underlying issues, but are eager to put it back in as soon as possible.

I'm also thinking that an explicit 'blacklisting: on/off' mechanism would be clearer than setting the duration to 0 or -1. It works, and maybe it could just be documented as a start. If this were the "correct way", then yes, I would also expect not to see any blacklisting errors.

rmontgomery429 • Jan 06 '16

We do log the error:

    ::Makara::Logging::Logger.log("[Makara] Gracefully handling: #{err}")

You might have to point the logger at your Rails logger, something like this:

    Makara::Logging::Logger.logger = Rails.logger

If you catch BlacklistConnection, you can also see the original error via its original_error method.
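A hedged sketch of the rescue side. The thread names the class only as BlacklistConnection; in Makara it is likely namespaced as Makara::Errors::BlacklistConnection, but verify that against your installed version. To keep the example self-contained, FakeBlacklistConnection below is a hypothetical stand-in that reproduces only the original_error reader, and the logger writes to a StringIO instead of Rails.logger.

```ruby
require "logger"
require "stringio"

# Hypothetical stand-in for Makara's blacklist exception; it only reproduces
# the original_error reader. In an app, rescue the real Makara class instead.
class FakeBlacklistConnection < StandardError
  attr_reader :original_error

  def initialize(original_error)
    @original_error = original_error
    super("node blacklisted: #{original_error.message}")
  end
end

log_output = StringIO.new
logger = Logger.new(log_output) # stands in for Rails.logger

begin
  raise FakeBlacklistConnection.new(IOError.new("simulated connection failure"))
rescue FakeBlacklistConnection => e
  # Log the underlying error, not just the blacklisting wrapper.
  logger.error("#{e.original_error.class}: #{e.original_error.message}")
end

puts log_output.string
```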

bleonard • Jan 08 '16

I'm going to combine this with this one (https://github.com/taskrabbit/makara/issues/78). I think they are both saying that master is special: especially when there is only one master, something should happen so that it works like a "normal" connection. What exactly that means needs more investigation, but I'm "merging" them.

bleonard • Jan 08 '16

:+1:

rmontgomery429 • Jan 09 '16

Supporting use case: an AWS RDS node in Multi-AZ mode can fail over automatically. The DNS name remains the same, and the standby node is transparently promoted to master, so Makara would need to reconnect.

Read Replicas are for slave use only. They can be promoted to a standalone master, but they are then disconnected from the replication topology.

robbwagoner • May 20 '16

@robbwagoner We ran into the exact same issue.

rmontgomery429 • May 24 '16

Did the master issue ever get addressed? We're seeing similar issues as well.

clarakwan • Jun 05 '17

Wondering the same as @clarakwan

NoSync • Mar 16 '18

We are also seeing an issue (in Rails) where the master was wrongly blacklisted because of https://github.com/taskrabbit/makara/issues/207, so its connection was closed in _makara_blacklist!. Because this happened inside a transaction, ActiveRecord then tried to roll back the transaction on the same connection (a transaction keeps its connection instead of checking out a new one: https://github.com/rails/rails/blob/4-2-stable/activerecord/lib/active_record/connection_adapters/abstract/transaction.rb#L51), which had already been closed by blacklisting. As a result, it raises a connection-closed exception.

To avoid this, I monkey-patched our use of Makara so that it never calls _makara_blacklist! on the master.
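The actual patch isn't shown, so here is only a sketch of one common way to write such a monkey patch in Ruby: prepend a module whose method returns early for master nodes and otherwise calls super. DemoConnectionWrapper is a hypothetical stand-in for Makara's wrapper, and reading the role from config[:role] is an assumption; Makara's real internals may store the role differently.

```ruby
# Hypothetical stand-in for a connection wrapper; Makara's real class and its
# config layout may differ, so the config[:role] lookup is an assumption.
class DemoConnectionWrapper
  attr_reader :config, :blacklist_calls

  def initialize(config)
    @config = config
    @blacklist_calls = 0
  end

  def _makara_blacklist!
    @blacklist_calls += 1 # the real method disconnects and sets a deadline
  end
end

# The patch: skip blacklisting for master nodes, defer to the original otherwise.
module NeverBlacklistMaster
  def _makara_blacklist!
    return if config[:role] == "master"

    super
  end
end

DemoConnectionWrapper.prepend(NeverBlacklistMaster)

master  = DemoConnectionWrapper.new({ role: "master" })
replica = DemoConnectionWrapper.new({ role: "slave" })
master._makara_blacklist!
replica._makara_blacklist!
puts master.blacklist_calls  # 0
puts replica.blacklist_calls # 1
```

Using prepend rather than reopening the class keeps the original method reachable through super, so non-master nodes still go through the normal blacklisting path.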

johnwu96822 • Jun 01 '18

In my case, I discovered that if the master fails, it never refreshes DNS to get the new IP for the master (I'm not 100% sure this isn't blacklisting).

Note that automated DB failover transparently points the DNS name at the new master.

The only thing that helps is restarting Rails, which is an expensive option.

Am I missing some option in database.yml? Are you solving the same issue? For the JVM there is the networkaddress.cache.ttl property, so I think an equivalent option would help.
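As a hedged illustration of the networkaddress.cache.ttl idea only: TtlAddressCache below is hypothetical and not part of Makara or Rails. It caches a hostname's address for at most ttl seconds and then resolves it again, with a simulated failover in which the same DNS name starts pointing at a new IP. In a real setup the database client typically resolves the host when it opens a connection, so re-resolution only helps if a new connection is actually made.

```ruby
require "resolv"

# Hypothetical helper: cache a hostname's address for at most ttl seconds,
# then resolve it again (the idea behind the JVM's networkaddress.cache.ttl).
class TtlAddressCache
  def initialize(ttl:,
                 resolver: ->(host) { Resolv.getaddress(host) },
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @ttl = ttl
    @resolver = resolver
    @clock = clock
    @entries = {} # host => [address, resolved_at]
  end

  def address_for(host)
    address, resolved_at = @entries[host]
    now = @clock.call
    if address.nil? || now - resolved_at >= @ttl
      address = @resolver.call(host) # cache miss or expired entry: re-resolve
      @entries[host] = [address, now]
    end
    address
  end
end

# Simulated failover with a fake DNS table and a controllable clock.
dns = { "mymaster.sql.host" => "10.0.0.1" }
now = 0
cache = TtlAddressCache.new(ttl: 30, resolver: ->(h) { dns.fetch(h) }, clock: -> { now })

first = cache.address_for("mymaster.sql.host")        # "10.0.0.1"
dns["mymaster.sql.host"] = "10.0.0.2"                  # failover repoints the name
still_cached = cache.address_for("mymaster.sql.host") # "10.0.0.1" (within TTL)
now = 31
refreshed = cache.address_for("mymaster.sql.host")    # "10.0.0.2"
```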

laimison • May 07 '19

Follow-up: in my case, sonots' patch solved the master DNS issue. More details:

https://github.com/ankane/distribute_reads/issues/24

laimison • May 18 '19

By the way, wasn't the disable_blacklist parameter, which would solve your issues, already available at the time of writing?

An example:

    connections:
      - role: master
        host: mymaster.sql.host
        disable_blacklist: true
      - host: myslave.sql.host
        name: Slave

From README.md:

disable_blacklist - do not blacklist node at any error, useful in case of one master
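A quick way to sanity-check which nodes carry the flag is to load the snippet above with Ruby's standard YAML library. This only inspects the config fragment; how Makara itself reads disable_blacklist is not shown here, and the "exempt" selection is a hypothetical check for illustration.

```ruby
require "yaml"

# The connections snippet from above, embedded as a string for illustration.
config = YAML.safe_load(<<~CONFIG)
  connections:
    - role: master
      host: mymaster.sql.host
      disable_blacklist: true
    - host: myslave.sql.host
      name: Slave
CONFIG

# Hypothetical check: list the nodes that opt out of blacklisting.
exempt = config.fetch("connections").select { |c| c["disable_blacklist"] == true }
puts exempt.map { |c| c["host"] }.inspect # ["mymaster.sql.host"]
```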

laimison • May 29 '19