makara
makara copied to clipboard
Blacklisting questions/concerns with master node.
Does the master node get blacklisted? If so, why? We're running a "all writes to master, all reads to a replica" setup and it would seem that master should never be blacklisted. Does setting blacklist_duration
for the master node to 0
prevent blacklisting?
Also, is there a way to turn off blacklisting completely? If so, what is the recommended way to do this?
:+1:
in connection_wrapper
i see this
def _makara_blacklist!
@connection.disconnect! if @connection
@connection = nil
@blacklisted_until = Time.now.to_i + @config[:blacklist_duration]
end
that suggests if you set blacklist_duration to zero or negative, it would likely work out.
if I'm reading it right, if we never raised BlacklistConnection
that would also be like it was just a "normal" errors (you'd see the original)
maybe gracefully
in https://github.com/taskrabbit/makara/blob/master/lib/makara/error_handler.rb should not be so graceful if blacklist_duration <= 0 or some other setting is set.
Yeah, some things we noticed is that our database was returning errors, but we never saw the underlying error which was hard to debug. So I'm not sure gracefully is helpful in that regard. Perhaps blacklisting could be appended with some kind of nested exception or message like "Blacklisting due to: 'blah blah blah'" might be more helpful.
We also did set the value to 0
but still saw blacklisting happening, but that might have been the gracefully stepping in there masking the underlying issue. So maybe it wasn't being blacklisted but it looked like it was.
For the moment we've removed makara in an attempt to debug the underlying issues, but are eager to put it back in as soon as possible.
I'm also thinking that a more explicit 'blacklisting: on/off' sort of mechanism would be clearer than setting the duration to 0
or -1
. It works and maybe could just be documented as a start. If this were the "correct way" then yes, I would also expect to not see any blacklisting errors.
we do log the error.
::Makara::Logging::Logger.log("[Makara] Gracefully handling: #{err}")
You might have to set up the logger to your Rails one. Something like this:
Makara::Logging::Logger.logger = Rails.logger
if you catch BlacklistConnection
, you can also see the original error via the original_error
method.
I'm going to combine this with this one (https://github.com/taskrabbit/makara/issues/78) -- I think they are both saying that master is special and especially if there is only one, there is something interesting that should happen where it works like "normal" - more investigating on that that means but "merging" them.
:+1:
Supporting use-case: An AWS RDS node in Multi-AZ mode can failover automatically. The DNS name remains the same, and the standby-node is transparently made master. Reconnection(s) by Makara would be necessary.
Read Replicas are for Slave-use only. They can be promoted to a stand-alone Master, but they are then disconnected from the replication topology.
@robbwagoner We ran into the exact same issue.
Did the master issue ever get addressed? We're seeing similar issues as well.
Wondering the same as @clarakwan
We are also seeing an issue (in Rails) when master got blacklisted wrongfully due to this https://github.com/taskrabbit/makara/issues/207 hence the connection was closed in _makara_blacklist!
. But because it's in a transaction, AR attempted to rollback the transaction on the same connection (transaction keeps the connection instead of checking out a new one: https://github.com/rails/rails/blob/4-2-stable/activerecord/lib/active_record/connection_adapters/abstract/transaction.rb#L51) which was closed due to blacklisting. Therefore it raises a connection closed exception.
I monkey patched our use of Makara to never call _makara_blacklist!
on the master to avoid this.
In my case I discovered that if master
failed, it never tries to refresh DNS and get new IP for master
(not 100% sure if this is not blacklisting).
A note that transparently DNS is replaced to a new master by automated DB failover.
The only thing that helps is just restarting Rails which is an expensive option.
Am I missing some option in database.yml
? Are you solving the same issue? For JVMs there is the parameter networkaddress.cache.ttl
so I think equivalent option should help.
To leave feedback, in my case sonots patch worked to solve master's DNS issue. More details:
https://github.com/ankane/distribute_reads/issues/24
By the way, wasn't the parameter disable_blacklist
available at the time of writing to solve your issues?
An example:
connections:
- role: master
host: mymaster.sql.host
disable_blacklist: true
- host: myslave.sql.host
name: Slave
From README.md
disable_blacklist - do not blacklist node at any error, useful in case of one master