webhdfs
HA Namenode support?
We have two namenodes for high availability and get a StandbyException
when using the non-active (standby) namenode, which makes sense.
WebHDFS::IOError: {"RemoteException":{"exception":"StandbyException","javaClassName":"org.apache.hadoop.ipc.StandbyException","message":"Operation category READ is not supported in state standby"}}
from /usr/local/lib/ruby/gems/2.3.0/gems/webhdfs-0.8.0/lib/webhdfs/client_v1.rb:401:in `request'
from /usr/local/lib/ruby/gems/2.3.0/gems/webhdfs-0.8.0/lib/webhdfs/client_v1.rb:275:in `operate_requests'
from /usr/local/lib/ruby/gems/2.3.0/gems/webhdfs-0.8.0/lib/webhdfs/client_v1.rb:138:in `list'
from (irb):5
from /usr/local/bin/irb:11:in `<main>'
However, is it up to the client to figure out which namenode is active and use it as the host in this library? Is there a way to specify multiple host addresses for this situation?
For reference, this is what we are using webhdfs for, along with a PR that works around the issue: https://github.com/logstash-plugins/logstash-output-webhdfs/pull/18
There's no way to specify 2 or more host addresses right now. Pull requests are welcome :)
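Until something like that exists, a client-side workaround could look roughly like the sketch below. It tries each namenode in turn and moves on when the error message mentions StandbyException, as in the trace above. `with_active_namenode` is a hypothetical helper, not part of the webhdfs gem, and matching on the message text is an assumption based on that trace.

```ruby
# Hypothetical helper (not part of the webhdfs gem): yields each host in turn
# and returns the first result that does not fail with a standby error.
def with_active_namenode(hosts)
  last_error = nil
  hosts.each do |host|
    begin
      return yield(host)
    rescue StandardError => e
      # Assumption: the standby state shows up in the error message,
      # as it does in the WebHDFS::IOError from the trace above.
      # Any other error is re-raised unchanged.
      raise unless e.message.include?('StandbyException')
      last_error = e
    end
  end
  # Every host was in standby, or the list was empty.
  raise(last_error || ArgumentError.new('no namenode hosts given'))
end
```

With the gem loaded, usage might look like this (host names and port 50070 are placeholders for your own cluster):

```ruby
require 'webhdfs'

listing = with_active_namenode(%w[nn1.example.com nn2.example.com]) do |host|
  WebHDFS::Client.new(host, 50070).list('/')
end
```

This creates a new client on every call and always probes the hosts in the given order, so a long-running process may want to remember which host answered last.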
Another solution, for people who already have something like HAProxy set up, is to point webhdfs at the HAProxy and have it monitor the two namenodes and route requests to the active one.