ambari-cassandra-service
WebUI shows no nodes live when they're actually up and pass health checks
I was able to get the plugin working. I'm using this on CentOS, and I had to install the DataStax yum repo before anything would work (can this be automated?). My main issue now is that the UI is reporting inconsistent information.
The health checks for the "Cluster Nodes" are working (why are they called this? Shouldn't the name be more descriptive, like "C* Nodes"?), but the Ambari UI shows the following:
(ignore the 4 warning alerts, they're not related to Cassandra)
When I run nodetool status, you can see all my nodes are up:
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 10.147.0.23 87.84 KB 256 51.4% 300c7c50-e1ca-4979-8fc4-0d7bf48e766b RAC1
UN 10.147.0.22 192.27 KB 256 52.0% 521ffe0d-4a32-4e29-8862-d9297c53e8d2 RAC1
UN 10.147.0.21 234.35 KB 256 48.9% 3c1f75d3-c111-45f0-85bc-cc0a795c5cad RAC1
UN 10.147.0.24 241.48 KB 256 47.7% 24b59f0b-24d4-4322-900c-4657f37e05af RAC1
I've just redeployed a cluster and the issue remains. Any suggestions?
This should not happen. Please check the ambari-agent and ambari-server logs for any exceptions.
I get these errors for my 3 C* nodes in ambari-agent.log.
2016-05-05 16:04:58,281 [CRITICAL] [Cassandra] [Cassandra_service] (Cassandra Service Process) Connection failed: [Errno 111] Connection refused to ip-10-147-0-23.ec2.internal:7000
2016-05-05 16:05:01,265 [CRITICAL] [Cassandra] [Cassandra_service] (Cassandra Service Process) Connection failed: [Errno 111] Connection refused to ip-10-147-0-22.ec2.internal:7000
2016-05-05 16:05:03,661 [CRITICAL] [Cassandra] [Cassandra_service] (Cassandra Service Process) Connection failed: [Errno 111] Connection refused to ip-10-147-0-21.ec2.internal:7000
Yet I can connect to these host:port pairs from the machine ambari-server is installed on.
[centos@ip-10-147-0-10 ambari-server]$ telnet ip-10-147-0-21.ec2.internal 7000
Trying 10.147.0.21...
Connected to ip-10-147-0-21.ec2.internal.
I also have no problem running CQLSH and connecting to the cluster.
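The telnet test above was run from the ambari-server machine, while the errors were logged by ambari-agent. A minimal sketch of the same reachability test in Python, which could be repeated from each agent host, might look like this. The function name can_connect and the 3-second timeout are illustrative assumptions, not part of the plugin.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    can_connect is a hypothetical helper for manual debugging; it is not
    part of ambari-cassandra-service or Ambari.
    """
    try:
        connection = socket.create_connection((host, port), timeout=timeout)
    except (socket.error, socket.timeout):
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
    connection.close()
    return True
```

For example, can_connect("ip-10-147-0-21.ec2.internal", 7000) should return True on any host where the telnet command above succeeds.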
Were you able to resolve the issue?
No I haven't. I was going to look into it some more soon. Has anyone else reported this problem?
What's really strange about this is that the heartbeats seem to be working fine and Cassandra is indeed running (notice it says "No Alerts"), but this summary window says 0/3 nodes are live. What part of the plugin code is responsible for indicating whether a Cluster Node is live in this view?
Probably a symptom of the same problem. When I go into a specific host it shows the Cassandra service as not started, even though it's running.
This might be an issue with the status function. Can you please confirm if there are no exceptions being thrown here?
The recommended way to define the status function is to run a command that checks whether the component is running:
- If the component is running, the command should return exit code 0 and the function should not raise any error.
- If the component is not running, raise a ComponentIsNotRunning exception.
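The contract described in the two points above can be sketched in standalone Python as follows. This is only an illustration: the ComponentIsNotRunning class here is a local stand-in for the Ambari exception of the same name, and check_status is a hypothetical helper, not the plugin's actual code.

```python
import subprocess


class ComponentIsNotRunning(Exception):
    """Local stand-in for Ambari's ComponentIsNotRunning (assumption: inside
    the agent the real class comes from the resource_management library)."""


def check_status(command):
    """Run a status command given as an argument list.

    Returns quietly when the command exits with code 0 and raises
    ComponentIsNotRunning for any other exit code.
    """
    return_code = subprocess.call(command)
    if return_code != 0:
        raise ComponentIsNotRunning(
            "status command %r exited with code %d" % (command, return_code))
```

Calling check_status(["service", "cassandra", "status"]) would then either return without error or raise, which is the behaviour the status function is expected to have.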
@mithmatt I'll add some exception handling and confirm the return code.
Earlier I did actually stick a debug statement in the status function, but it never appeared to be executed.
The status function in the Python file is executed by Ambari for the heartbeat. I tried reinstalling the service and I don't see the issue.
What OS version are you using? Which HDP stack version and which Ambari version? Try changing the status method in cassandra_master.py to check the pid file instead, by passing the path of the pid file to the check_process_status method.
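For readers unfamiliar with pid-file checks, the idea can be sketched in standalone Python roughly as below. This is an approximation under stated assumptions, not Ambari's implementation: check_pid_file is a hypothetical name, ComponentIsNotRunning is a local stand-in, and the sketch assumes a POSIX system where signal 0 can be used to probe a process.

```python
import errno
import os


class ComponentIsNotRunning(Exception):
    """Local stand-in for Ambari's exception of the same name."""


def check_pid_file(pid_file):
    """Raise ComponentIsNotRunning unless pid_file names a live process."""
    try:
        with open(pid_file) as handle:
            pid = int(handle.read().strip())
    except (IOError, OSError, ValueError):
        raise ComponentIsNotRunning(
            "missing or unreadable pid file: %s" % pid_file)
    try:
        os.kill(pid, 0)  # signal 0 sends nothing; it only tests for existence
    except OSError as exc:
        if exc.errno == errno.EPERM:
            return  # the process exists but belongs to another user
        raise ComponentIsNotRunning("no process with pid %d" % pid)
```

The pid file location depends on how Cassandra was packaged, so the path passed in has to be confirmed on the node itself.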
For some reason service cassandra status was returning an exit code of 3 even though the service was running successfully.
I'm running CentOS 7, so I'm using systemd. The equivalent systemd command returned exit code 0. When I updated the status command in cassandra_master.py to systemctl status cassandra, the "warning" icon flipped to an "ok".
[centos@ip-10-147-0-21 ~]$ ./saferuncommand.sh sudo systemctl status cassandra
● cassandra.service - SYSV: Starts and stops Cassandra
Loaded: loaded (/etc/rc.d/init.d/cassandra)
Active: active (exited) since Thu 2016-06-02 17:23:02 UTC; 1 day 2h ago
Docs: man:systemd-sysv-generator(8)
Process: 32132 ExecStop=/etc/rc.d/init.d/cassandra stop (code=exited, status=1/FAILURE)
Process: 32182 ExecStart=/etc/rc.d/init.d/cassandra start (code=exited, status=0/SUCCESS)
Jun 02 17:23:02 ip-10-147-0-21 systemd[1]: Starting SYSV: Starts and stops Cassandra...
Jun 02 17:23:02 ip-10-147-0-21 su[32189]: (to cassandra) root on none
Jun 02 17:23:02 ip-10-147-0-21 systemd[1]: Started SYSV: Starts and stops Cassandra.
Jun 02 17:23:02 ip-10-147-0-21 cassandra[32182]: Starting Cassandra: OK
0
[centos@ip-10-147-0-21 ~]$ ./saferuncommand.sh sudo service cassandra status
● cassandra.service - SYSV: Starts and stops Cassandra
Loaded: loaded (/etc/rc.d/init.d/cassandra)
Active: active (exited) since Thu 2016-06-02 17:23:02 UTC; 1 day 2h ago
Docs: man:systemd-sysv-generator(8)
Process: 32132 ExecStop=/etc/rc.d/init.d/cassandra stop (code=exited, status=1/FAILURE)
Process: 32182 ExecStart=/etc/rc.d/init.d/cassandra start (code=exited, status=0/SUCCESS)
Jun 02 17:23:02 ip-10-147-0-21 systemd[1]: Starting SYSV: Starts and stops Cassandra...
Jun 02 17:23:02 ip-10-147-0-21 su[32189]: (to cassandra) root on none
Jun 02 17:23:02 ip-10-147-0-21 systemd[1]: Started SYSV: Starts and stops Cassandra.
Jun 02 17:23:02 ip-10-147-0-21 cassandra[32182]: Starting Cassandra: OK
3
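The trailing 0 and 3 in the two transcripts above are the exit codes of the commands. As a reading aid, the status exit codes defined by the LSB convention for init scripts can be decoded with a small lookup like the one below. The function name describe_status_code is hypothetical, and whether the Cassandra init script follows the convention exactly is an assumption.

```python
# Exit codes for the "status" action as defined by the LSB init script
# convention; codes outside this table are reserved or application-specific.
LSB_STATUS_MEANINGS = {
    0: "program is running or service is OK",
    1: "program is dead and /var/run pid file exists",
    2: "program is dead and /var/lock lock file exists",
    3: "program is not running",
    4: "program or service status is unknown",
}


def describe_status_code(code):
    """Return the LSB meaning of a status exit code."""
    return LSB_STATUS_MEANINGS.get(
        code, "reserved or application-specific code")
```

Under that convention an exit code of 3 means "program is not running", which would explain why the UI showed the component as stopped.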
Yes, on CentOS it's better to use systemctl. If this is resolved, please close the issue.
Would you accept a PR that switches based on whether systemctl is present?
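def status(self, env):
    import params
    env.set_params(params)
    # Prefer systemctl where it exists (e.g. CentOS 7); otherwise fall back
    # to the SysV service command.
    status_cmd = format("""
    if hash systemctl 2>/dev/null; then
      systemctl status cassandra
    else
      service cassandra status
    fi""")
    # Execute fails if the shell snippet exits with a non-zero code.
    Execute(status_cmd)
    print 'Status of the Master'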
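The same switch could also be made on the Python side instead of in the shell snippet. The sketch below is one possible variant, not the change proposed in the PR: build_status_command is a hypothetical helper, and it relies on shutil.which, which needs Python 3, whereas the snippet above uses Python 2 syntax.

```python
import shutil


def build_status_command(service="cassandra", which=shutil.which):
    """Return the status command as an argument list.

    Uses systemctl when it is found on PATH and falls back to the SysV
    service command otherwise. The which parameter exists only so the
    lookup can be replaced in tests.
    """
    if which("systemctl"):
        return ["systemctl", "status", service]
    return ["service", service, "status"]
```

Keeping the lookup injectable makes the branch easy to exercise on a machine that has neither systemctl nor Cassandra installed.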
@seglo's solution worked for me.
I had the same issue on the same OS (CentOS).