
Fatal error: One or more hosts failed while executing task 'start'

Open zhaogxd opened this issue 8 years ago • 9 comments

I have been following the steps in "Setup cstar_perf.tool" to set up a test cluster. The following error occurs when running "cstar_perf_bootstrap apache/cassandra-2.1".

[cnode1] Executing task 'start'
!!! Parallel execution exception under host u'cnode1':
Process cnode1:
Traceback (most recent call last):
  File "/usr/lib64/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib64/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib64/python2.7/site-packages/fabric/tasks.py", line 242, in inner
    submit(task.run(*args, **kwargs))
  File "/usr/lib64/python2.7/site-packages/fabric/tasks.py", line 174, in run
    return self.wrapped(*args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/fabric/decorators.py", line 181, in inner
    return func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/cstar_perf/tool/fab_cassandra.py", line 415, in start
    cfg = config['hosts'][fab.env.host]
KeyError: u'192.168.188.11'

Fatal error: One or more hosts failed while executing task 'start'

Underlying exception: u'192.168.188.11'

Aborting.

My cluster_config.json file is created as below:

{
    "commitlog_directory": "/mnt/d1/commitlog",
    "data_file_directories": [
        "/mnt/d2/data",
        "/mnt/d3/data",
        "/mnt/d4/data"
    ],
    "block_devices": ["/dev/sda"],
    "blockdev_readahead": "8192",
    "hosts": {
        "cnode1": {
            "internal_ip": "192.168.188.11",
            "hostname": "cnode1",
            "seed": true
        }
    },
    "user": "hadoopuser",
    "name": "mycluster",
    "saved_caches_directory": "/mnt/d2/saved_caches"
}

Any suggestions?

zhaogxd avatar Jun 20 '16 05:06 zhaogxd

How old is your code? The latest version of fab_cassandra.py doesn't have Line 415. Can you please update the code and retry?
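For reference, the KeyError in the traceback is consistent with the "hosts" map being keyed by hostname ("cnode1") while the lookup used an IP address. The following is only a minimal sketch of a lookup that tolerates either form; `find_host_config` is a hypothetical helper written for illustration and is not part of cstar_perf.

```python
def find_host_config(config, host):
    """Return the config['hosts'] entry matching host by key, hostname, or internal_ip.

    Hypothetical helper for illustration only; not a cstar_perf function.
    """
    hosts = config.get("hosts", {})
    if host in hosts:
        return hosts[host]
    for entry in hosts.values():
        if host in (entry.get("hostname"), entry.get("internal_ip")):
            return entry
    raise KeyError(host)


# Trimmed copy of the "hosts" section from the cluster_config.json above.
config = {
    "hosts": {
        "cnode1": {
            "internal_ip": "192.168.188.11",
            "hostname": "cnode1",
            "seed": True,
        }
    }
}

# A plain config["hosts"]["192.168.188.11"] raises KeyError, as in the traceback,
# because the only key in the map is "cnode1".
entry = find_host_config(config, "192.168.188.11")
print(entry["hostname"])
```

Whether the newer code resolves hosts this way is not confirmed here; the sketch only shows why the key and the lookup value can disagree.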

nastra avatar Jun 24 '16 17:06 nastra

You are right: my fab_cassandra.py was an old version, since I had just run 'sudo pip install cstar_perf.tool' in my home folder. It seems that pip grabbed and installed an old version. I am not familiar with pip.

To get the latest version of cstar_perf, I ran 'git clone' to download it to a local path, then ran 'sudo pip install ./cstar_perf/tool' to install it.

Then I ran 'cstar_perf_bootstrap -v apache/cassandra-2.1'. This time the command went much further and seemed able to launch Cassandra on cnode1, but at the end I got the following error:

[192.168.188.71] out: 127.0.0.1 rack1 Up Normal 102.81 KB 100.00% 8938400078857263027
[192.168.188.71] out: 127.0.0.1 rack1 Up Normal 102.81 KB 100.00% 9038662740528651599
[192.168.188.71] out: 127.0.0.1 rack1 Up Normal 102.81 KB 100.00% 9091135296963166026
[192.168.188.71] out: 127.0.0.1 rack1 Up Normal 102.81 KB 100.00% 9106912787709865125
[192.168.188.71] out: 127.0.0.1 rack1 Up Normal 102.81 KB 100.00% 9111136576899851738
[192.168.188.71] out:
[192.168.188.71] out: Warning: "nodetool ring" is used to output all the tokens of a node.
[192.168.188.71] out: To view status related info of a node use "nodetool status" instead.
[192.168.188.71] out:
[192.168.188.71] out:
[192.168.188.71] out:
[192.168.188.71] Node is not up (yet): 192.168.188.71
[192.168.188.71] waiting 10 seconds to try again..

Fatal error: Timed out waiting for all nodes to startup

Aborting.
WARNING:benchmark:'NoneType' object has no attribute 'split'

Fatal error: Cassandra is not up!

Aborting.

It seems that the Cassandra instance launched on cnode1 is listening on 127.0.0.1 rather than the static IP address 192.168.188.71. Is this the cause of the failure? If so, how do I tell the Cassandra node to listen on 192.168.188.71?

zhaogxd avatar Jun 28 '16 06:06 zhaogxd

Can you make sure that /etc/hosts is properly configured and that hostname resolution works? Looking through the logic in https://github.com/datastax/cstar_perf/blob/master/tool/cstar_perf/tool/benchmark.py might also help you debug the problem.
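One way to check hostname resolution on each node is a short script like the one below. It is a sketch under assumptions: it uses Python 3's standard `socket` and `ipaddress` modules (the tool itself runs on Python 2.7), and `check_resolution` is a hypothetical helper, not something cstar_perf provides.

```python
import ipaddress
import socket


def check_resolution(hostname, resolver=socket.gethostbyname):
    """Resolve hostname and report whether the result is a loopback address.

    Returns (address, is_loopback). The resolver argument exists only so the
    function can be exercised without touching the real resolver.
    """
    address = resolver(hostname)
    return address, ipaddress.ip_address(address).is_loopback


# Example with a stubbed resolver that mimics a host name mapped to 127.0.0.1:
address, loopback = check_resolution("cnode1", resolver=lambda name: "127.0.0.1")
print(address, "loopback" if loopback else "routable")

# On a real node, call check_resolution("cnode1") with the default resolver.
```

If a node's own name resolves to a loopback address, services that bind to "the address of this host name" end up reachable only locally, which would match a ring that reports 127.0.0.1.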

nastra avatar Jun 28 '16 07:06 nastra

My /etc/hosts file has the following content on both cstress1 and cnode1 (my Linux box is CentOS 7):

127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.188.10 stress1
192.168.188.11 cnode1

The hostname works on both nodes.

Note: you may notice that the IP address in my first post is 192.168.188.71 rather than 192.168.188.10. That is because I tried to install this tool on two clusters, and the results are the same on both.

I also looked into benchmark.py but found no clue.

I still think the direct cause of the failure could be that 'listen_address' is set to 'localhost' in the cassandra.yaml file on cnode1 rather than 192.168.188.11, but I don't know how to make cstar_perf set 'listen_address' on cnode1 properly.

Any comments would be appreciated!

zhaogxd avatar Jun 29 '16 06:06 zhaogxd

@csplinter @mshuler any ideas?

nastra avatar Jun 29 '16 07:06 nastra

tool/cstar_perf/tool/fab_common.py sets cass_yaml['listen_address'] = cfg['internal_ip'], and I looked at one of our running clusters, which looks really similar to the yaml posted above. I don't have any hosts entries at all. Since this started out with an old code install, were both the frontend and client source updated to current master, @zhaogxd? Just trying to rule out the obvious. Starting fresh and working through the steps without old code, table data, etc. around might be the worst case.
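Given that assignment, a loopback or missing internal_ip in cluster_config.json would carry straight through to listen_address. A quick check can be sketched as follows; `loopback_hosts` is a hypothetical helper, and the inline JSON is a made-up example with one deliberately bad entry, not anyone's real config.

```python
import ipaddress
import json


def loopback_hosts(config):
    """Return names of hosts whose internal_ip is missing or a loopback address.

    Hypothetical helper for illustration; raises ValueError if an internal_ip
    is not a valid IP address string.
    """
    bad = []
    for name, cfg in config.get("hosts", {}).items():
        ip = cfg.get("internal_ip")
        if ip is None or ipaddress.ip_address(ip).is_loopback:
            bad.append(name)
    return bad


# Made-up example: cnode2 is intentionally misconfigured.
example = json.loads("""
{
    "hosts": {
        "cnode1": {"internal_ip": "192.168.188.11", "hostname": "cnode1", "seed": true},
        "cnode2": {"internal_ip": "127.0.0.1", "hostname": "cnode2"}
    }
}
""")

print(loopback_hosts(example))
```

To run it against a real file, load the config with `json.load` from wherever cluster_config.json lives on the machine running the tool and pass the result to `loopback_hosts`.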

mshuler avatar Jun 29 '16 15:06 mshuler

I just ran 'cstar_perf_bootstrap -v apache/cassandra-2.1' on a fresh CentOS 7, 2-node cluster with the following cluster_config.json, and everything came up as expected using the internal_ip. @zhaogxd I am not sure what is causing your problem, but I would double-check your cluster_config.json, located at .cstar_perf/cluster_config.json, to make sure the internal_ip is set to the static IP address that you want and not 127.0.0.1.

{
    "user": "chris",
    "cluster_name": "centos7_cluster",
    "product": "cassandra",
    "saved_caches_directory": "/var/lib/cassandra/saved_caches",
    "commitlog_directory": "/var/lib/cassandra/commitlog",
    "log_dir": "/var/log/cassandra",
    "data_file_directories": ["/var/lib/cassandra/data"],
    "block_devices": ["/dev/vda1"],
    "blockdev_readahead": "8192",
    "hosts": {
        "ip-10-200-179-220": {
            "internal_ip": "10.200.179.220",
            "hostname": "ip-10-200-179-220",
            "seed": "true"}
    }
}
...
[10.200.179.220] out: 10.200.179.220  rack1       Up     Normal  51.67 KB        100.00%             8619020829015697437                         
[10.200.179.220] out: 10.200.179.220  rack1       Up     Normal  51.67 KB        100.00%             8671272831602522107                         
[10.200.179.220] out: 10.200.179.220  rack1       Up     Normal  51.67 KB        100.00%             8700554652956383716                         
[10.200.179.220] out: 10.200.179.220  rack1       Up     Normal  51.67 KB        100.00%             8780708402645289080                         
[10.200.179.220] out: 10.200.179.220  rack1       Up     Normal  51.67 KB        100.00%             9012052654952196740                         
[10.200.179.220] out: 10.200.179.220  rack1       Up     Normal  51.67 KB        100.00%             9054345547042666849                         
[10.200.179.220] out: 10.200.179.220  rack1       Up     Normal  51.67 KB        100.00%             9138156284636759154                         
[10.200.179.220] out: 
[10.200.179.220] out:   Warning: "nodetool ring" is used to output all the tokens of a node.
[10.200.179.220] out:   To view status related info of a node use "nodetool status" instead.
[10.200.179.220] out: 
[10.200.179.220] out: 
[10.200.179.220] out:   
[10.200.179.220] All nodes available!
INFO:benchmark:Started cassandra on 1 nodes with git SHAs: {u'ip-10-200-179-220': 'cb14186f8d6c2d1105a51e409c59a4e424958171', 'chris@ip-10-200-179-22': 'cb14186f8d6c2d1105a51e409c59a4e424958171'}

csplinter avatar Jun 30 '16 03:06 csplinter

@zhaogxd any luck?

nastra avatar Jul 22 '16 17:07 nastra

Hi Eduard,

Thanks for the follow-up! I have recently been busy with something else and had to give this test a lower priority. I will continue testing this tool when I have time and will let you know my progress.

Have a nice day!

Guang

zhaogxd avatar Jul 28 '16 16:07 zhaogxd