Furkan Mustafa comments

Results: 39 comments of Furkan Mustafa

Many problems during recovery

3 days later, sheep processes started crashing (segfaults) randomly here and there, and the cluster is down again.

Many problems during recovery

I've seen a new behavior this time, for the first time since I started using sheepdog. I run any dog command, for example:

```
dog node list
```

It just hangs there....
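A hang like this can be told apart from a merely slow reply by running the command under a timeout. The following is a minimal sketch in Python, not something from the original report: the helper name `run_with_timeout` and the 10-second default are illustrative assumptions, and it assumes the command to test (here it would be `dog node list`) is on the `PATH`.

```python
import subprocess


def run_with_timeout(cmd, timeout_s=10.0):
    """Run cmd and return (returncode, stdout), or None if it did not finish in time.

    Hypothetical helper for illustration; not part of sheepdog.
    """
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising, so no stuck
        # process is left behind by this check.
        return None
    return result.returncode, result.stdout
```

Calling `run_with_timeout(["dog", "node", "list"])` would return `None` when the CLI is stuck, which makes the lockup easy to detect from a monitoring script instead of a blocked terminal.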

Only one node is showing a different, but no less strange, behavior. It says: ["There are no active sheep daemons"](https://github.com/sheepdog/sheepdog/blob/988e12c30e6662a0a24d8c989b754a9300622ce0/dog/dog.c#L93) ``` root@server:~# ps axf PID TTY STAT TIME COMMAND 2099 ?...

Many problems during recovery

OK. This time it looks like ZooKeeper went crazy. Nodes were blocked in their init while trying to join the cluster, and thus were not answering CLI calls. All the troubles in the...

Many problems during recovery

A question at this point: is this huge recovery process caused by automatic vnode assignment? If the vnode assignment was done manually from the beginning, would this recovery process be only...

Many problems during recovery

Status in the morning: I have managed to get dog client lockups again. I suspect this is probably due to some cluster locks being stuck in a locked state?...

Many problems during recovery

OK. On one of the nodes, I have this:

```
Feb 11 01:41:49 ERROR [main] listen_handler(1036) failed to accept a new connection: Too many open files
```

I'll try...
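"Too many open files" is the message for `EMFILE`, meaning the process hit its per-process descriptor limit (`RLIMIT_NOFILE`). A rough sketch of how one might compare the limit with current usage on Linux follows. The function names are hypothetical, the `/proc` paths are Linux-specific, and the PID of the sheep daemon has to be supplied by the reader.

```python
import os
import resource


def nofile_limits():
    """Return the (soft, hard) RLIMIT_NOFILE of the *current* process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def open_fd_count(pid="self"):
    """Count open descriptors of a process via /proc; None if unavailable."""
    try:
        return len(os.listdir(f"/proc/{pid}/fd"))
    except OSError:
        # No /proc (non-Linux), no such PID, or no permission.
        return None


def proc_nofile_limits(pid="self"):
    """Read the (soft, hard) 'Max open files' values of another process.

    Values are returned as strings because the kernel may report 'unlimited'.
    """
    try:
        with open(f"/proc/{pid}/limits") as f:
            for line in f:
                if line.startswith("Max open files"):
                    soft, hard = line.split()[3:5]
                    return soft, hard
    except OSError:
        pass
    return None
```

If `open_fd_count(pid)` for the daemon is close to the soft value from `proc_nofile_limits(pid)`, the daemon is likely to refuse new connections in the way the log line shows.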

Many problems during recovery

Yes. It looks like that problem on a single node blocked the whole cluster. When I restarted that node, everything started to move again. Lost object counts increased a bit...

Many problems during recovery

Another constant problem I am experiencing is:

- about to finish a huge recovery
- all nodes are recovered except one
- that is also about to finish, 95%+ somewhere...

Many problems during recovery

After this last restart, recovery is hardly proceeding at all. I'm getting this error on some of the nodes: ``` Feb 11 16:15:27 ERROR [main] check_request_epoch(157) old node version 1166,...