Furkan Mustafa

Results 39 comments of Furkan Mustafa

Three days later, sheep processes started crashing (segfaults) at random, and the cluster is down again.

I've seen a new behavior this time, for the first time since I started using sheepdog: any dog command, for example ``` dog node list ```, just hangs there....

Only one node is showing a different, but no less strange, behavior. It says: ["There are no active sheep daemons"](https://github.com/sheepdog/sheepdog/blob/988e12c30e6662a0a24d8c989b754a9300622ce0/dog/dog.c#L93) ``` root@server:~# ps axf PID TTY STAT TIME COMMAND 2099 ?...

Ok. This time it looks like ZooKeeper went crazy. Nodes were blocked in their init while trying to join the cluster, and thus were not answering CLI calls. All the troubles in the...

A question at this point: is this huge recovery process caused by automatic vnode assignment? If the vnode assignment had been done manually from the beginning, would this recovery process be only...

Status in the morning: I have managed to get dog client lockups again. I suspect this is due to some cluster locks being stuck in a locked state?...

Ok. On one of the nodes, I have this: ``` Feb 11 01:41:49 ERROR [main] listen_handler(1036) failed to accept a new connection: Too many open files ``` I'll try...
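A "Too many open files" error usually means a process has reached its open file descriptor limit. As a rough way to check how close a process is to that limit, here is a minimal sketch that reads the Linux `/proc` filesystem. The helper name `fd_usage` is hypothetical and not part of sheepdog; the sketch assumes a Linux host and enough permissions to read the target process's `/proc` entries.

```python
import os


def fd_usage(pid):
    """Return (open_fds, soft_limit) for a Linux process, read from /proc.

    soft_limit is None when the limit is reported as "unlimited".
    Hypothetical helper for illustration; not part of sheepdog.
    """
    # Each entry in /proc/<pid>/fd is one open file descriptor.
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))

    soft_limit = None
    with open(f"/proc/{pid}/limits") as limits:
        for line in limits:
            # Line layout: "Max open files  <soft>  <hard>  files"
            if line.startswith("Max open files"):
                soft = line.split()[3]
                soft_limit = None if soft == "unlimited" else int(soft)
                break
    return open_fds, soft_limit


if __name__ == "__main__":
    # Demonstrated on the current process; pass a daemon's PID instead.
    used, limit = fd_usage(os.getpid())
    print(f"open fds: {used}, soft limit: {limit}")
```

Passing the PID of the affected daemon instead of `os.getpid()` shows whether its descriptor count is near the soft limit.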

Yes. It looks like that problem on a single node blocked the whole cluster. When I restarted that node, everything started to move again. Lost object counts increased a bit...

Another constant problem I am experiencing is:
- about to finish a huge recovery
- all nodes are recovered except one
- that one is also about to finish, at 95%+ somewhere...

After this last restart, recovery is barely proceeding at all. I am getting this error on some of the nodes: ``` Feb 11 16:15:27 ERROR [main] check_request_epoch(157) old node version 1166,...