Muhamad Awad

Results 165 comments of Muhamad Awad

I am sure we can already leverage https://github.com/prometheus/node_exporter or something similar. It already does all the monitoring for you and is Prometheus compatible.

@xmonader why is this needed?

This is (from experience) a FUSE filesystem issue. zdb-fs is probably missing an implementation of the `statfs` call, so the `overlay` filesystem fails to use it. @maxux
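One quick way to test this guess is to call `statvfs` on the mount point and look at what comes back. The sketch below is only a probe, not a fix. The helper name `probe_statfs` and the example path are hypothetical. How a missing handler shows up (an error, or placeholder values) depends on the FUSE library, and is an assumption here rather than confirmed zdbfs behaviour.

```python
import errno
import os


def probe_statfs(path):
    """Call statvfs on `path` and report the outcome as (ok, detail).

    Hypothetical helper for illustration; it does not know anything
    about zdbfs itself.
    """
    try:
        st = os.statvfs(path)
    except OSError as exc:
        # Translate the errno into its symbolic name (e.g. ENOSYS, ENOENT).
        name = errno.errorcode.get(exc.errno, str(exc.errno))
        return False, f"statvfs failed: {name}"
    # The call succeeded; return the raw numbers so they can be inspected.
    # Placeholder-looking values may hint at a stubbed handler (assumption).
    return True, f"bsize={st.f_bsize} blocks={st.f_blocks} bfree={st.f_bfree}"


if __name__ == "__main__":
    # Replace "/" with the zdbfs mount point to be checked.
    print(probe_statfs("/"))
```

Running it against the directory intended as the overlay upper layer shows whether the call fails outright or returns suspicious values.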

@maxux the latest update is that this might be related to zdbfs not working as the upper layer in an overlay mount. So can we link this issue here with an update if this...

I need to clarify something first: a node can initiate a delete if it failed to start a workload, even if it has been running for some time. So basically...

On the other hand, the bot should recover by redeploying another container on a different node if this node suddenly becomes unreachable.
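The recovery behaviour described here could look roughly like the sketch below. Everything in it is hypothetical: `pick_fallback_node`, the node names, and the `is_reachable` callback are illustration only and do not reflect the bot's actual code or API.

```python
def pick_fallback_node(nodes, failed_node, is_reachable):
    """Return the first reachable node other than `failed_node`, or None.

    `nodes` is an ordered list of candidate node IDs and `is_reachable`
    is a caller-supplied check (hypothetical; e.g. a ping or API probe).
    """
    for node in nodes:
        if node == failed_node:
            # Never redeploy onto the node that just went away.
            continue
        if is_reachable(node):
            return node
    # No healthy candidate left; the caller decides how to report this.
    return None


if __name__ == "__main__":
    candidates = ["node-a", "node-b", "node-c"]
    # Pretend only node-c answers the reachability check.
    print(pick_fallback_node(candidates, "node-a", lambda n: n == "node-c"))
```

The bot would then redeploy the container on the returned node, or surface an error if `None` comes back.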

After investigating the issue more and looking deeper into the state of the node, I found the cause of this issue: ![image](https://user-images.githubusercontent.com/10920323/117936474-2dd3dd80-b305-11eb-8891-546191c76892.png) The OOM killer decided to kill some of the...

So this was caused by the following issues:
- The node itself had a physical problem with one of the disks; this has been replaced
- The OOM killer has killed...

We will make a new release, RC9, during the day with more fixes to the zos network. This might include a fix for the issue you are having.

Okay, RC9 is ready. Could you let me know if you still have this problem? Thanks