Huamin Chen

Results 222 comments of Huamin Chen

Currently, the Ampere xgene hwmon only reports the CPU and I/O power (per doc [here](https://docs.kernel.org/hwmon/xgene-hwmon.html)). We cannot get DRAM power. So to align with the RAPL reporting, kepler only reports kepler_node_core_total...

There are similar issues reported elsewhere. We have not been able to reproduce them yet. For debugging, can you get the `sum(kepler_container_joules_total)` from Prometheus during the spike time? That'll help us...

Thanks @bjornpijnacker. The two potential issues are:

- kepler metrics overflow. We have seen RAPL overflow before but fixes have been put in for a while.
- calculation-led overflow in...
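To illustrate the first kind of overflow, here is a minimal sketch of a wraparound-safe delta between two reads of a monotonically increasing energy counter. It is not Kepler's actual code: the function name `counter_delta` and the `modulus` parameter are hypothetical, and it assumes the counter takes values in `[0, modulus)` and wraps at most once between reads.

```python
def counter_delta(prev: int, curr: int, modulus: int) -> int:
    """Return the increase from prev to curr for a counter that wraps.

    Assumptions (hypothetical, for illustration only):
    - counter values lie in [0, modulus)
    - the counter wrapped at most once between the two reads
    """
    if modulus <= 0:
        raise ValueError("modulus must be positive")
    if not (0 <= prev < modulus and 0 <= curr < modulus):
        raise ValueError("counter values must lie in [0, modulus)")
    # Modular subtraction covers both the normal case (curr >= prev)
    # and the wrapped case (curr < prev) without a negative result.
    return (curr - prev) % modulus


if __name__ == "__main__":
    print(counter_delta(10, 25, 100))  # no wrap between reads
    print(counter_delta(90, 5, 100))   # counter wrapped once
```

A naive `curr - prev` would go negative in the wrapped case; if such a value were then added to an unsigned total, it could show up as a large spike.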

I tested `jq` and `yq` on my end. For the use case kepler needs (i.e. parsing kubectl output), both can do the job. Since we only need one, what about...

@SamYuan1990 let's wait for #1325

@jharriga can you double-check whether it is kepler_node_core_joules_total or kepler_node_package_joules_total? Currently, the Ampere xgene hwmon only reports the CPU and I/O power (per doc [here](https://docs.kernel.org/hwmon/xgene-hwmon.html)). We cannot get DRAM power....

We need to see what is going on with the ceph mon container. Do you have logs to share?

@jmdots the above osd bootstrap failure was due to mon failure. As observed, osd tried to determine mon health, but at that time, mon also failed, so that caused the...

Which Kubernetes version are you using?

@tanhui2333 does #40 fix your issue?