vmware_exporter icon indicating copy to clipboard operation
vmware_exporter copied to clipboard

Change serialization of alarms and sensors for objects

Open Z903 opened this issue 4 years ago • 4 comments

This fixes the root cause of 'Fix for badly behaving super-micro sensor'

This is a breaking change to the return value of helpers.batch_fetch_properties.

Z903 avatar Jun 12 '21 09:06 Z903

I would like a little more about this change in the description and why the changes were made. This overall is some great changes, but I want to make sure I understand it all.

So there are a few changes in there ( they likely should have been seperate MR but my boss needed a solition asap ).

  1. We had a badly behaving supermicro server that reported badly. I honestly dont even remember but I think it crashed the collector. This was the origional reason for digging into this project. :sweat_smile:
  2. Instead of packing alarms/metrics into string and parsing them I am using objects. This removes some of the string issues and makes things easier to use moving forwards.
  3. The best practice at my job is to have one issue per alert. The alert metrics ( like vmware_host_yellow_alarms ) have been adjusted to show one issue per metric. They also have the s on the end removed to reflect this change. Merging of issues should be left to higher level abstractions like alertmanager.
  4. While diving into the code I discovered that you could pass as a parameter any vmware URL. An attacker could then try to steal your credentials by directing an existing profile to a malicious server. If you want to enable the old exploitable behavior there is a flag since someone is likely relying on this behavior.
  5. The timeout stuff was experimental and so far has not caused issues. It did mitigate a slow vmsphere from locking up the whole process and making it unresponsive. There is still some kind of edge case where vmware could drop the connection and hang the process. Thus the systemd changes.

I wrote this up by looking over my commits. Sorry if I missed some things it was over a year ago. Feel free to ask for more details on things. It would be awesome to get this merged in.

Z903 avatar Feb 07 '23 01:02 Z903

I would like a little more about this change in the description and why the changes were made. This overall is some great changes, but I want to make sure I understand it all.

So there are a few changes in there ( they likely should have been seperate MR but my boss needed a solition asap ).

1. We had a badly behaving supermicro server that reported badly. I honestly dont even remember but I think it crashed the collector. This was the origional reason for digging into this project. 😅

2. Instead of packing alarms/metrics into string and parsing them I am using objects. This removes some of the string issues and makes things easier to use moving forwards.

3. The best practice at my job is to have one issue per alert. The alert metrics ( like `vmware_host_yellow_alarms` ) have been adjusted to show one _issue_ per metric. They also have the _s_ on the end removed to reflect this change. Merging of issues should be left to higher level abstractions like alertmanager.

4. While diving into the code I discovered that you could pass as a parameter any vmware URL. An attacker could then try to steal your credentials by directing an existing profile to a malicious server. If you want to enable the old exploitable behavior there is a flag since someone is likely relying on this behavior.

5. The timeout stuff was experimental and so far has not caused issues. It did mitigate a slow vmsphere from locking up the whole process and making it unresponsive. There is still some kind of edge case where vmware could drop the connection and hang the process. Thus the systemd changes.

I wrote this up by looking over my commits. Sorry if I missed some things it was over a year ago. Feel free to ask for more details on things. It would be awesome to get this merged in.

  1. I think we might want to remove the allow target url param and make it so that the url is in the configuration rather then using any input. Let me know your thoughts on this?

pryorda avatar Feb 08 '23 17:02 pryorda

If I understand correctly; You are proposing dropping vsphere_host and target entirely. If that were the case then the only way to select an instance would be via /metrics?section=some_string. That would definately fix the security issue.

I can likely get this done over the weekend.

Z903 avatar Feb 10 '23 06:02 Z903

@pryorda The url parameters vsphere_host and target have been removed. I have also updated the readme and merged in all the recent changes so it should be easy for you to move forward here.

Z903 avatar Feb 11 '23 19:02 Z903