Commands for ipmi and bmc die with timeout
I have the following in my ipmi-exporter.yml
modules:
  default:
    collectors:
    - bmc
    - ipmi
    - chassis
    - sel
    - sel-events
    driver: "LAN_2_0"
    pass: "mypass"
    privilege: "user"
    user: "myuser"
This is part of my ipmi-targets.yml
- labels:
    job: ipmi_exporter
  targets:
  - host1
  - host2
  . . .
My prometheus.yml portion for this exporter is this
- file_sd_configs:
  - files:
    - /etc/prometheus/ipmi-targets.yml
    refresh_interval: 10m
  job_name: ipmi-exporter
  metrics_path: /ipmi
  params:
    module:
    - default
When I do ps ax | grep ipmi-sel I see the following
ipmi-sel --quiet-cache --comma-separated-output --no-header-output --sdr-cache-recreate --output-event-state --interpret-oem-data --entity-sensor-names --config-file /tmp/ipmi_exporter-cc2225c5535b995b1eb9a53327724295 -h host1
I thought that config file would contain the user/pass, but no, that file is always zero length. What am I doing wrong?
This is working as intended. The config is not a regular file, it is a named pipe. Hence, it can only be read once, and it will be read by the exporter itself (edit: actually, by ipmi-sel, executed by the exporter) immediately after creation.
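As an aside, the read-once behavior is easy to see for yourself with a throwaway FIFO (the path below is made up for the demo; it is not the exporter's actual temp file):

```shell
# Create a throwaway named pipe (arbitrary path for this demo)
mkfifo /tmp/fifo-demo
# A writer blocks until a reader attaches, so run it in the background
echo "username myuser" > /tmp/fifo-demo &
# The first read drains the pipe
cat /tmp/fifo-demo
# The FIFO itself always reports zero length, just like the exporter's temp config
stat -c %s /tmp/fifo-demo
rm /tmp/fifo-demo
```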
Please describe the actual problems you are observing, e.g. any errors you get in the logs or such.
The ipmi-sel process just gets stuck and dies with a timeout. If I run the same command I posted above and add -u <user> -P, it works. I didn't realize the config file is a pipe, so it makes sense that it's zero length; thank you for explaining that.
Is there an easy(ish) way to get the data that was written to the config file (the pipe)? I'd like to manually run ipmi-sel with the same args and config as the exporter and see what I get. Ideally, I'd send a PR or change the config once I figure out what's wrong.
Strange. And is the SEL collector the only one that is not working?
Also, can you post the error message from ipmi-sel that should show up in the exporter log?
source=collector.go:120 msg="Collector failed" name=sel-events error="error running ipmi-sel: exit status 1: ipmi-sel: connection timeout\n"
source=collector.go:120 msg="Collector failed" name=bmc error="error running bmc-info: exit status 1: bmc-info: connection timeout\n"
source=collector.go:120 msg="Collector failed" name=ipmi error="error running ipmimonitoring: exit status 1: /usr/sbin/ipmi-sensors: connection timeout\n"
Ok, this already looks quite different, then. Looking at your config in the initial post, it actually looks a bit odd. Are you running an exporter on every host you're trying to scrape, or do you have one central exporter instance for all hosts?
I have a central exporter instance.
If we assume I have a centralized ipmi-exporter and prometheus instance on the same server, what should the config look like? The reason for centralized node is that we have JBODs with IPMI so we can't install exporter on them. Thank you!
Does your prometheus config not have a relabeling rule for setting __address__ and __param_target? Or did you just omit those?
Sorry, just my bad copy/paste. This is the whole section
- file_sd_configs:
  - files:
    - /etc/prometheus/ipmi-targets.yml
    refresh_interval: 10m
  job_name: ipmi-exporter
  metrics_path: /ipmi
  params:
    module:
    - default
  relabel_configs:
  - action: replace
    regex: (.*)
    replacement: ${1}
    separator: ;
    source_labels:
    - __address__
    target_label: __param_target
  - action: replace
    regex: (.*)
    replacement: ${1}
    separator: ;
    source_labels:
    - __param_target
    target_label: instance
  - action: replace
    regex: .*
    replacement: localhost:9290
    separator: ;
    target_label: __address__
  scheme: http
  scrape_interval: 5m
  scrape_timeout: 5m
Ok, that looks about right then. And is the exporter running in Docker or native?
Native
Ok, then this is a bit strange, indeed. I just noticed one thing in the config you posted, but not really sure that this is the issue:
My prometheus.yml portion for this exporter is this
- file_sd_configs:
  - files:
    - /etc/prometheus/ipmi-targets.yml
    refresh_interval: 10m
  job_name: ipmi-exporter
  metrics_path: /ipmi
  params:
    module:
    - default
The last part should probably be:
params:
- module: "default"
But again, not sure this really makes a difference.
Now, given your initially posted config snippet, the config file that gets generated for FreeIPMI would look like this:
driver-type LAN_2_0
privilege-level user
username myuser
password mypass
Can you create this file with the appropriate user/password values and run the ipmi-sel command from your initial post, just replacing the value of --config-file to point to this file?
Please make sure to run this command on the host that the exporter runs on, and as the user that the exporter runs as.
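Concretely, the manual check could look like this. The credentials, the exporter user (prometheus here), and host1 are placeholders; substitute your own values:

```shell
# Write the reconstructed FreeIPMI config to a regular file (placeholder values)
cat > /tmp/freeipmi-test.conf <<'EOF'
driver-type LAN_2_0
privilege-level user
username myuser
password mypass
EOF
chmod 600 /tmp/freeipmi-test.conf

# Re-run the exporter's exact ipmi-sel invocation as the exporter's user
sudo -u prometheus ipmi-sel --quiet-cache --comma-separated-output \
  --no-header-output --sdr-cache-recreate --output-event-state \
  --interpret-oem-data --entity-sensor-names \
  --config-file /tmp/freeipmi-test.conf -h host1
```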
I went through the list and there are 3 types of nodes that I encountered:
- Not pingable at all (no wonder there's timeout)
- Node is working fine, but there were network glitches (or something) so it ended up in the log
- Node doesn't accept LAN_2_0
Of course, the first two can be ignored. Is there a way to specify driver-type per node, or would I have to run two exporters: one for LAN_2_0 and one without it?
You have two options. Recall the
params:
- module: "default"
in the prometheus config.
- To keep your exporter config simple, split your targets into two batches in the prometheus config. For the first one, keep module: default. For the second one, set e.g. module: otherdriver. In your exporter config, create a second module otherdriver with the desired settings (see the example config).
- To keep your prometheus config simple, instead of setting the module as a fixed param, add a relabeling rule that sets module to the IP of the target host. This means that in your exporter config, you'll have to create a module for every target host. This may seem cumbersome, but many folks generate their exporter config anyway, due to unique passwords or other circumstances.
None of these options is "better" than the other, it really just depends on what makes more sense to you. Hope that helps.
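For the first option, a sketch of what the split could look like. The module name lan15, the driver "LAN" (IPMI 1.5) for hosts that reject LAN_2_0, and the second targets file are assumptions; check the exporter's example config for the exact knobs:

```yaml
# ipmi-exporter.yml - second module for hosts that don't accept LAN_2_0
modules:
  default:
    driver: "LAN_2_0"
    privilege: "user"
    user: "myuser"
    pass: "mypass"
    collectors: [bmc, ipmi, chassis, sel, sel-events]
  lan15:
    driver: "LAN"
    privilege: "user"
    user: "myuser"
    pass: "mypass"
    collectors: [bmc, ipmi, chassis, sel, sel-events]

# prometheus.yml - a second scrape job for the second batch of targets,
# with the same relabel_configs as the existing ipmi-exporter job
# - job_name: ipmi-exporter-lan15
#   metrics_path: /ipmi
#   params:
#     module: [lan15]
#   file_sd_configs:
#   - files:
#     - /etc/prometheus/ipmi-targets-lan15.yml
#     refresh_interval: 10m
```

For the second option, the relabeling rule would copy the target into the module parameter, e.g. source_labels: [__param_target] with target_label: __param_module, so each host selects its own module by name.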