How to verify that Promgen has been deployed correctly
Hi Paul, I followed the steps and registered a new rule, but I don't know how my new rule works in the Prometheus server. There are no alerts related to this rule. In addition, nothing relevant has been written to the configuration file promgen.rule.yml. So I would like to ask whether there is a problem with my configuration or something else. Here is my configuration.
promgen.yml
prometheus:
  url: http://192.168.5.93:9090/
  promtool: /usr/local/bin/promtool
  rules: /etc/prometheus/promgen.rule.yml
  blackbox: /etc/prometheus/blackbox.json
  targets: /etc/prometheus/promgen.json
alertmanager:
  url: http://192.168.5.93:9093
  blacklist:
    severity: ["debug", "blackhole"]
promgen.notification.email:
  sender: [email protected]
promgen.notification.ikasan:
  server: http://ikasan.example
promgen.notification.linenotify:
  server: https://notify.example
prometheus.yml
  - job_name: 'prometheus'
    static_configs:
      - targets: ['192.168.5.93:9090']
  - job_name: 'consul'
    consul_sd_configs:
      - server: '192.168.5.93:8500'
        services: []
    relabel_configs:
      - source_labels: [__meta_consul_tags]
        regex: .*dev.*
        action: keep
  - job_name: 'promgen'
    file_sd_configs:
      - files:
          - "/etc/prometheus/promgen.json"
  - job_name: 'blackbox'
    metrics_path: /probe
    params:
    file_sd_configs:
      - files:
          - "/etc/prometheus/blackbox.json"
    relabel_configs:
      - source_labels: [__address__]
        regex: (.*)(:80)?
        target_label: __param_target
        replacement: ${1}
      - source_labels: [__param_target]
        regex: (.*)
        target_label: instance
        replacement: ${1}
      - source_labels: []
        regex: .*
        target_label: __address__
        replacement: 192.168.5.93:9115 # Blackbox exporter.
promgen.json
[
  {
    "labels": {
      "__farm_source": "promgen",
      "__metrics_path__": "/metrics",
      "__shard": "Default",
      "farm": "hosts",
      "job": "node-exporter",
      "project": "test-project",
      "service": "test-service"
    },
    "targets": [
      "192.168.1.7:9100",
      "192.168.5.93:9100"
    ]
  }
]
Looking at your promgen.yml configuration, I see you use /etc/prometheus/promgen.rule.yml as your rule path. You should check that this file exists on your target Prometheus server.
Your prometheus.yml snippet doesn't show it, but you should also have a rule_files section that looks like:
rule_files:
  - /etc/prometheus/promgen.rule.yml
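A file existing on disk does not prove Prometheus loaded it. One way to check is to ask Prometheus which rule groups it has loaded through its HTTP API (`/api/v1/rules`), where each group reports the `file` it came from. A minimal sketch using only the Python standard library, assuming the Prometheus URL and rule path from the promgen.yml above; `groups_from_file` and `check_rule_file` are hypothetical helper names:

```python
import json
import os
import urllib.request
from urllib.parse import urljoin

# Assumed values, taken from the promgen.yml above; adjust for your environment.
RULE_FILE = "/etc/prometheus/promgen.rule.yml"
PROMETHEUS_URL = "http://192.168.5.93:9090/"


def groups_from_file(rules_response, rule_file):
    """Return names of the rule groups Prometheus reports as loaded from rule_file.

    rules_response is the decoded JSON body of GET /api/v1/rules.
    """
    groups = rules_response.get("data", {}).get("groups", [])
    return [group["name"] for group in groups if group.get("file") == rule_file]


def check_rule_file(base_url=PROMETHEUS_URL, rule_file=RULE_FILE):
    """Print whether the rule file exists locally and which groups Prometheus loaded from it."""
    print("rule file exists on this host:", os.path.exists(rule_file))
    # /api/v1/rules lists every rule group the Prometheus server has loaded
    with urllib.request.urlopen(urljoin(base_url, "api/v1/rules"), timeout=10) as resp:
        loaded = groups_from_file(json.load(resp), rule_file)
    print("groups loaded from the rule file:", loaded or "none")
    return loaded
```

Run `check_rule_file()` on the Prometheus host. If the file exists but no groups are listed, Prometheus has not loaded it; `promtool check rules /etc/prometheus/promgen.rule.yml` and a configuration reload (SIGHUP, or a POST to `/-/reload` when Prometheus runs with `--web.enable-lifecycle`) are the next things to look at.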
promgen.rule.yml
groups:
  - name: hostStatsAlert
    rules:
      - alert: hostCpuUsageAlert
        expr: node_load1 > 0.01
        for: 1m
        labels:
          severity: page
        annotations:
          summary: "Instance {{ $labels.instance }} CPU usage high"
          description: "{{ $labels.instance }} CPU usage above 85% (current value: {{ $value }})"
      - alert: hostMemUsageAlert
        expr: (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes)/node_memory_MemTotal_bytes > 0.85
        for: 1m
        labels:
          severity: page
        annotations:
          summary: "Instance {{ $labels.instance }} MEM usage high"
          description: "{{ $labels.instance }} MEM usage above 85% (current value: {{ $value }})"
This file exists, but its current contents were written by me manually, not by Promgen.

And to confirm, you also have a rule_files section?
https://github.com/line/promgen/blob/f83e51fe57b70bb7c162b10ab126bf5c3434705d/docker/prometheus.yml#L8-L11
One more thing I would check is that the Promgen worker has permission to write the files. For example, if you created promgen.rule.yml as root but are running Promgen as a non-root user, Promgen would not be able to write to it. (You should generally run Promgen and Prometheus as non-root users.)
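The permission check described above can be sketched in Python with only the standard library. It reports who owns the rule file and whether the current process can update it. It also checks the containing directory, on the assumption that the file may be replaced by writing a temporary file next to it and renaming it; that is a conservative guess, not a confirmed detail of how Promgen writes the file. `rule_file_access` is a hypothetical helper name, and the `pwd` module is POSIX-only:

```python
import os
import pwd  # POSIX-only module


def rule_file_access(path):
    """Report whether the current user can update path, and who owns it.

    Run this as the same user that runs Promgen. The directory is checked too,
    on the assumption that the file may be replaced via a temporary file
    written next to it.
    """
    uid = os.stat(path).st_uid
    try:
        owner = pwd.getpwuid(uid).pw_name
    except KeyError:  # uid with no passwd entry (common in containers)
        owner = str(uid)
    directory = os.path.dirname(os.path.abspath(path))
    writable = os.access(path, os.W_OK) and os.access(directory, os.W_OK)
    return writable, owner
```

Call `rule_file_access("/etc/prometheus/promgen.rule.yml")` while running as the Promgen user. If `writable` is False, changing the owner of the file (and its directory) to that user is the usual fix.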
Sorry, here are the full contents of my prometheus.yml file:
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).
  external_labels:
    cluster_name: 'promgen'
# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['192.168.5.93:9093']
          # - alertmanager:9093
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - /etc/prometheus/promgen.rule.yml
# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ['192.168.5.93:9090']
  - job_name: 'consul'
    consul_sd_configs:
      - server: '192.168.5.93:8500'
        services: []
    relabel_configs:
      - source_labels: [__meta_consul_tags]
        regex: .*dev.*
        action: keep
  - job_name: 'promgen'
    file_sd_configs:
      - files:
          - "/etc/prometheus/promgen.json"
  - job_name: 'blackbox'
    metrics_path: /probe
    params:
    file_sd_configs:
      - files:
          - "/etc/prometheus/blackbox.json"
    relabel_configs:
      - source_labels: [__address__]
        regex: (.*)(:80)?
        target_label: __param_target
        replacement: ${1}
      - source_labels: [__param_target]
        regex: (.*)
        target_label: instance
        replacement: ${1}
      - source_labels: []
        regex: .*
        target_label: __address__
        replacement: 192.168.5.93:9115 # Blackbox exporter.
I created promgen.rule.yml as root, and I also run Prometheus and Promgen as root. Do Prometheus and Promgen have to be run by non-root users?
There is no requirement for either to run as root. Typically it is better to run them as a non-root user. For example, you might create a new prometheus user on your system and run both Promgen and Prometheus as that user.
Okay, but the problem I raised at the beginning still exists. In addition, I found another problem: when I registered a new rule and clicked the Test button, it reported an error. The error message is as follows:
2020-04-01 01:18:31,246 ERROR Internal Server Error: /rule/0/test
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/django/core/handlers/exception.py", line 34, in inner
    response = get_response(request)
  File "/usr/local/lib/python3.6/site-packages/django/core/handlers/base.py", line 115, in _get_response
    response = self.process_exception_by_middleware(e, request)
  File "/usr/local/lib/python3.6/site-packages/django/core/handlers/base.py", line 113, in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/usr/local/lib/python3.6/site-packages/django/views/decorators/csrf.py", line 54, in wrapped_view
    return view_func(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/django/views/generic/base.py", line 71, in view
    return self.dispatch(request, *args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/django/contrib/auth/mixins.py", line 52, in dispatch
    return super().dispatch(request, *args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/django/views/generic/base.py", line 97, in dispatch
    return handler(request, *args, **kwargs)
  File "/usr/src/app/promgen/views.py", line 1293, in post
    result = util.get(url, {'query': query}).json()
  File "/usr/local/lib/python3.6/site-packages/requests/models.py", line 897, in json
    return complexjson.loads(self.text, **kwargs)
  File "/usr/local/lib/python3.6/json/__init__.py", line 354, in loads
    return _default_decoder.decode(s)
  File "/usr/local/lib/python3.6/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/local/lib/python3.6/json/decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Do you have a shard and Prometheus servers registered via the /admin/ page?

The URL for your cluster needs to be a real URL that Promgen can query.
I think I also need to update RuleTest to make the error more obvious.
I changed the URL as shown in the screenshot, but the problem still hasn't been solved.
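The final line of the traceback, `Expecting value: line 1 column 1 (char 0)`, means the response body was not JSON at all: an empty body or an HTML page (for example, an error or login page from whatever service the URL actually points at) fails with exactly that message. A minimal sketch of a clearer check, using only the Python standard library. It mirrors the `util.get(url, {'query': query}).json()` call in the traceback, but the `api/v1/query` path is an assumption based on the Prometheus HTTP API (the traceback does not show the URL Promgen builds), and `query_prometheus`, `decode_query_response`, and `QueryError` are hypothetical names:

```python
import json
import urllib.parse
import urllib.request


class QueryError(Exception):
    """Raised when a URL does not behave like the Prometheus query API."""


def decode_query_response(body, url):
    """Decode a /api/v1/query body, failing with a readable message instead of a bare JSONDecodeError."""
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        text = body.decode("utf-8", "replace") if isinstance(body, bytes) else body
        raise QueryError(
            f"{url} did not return JSON (body starts with {text[:80]!r}); "
            "is this really a Prometheus server URL?"
        ) from None
    if not isinstance(data, dict) or data.get("status") != "success":
        error = data.get("error") if isinstance(data, dict) else data
        raise QueryError(f"{url} returned an error: {error}")
    return data["data"]


def query_prometheus(base_url, query):
    """Run an instant query against base_url (assumed to be the server URL registered in /admin/)."""
    url = urllib.parse.urljoin(base_url, "api/v1/query")
    full_url = url + "?" + urllib.parse.urlencode({"query": query})
    with urllib.request.urlopen(full_url, timeout=10) as resp:
        return decode_query_response(resp.read(), url)
```

For example, `query_prometheus("http://192.168.5.93:9090/", "up")` should return a dict with `resultType` and `result` keys if the URL is a working Prometheus server; anything else raises `QueryError` showing the first characters of what actually came back.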
Can you also check /admin/sites/site/?
Make sure the Promgen domain configured there matches the domain Promgen is actually being served from.
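Django's sites framework stores that domain, and on a fresh install the default Site is `example.com`, which would not match a server reached at 192.168.5.93. A small sketch of the comparison this check amounts to, assuming the Site domain is stored as a bare host with an optional port and no scheme; `site_domain_matches` is a hypothetical helper, and port 8000 in the test values is an assumption:

```python
from urllib.parse import urlsplit


def site_domain_matches(site_domain, served_url):
    """Return True if the Django Site domain matches the host:port Promgen is served from.

    Assumes site_domain is a bare host with an optional port, without a scheme,
    e.g. "192.168.5.93:8000".
    """
    # netloc keeps the port when the URL includes one
    served = urlsplit(served_url).netloc.lower()
    return site_domain.strip().rstrip("/").lower() == served
```

The stored value can be read from a Django shell with `Site.objects.get_current().domain` (from `django.contrib.sites.models`) and compared with the URL in your browser's address bar.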
I'm going to work on two patches to help make this more obvious.
I changed the URL as shown in the screenshot, but the problem still hasn't been solved.
Hi, have you solved this problem? I've run into the same problem. How did you solve it?