promgen: How to verify that Promgen has been deployed correctly

Hi Paul, I followed the steps and registered a new rule, but I don't know how to tell whether the new rule is working on the Prometheus server. There are no alerts related to this rule. In addition, nothing relevant has been written to the configuration file promgen.rule.yml. Is there a problem with my configuration, or is something else wrong? Here is my configuration.

promgen.yml

prometheus:
  url: http://192.168.5.93:9090/
  promtool: /usr/local/bin/promtool
  rules: /etc/prometheus/promgen.rule.yml
  blackbox: /etc/prometheus/blackbox.json
  targets: /etc/prometheus/promgen.json

alertmanager:
  url: http://192.168.5.93:9093
  blacklist:
    severity: ["debug", "blackhole"]

promgen.notification.email:
  sender: [email protected]
promgen.notification.ikasan:
  server: http://ikasan.example
promgen.notification.linenotify:
  server: https://notify.example

prometheus.yml

 - job_name: 'prometheus'
   static_configs:
   - targets: ['192.168.5.93:9090']
 - job_name: 'consul'
   consul_sd_configs:
     - server: '192.168.5.93:8500'
       services: []
   relabel_configs:
     - source_labels: [__meta_consul_tags]
       regex: .*dev.*
       action: keep
 - job_name: 'promgen'
   file_sd_configs:
   - files:
     - "/etc/prometheus/promgen.json"
 - job_name: 'blackbox'
   metrics_path: /probe
   params:
   file_sd_configs:
   - files:
     - "/etc/prometheus/blackbox.json"
   relabel_configs:
     - source_labels: [__address__]
       regex: (.*)(:80)?
       target_label: __param_target
       replacement: ${1}
     - source_labels: [__param_target]
       regex: (.*)
       target_label: instance
       replacement: ${1}
     - source_labels: []
       regex: .*
       target_label: __address__
       replacement: 192.168.5.93:9115  # Blackbox exporter.

promgen.json

[
  { 
    "labels": {
        "__farm_source": "promgen",
        "__metrics_path__": "/metrics",
        "__shard": "Default",
        "farm": "hosts",
        "job": "node-exporter",
        "project": "test-project",
        "service": "test-service"
    },
    "targets": [ 
        "192.168.1.7:9100",
        "192.168.5.93:9100"
    ]
  }
]

Mar 31 '20 08:03 zhenghanyin

Looking at your promgen.yml configuration, I see you use /etc/prometheus/promgen.rule.yml for your rule path. You should be able to check to see that it exists on your target Prometheus server.

You don't have it written in your prometheus.yml snippet, but you should also have a rule section that looks like

rule_files:
   - /etc/prometheus/promgen.rule.yml
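
One way to confirm that Prometheus has actually loaded the file is to ask its HTTP API: `GET /api/v1/rules` lists every loaded rule group together with the file it came from. Below is a minimal Python sketch of that check. The helper names `alerts_by_file` and `fetch_rules` are made up for illustration and are not part of Promgen or Prometheus.

```python
import json
import urllib.request
from collections import defaultdict


def alerts_by_file(payload):
    """Map each rule file to the alert names Prometheus loaded from it."""
    loaded = defaultdict(list)
    for group in payload.get("data", {}).get("groups", []):
        for rule in group.get("rules", []):
            # Recording rules have type "recording"; only alerting rules matter here.
            if rule.get("type") == "alerting":
                loaded[group.get("file", "")].append(rule["name"])
    return dict(loaded)


def fetch_rules(base_url):
    """Fetch /api/v1/rules from a Prometheus server (needs network access)."""
    url = base_url.rstrip("/") + "/api/v1/rules"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)
```

Calling `alerts_by_file(fetch_rules("http://192.168.5.93:9090/"))` against the server from this thread should return a dictionary keyed by rule file path. If `/etc/prometheus/promgen.rule.yml` is missing from it, Prometheus has not loaded that file.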

Mar 31 '20 09:03 kfdm

promgen.rule.yml

groups:
- name: hostStatsAlert
  rules:
  - alert: hostCpuUsageAlert
    expr: node_load1  > 0.01
    for: 1m
    labels:
      severity: page
    annotations:
      summary: "Instance {{ $labels.instance }} CPU usage high"
      description: "{{ $labels.instance }} CPU usage above 85% (current value: {{ $value }})"
  - alert: hostMemUsageAlert
    expr: (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes)/node_memory_MemTotal_bytes > 0.85
    for: 1m
    labels:
      severity: page
    annotations:
      summary: "Instance {{ $labels.instance }} MEM usage high"
      description: "{{ $labels.instance }} MEM usage above 85% (current value: {{ $value }})"

This file exists. For now, I wrote its contents myself.

Mar 31 '20 09:03 zhenghanyin

And to confirm, you also have a rule_files section?

https://github.com/line/promgen/blob/f83e51fe57b70bb7c162b10ab126bf5c3434705d/docker/prometheus.yml#L8-L11

Mar 31 '20 09:03 kfdm

One more thing I would check is that the Promgen worker has permission to write the files. For example, if you created promgen.rule.yml as root but are running Promgen as a non-root user, it would not be able to write to the file. (You should generally run Promgen and Prometheus as non-root users.)
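
The permission check can be sketched in Python and run as the same user that runs the Promgen worker. This only tests ordinary POSIX permissions; how Promgen itself writes the file is not covered here, and `can_write_rule_file` is a hypothetical helper, not a Promgen function.

```python
import os


def can_write_rule_file(path):
    """Return True if the current user could create or overwrite `path`."""
    if os.path.exists(path):
        # Existing file: the process needs write permission on the file itself.
        return os.access(path, os.W_OK)
    # Missing file: the process needs write and search permission on the directory.
    parent = os.path.dirname(os.path.abspath(path))
    return os.path.isdir(parent) and os.access(parent, os.W_OK | os.X_OK)
```

For the setup in this thread, `can_write_rule_file("/etc/prometheus/promgen.rule.yml")` returning False would point at a permission problem. Note that root passes this check for almost any path, so it is only informative when run as the non-root service user.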

Mar 31 '20 09:03 kfdm

Sorry, the full contents of the prometheus.yml file are as follows.

# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).
  
  external_labels:
    cluster_name: 'promgen'

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets: ['192.168.5.93:9093']
      # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - /etc/prometheus/promgen.rule.yml
# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
    - targets: ['192.168.5.93:9090']
  - job_name: 'consul'
    consul_sd_configs:
      - server: '192.168.5.93:8500'
        services: []
    relabel_configs:
      - source_labels: [__meta_consul_tags]
        regex: .*dev.*
        action: keep
  - job_name: 'promgen'
    file_sd_configs:
    - files:
      - "/etc/prometheus/promgen.json"
  - job_name: 'blackbox'
    metrics_path: /probe
    params:
    file_sd_configs:
    - files:
      - "/etc/prometheus/blackbox.json"
    relabel_configs:
      - source_labels: [__address__]
        regex: (.*)(:80)?
        target_label: __param_target
        replacement: ${1}
      - source_labels: [__param_target]
        regex: (.*)
        target_label: instance
        replacement: ${1}
      - source_labels: []
        regex: .*
        target_label: __address__
        replacement: 192.168.5.93:9115  # Blackbox exporter.

Mar 31 '20 09:03 zhenghanyin

I created promgen.rule.yml as root, and I run Prometheus and Promgen as root too. Must Prometheus and Promgen be run by non-root users?

Apr 01 '20 05:04 zhenghanyin

There is no requirement for either to run as root. Typically it is better to run them as a non-root user. For example, you might create a new prometheus user on your system and have both Promgen and Prometheus run as that user.

Apr 01 '20 05:04 kfdm

Okay, but the question I raised at the beginning still stands. In addition, I found another problem: when I registered a new rule and clicked the test button, it reported an error. The error information is as follows.

2020-04-01 01:18:31,246 ERROR Internal Server Error: /rule/0/test
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/django/core/handlers/exception.py", line 34, in inner
    response = get_response(request)
  File "/usr/local/lib/python3.6/site-packages/django/core/handlers/base.py", line 115, in _get_response
    response = self.process_exception_by_middleware(e, request)
  File "/usr/local/lib/python3.6/site-packages/django/core/handlers/base.py", line 113, in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/usr/local/lib/python3.6/site-packages/django/views/decorators/csrf.py", line 54, in wrapped_view
    return view_func(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/django/views/generic/base.py", line 71, in view
    return self.dispatch(request, *args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/django/contrib/auth/mixins.py", line 52, in dispatch
    return super().dispatch(request, *args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/django/views/generic/base.py", line 97, in dispatch
    return handler(request, *args, **kwargs)
  File "/usr/src/app/promgen/views.py", line 1293, in post
    result = util.get(url, {'query': query}).json()
  File "/usr/local/lib/python3.6/site-packages/requests/models.py", line 897, in json
    return complexjson.loads(self.text, **kwargs)
  File "/usr/local/lib/python3.6/json/__init__.py", line 354, in loads
    return _default_decoder.decode(s)
  File "/usr/local/lib/python3.6/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/local/lib/python3.6/json/decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

(edited to add formatting)

Apr 01 '20 06:04 zhenghanyin

Do you have a shard and Prometheus servers registered via the /admin/ page?

Apr 01 '20 06:04 kfdm

Apr 01 '20 06:04 zhenghanyin

The URL for your cluster needs to be a real URL that Promgen can query.

I think I also need to update the RuleTest to make the error more obvious
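
The traceback above ends in `JSONDecodeError: Expecting value: line 1 column 1 (char 0)`, which is what Python's `json` module raises when the response body is empty or is not JSON at all (an HTML error page, for instance). The following is a rough sketch of a check that makes this failure easier to read. It assumes the standard Prometheus `/api/v1/query` endpoint; `build_query_url` and `decode_query_response` are hypothetical helpers and not Promgen's actual code.

```python
import json
from urllib.parse import urlencode, urljoin


def build_query_url(base_url, query):
    """Build the instant-query URL for a Prometheus server."""
    base = base_url if base_url.endswith("/") else base_url + "/"
    return urljoin(base, "api/v1/query") + "?" + urlencode({"query": query})


def decode_query_response(body):
    """Decode a query response, raising a readable error for non-JSON bodies."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError as exc:
        # Same failure as in the traceback, but with the offending body attached.
        raise ValueError(f"Not a Prometheus API response: {body[:80]!r}") from exc
    if not isinstance(payload, dict) or payload.get("status") != "success":
        raise ValueError(f"Prometheus returned an error: {payload!r}")
    return payload["data"]
```

To try it against a real server, fetch the body with `urllib.request.urlopen(build_query_url(url, "up")).read().decode()` and pass it to `decode_query_response`. If the URL registered for the shard is not a real Prometheus address, the error message will show what was returned instead.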

Apr 01 '20 06:04 kfdm

I changed the URL as shown in the illustration, but the problem still hasn't been solved.

Apr 01 '20 08:04 zhenghanyin

Can you also check /admin/sites/site/? Ensure the Promgen domain is the same as the one being served. I am going to work on two patches to help make this more obvious.

Apr 03 '20 03:04 kfdm

I changed the URL as shown in the illustration, but the problem still hasn't been solved.

Hi, have you solved this problem? I encountered the same problem as you, how did you solve it?

Jan 15 '21 03:01 liuzh-sa