How to configure recording and alerting rules with Loki
I am trying to configure a recording rule and, according to the documentation, it is not clear how to set it up.
I created a rules.yml file in the /loki/rules directory. Following the Recording rules doc, I implemented my own rule:
name: MyRules
interval: 1m
rules:
  - record: generator:requests:rate2m
    expr: |
      sum(
        rate({service="generator_generator"}[2m])
      )
    labels:
      cluster: "something"
At first, this did not do anything: no logs in Loki about a wrong format, no metrics in Prometheus (remote write). After that, I copied this file to the rules-temp directory and also to /loki/rules/fake/, based on the Ruler storage doc. From the docs I was not sure where this file should be located, so I copied it everywhere. The result was the same: no logs in Loki, nothing in Prometheus.
After a day off, I started Loki and found this log:
2022-11-03T08:24:24.062210590Z level=error ts=2022-11-03T08:24:24.061854756Z caller=ruler.go:497 msg="unable to list rules" err="failed to list rule groups for user fake: failed to list rule group for user fake and namespace rules.yml: error parsing /loki/rules/fake/rules.yml: /loki/rules/fake/rules.yml: yaml: unmarshal errors:\n line 1: field name not found in type rulefmt.RuleGroups\n line 2: field interval not found in type rulefmt.RuleGroups\n line 3: field rules not found in type rulefmt.RuleGroups"
This log was not there before; even when I restart Loki it does not appear again, and I do not understand why. But I assume Loki cannot parse my rules file. I found the cortextool utility for validating Loki rules. After a few runs, I ended up with a new rules.yml file:
namespace: rules
groups:
  - name: MyRules
    interval: 1m
    rules:
      - record: generator:requests:rate1m
        expr: |-
          sum(rate({service="generator_generator"}[2m]))
        labels:
          cluster: something
It is quite different from the one in the docs, but it looks like it's OK:
$ cortextool rules lint --backend=loki rules.yml
INFO[0000] SUCCESS: 1 rules found, 0 linted expressions
After this small success I ran Loki again, but there was no result in the Loki logs or in Prometheus. I even tried setting a wrong Prometheus remote write address, but Loki does not log anything about that error either.
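The "field name not found in type rulefmt.RuleGroups" error from the log above boils down to a missing top-level `groups:` key. A quick local check of the difference between the two file shapes (not a substitute for `cortextool rules lint`, just a grep for the key the ruler complained about; the /tmp paths are arbitrary):

```shell
# The ruler's "field name not found in type rulefmt.RuleGroups" error means the
# file must start with a top-level `groups:` list, not a bare group definition.
cat > /tmp/bad-rules.yml <<'EOF'
name: MyRules
interval: 1m
EOF
cat > /tmp/good-rules.yml <<'EOF'
groups:
  - name: MyRules
    interval: 1m
    rules: []
EOF
grep -c '^groups:' /tmp/bad-rules.yml || true   # prints 0: ruler rejects this shape
grep -c '^groups:' /tmp/good-rules.yml          # prints 1: this shape parses
```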
My current configuration of Loki ruler:
ruler:
  alertmanager_url: http://localhost:9093
  remote_write:
    enabled: true
    client:
      url: http://prometheus:9090/api/v1/write
Prometheus runs with the default configuration.
Versions: Loki 2.6.1, Prometheus v2.39.1
Questions:
- Where should the rule file be located, and what is the difference between /rules, /rules-temp and /rules/<tenant-id>?
- What is the format of rules and rule files? Can there be multiple files?
- Why does the log about rules not appear in the Loki logs (wrong Prometheus URL, wrong rules.yml format)?
- How to properly configure rules (both recording and alerting) in Loki? The documentation is very unclear.
- How to debug this configuration and setup? Basically, I do not know where to check when something is wrong, since there are no logs or any other information about it.
Thanks for any tips.
I tested recording rules with Prometheus; they work without any problem. I implemented a metric configuration with two values and this rule:
groups:
  - name: example
    rules:
      - record: job:configuration:sum
        expr: sum by (job) (configuration)
The new metric job:configuration:sum is in Prometheus and can be used:

When I apply a similar configuration to Loki, it does not work. I tested with debug mode; there are basically no logs about the ruler. No warnings or errors.
After long debugging, I figured it out.
When I tried to list the rules directory from inside the container, it could not list it:
/loki/rules $ ls -al
^C
After restarting Loki with a new data directory mounted, it started working. I checked the permissions of directories and files; they were the same. I tried to debug it deeper after my PC was turned off, and it started working even with the old directory. It looks like it was a Docker issue.
My findings during debugging:
- The endpoint /loki/api/v1/rules returns information about the current rules, but only errors if any exist.
- When the rules format is OK, the endpoint /loki/api/v1/rules just downloads the rules file.
- Prometheus must be run with the option --web.enable-remote-write-receiver; this enables remote write.
- An error log appears in the Loki container logs only when Prometheus does not have remote write enabled. No other information about the ruler could be found in the logs, so it is quite hard to debug when something does not work. In my opinion this should be improved.
- rules.yml must be located in a directory named after the tenant ID (default is fake). So the proper location of the file is /loki/rules/fake/rules.yml.
- This is an example of a rules.yml file:
groups:
  - name: test
    interval: 1m
    rules:
      - record: lokiTest
        expr: |
          sum(rate({service="generator_generator"}[2m]))
        labels:
          cluster: "us-central1"
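The tenant-directory layout described above can be mocked locally with plain shell, with no Loki involved, just to show where the file has to end up when auth is disabled and the tenant ID defaults to fake (the temp directory stands in for ruler.storage.local.directory):

```shell
# Local sketch of the layout the ruler expects: <storage dir>/<tenant id>/<rules file>.
# With auth disabled, the tenant ID defaults to "fake".
RULES_DIR=$(mktemp -d)           # stands in for ruler.storage.local.directory
mkdir -p "$RULES_DIR/fake"
cat > "$RULES_DIR/fake/rules.yml" <<'EOF'
groups:
  - name: test
    interval: 1m
    rules:
      - record: lokiTest
        expr: |
          sum(rate({service="generator_generator"}[2m]))
EOF
find "$RULES_DIR" -mindepth 1    # shows fake/ and fake/rules.yml
```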
I hope this will help somebody.
I second your confusion about the Loki ruler directories!
The ruler documentation is very vague:
# File path to store temporary rule files.
# CLI flag: -ruler.rule-path
[rule_path: <filename> | default = "/rules"]
and then there is https://grafana.com/docs/loki/latest/rules/#ruler-storage with this example:
-ruler.storage.type=local
-ruler.storage.local.directory=/tmp/loki/rules
so it's really not clear to me how to manage the rules properly - should they get uploaded to rule_path or to ruler.storage.local.directory?
@weakcamel This is my configuration of loki ruler section:
ruler:
  wal:
    dir: /loki/ruler-wal
  storage:
    type: local
    local:
      directory: /rules
  alertmanager_url: http://localhost:9093
  remote_write:
    enabled: true
    client:
      url: http://<prometheus-address>:9090/api/v1/write
The ruler.storage.local.directory section defines where all rules will be stored. All rule files (YAML files) must be in a directory named after the tenant they belong to.
In the CI pipeline, I copy the rules.yml file into the container. From the Dockerfile:
COPY "rules.yaml" /rules/fake/rules.yaml
The file must be in a directory named by the tenant ID; I am using the default one, named fake.
My rules.yml file contains only one rule:
groups:
  - name: Loki logs disk usage
    interval: 1m
    rules:
      - record: logs_bytes_over_time_1m
        expr: |
          bytes_over_time({service=~".+"}[1m])
This rule creates a new metric logs_bytes_over_time_1m and pushes it to <prometheus-address>. That is the reason Prometheus must be run with the --web.enable-remote-write-receiver argument. Be careful: I am using Prometheus version v2.29.2, and it uses the feature flag --enable-feature=remote-write-receiver.
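Since the flag name changed across Prometheus releases, it can be derived from the version in use. A small shell sketch; the 2.33.0 cutoff is my assumption of the release where the feature flag was promoted to the stable --web flag:

```shell
# Choose the remote-write-receiver flag based on the Prometheus version.
# Assumption: 2.33.0 promoted --enable-feature=remote-write-receiver
# to the stable flag --web.enable-remote-write-receiver.
PROM_VERSION="2.39.1"
if [ "$(printf '%s\n' 2.33.0 "$PROM_VERSION" | sort -V | head -n1)" = "2.33.0" ]; then
  FLAG="--web.enable-remote-write-receiver"
else
  FLAG="--enable-feature=remote-write-receiver"
fi
echo "$FLAG"
```

For v2.29.2 the same logic selects the feature-flag form, matching what is described above.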
The metric logs_bytes_over_time_1m is created by summing the bytes over the last minute per service. The reason is that Loki does not expose any information to determine how much space each stream takes. I asked about it in the "Metrics of stream size stored by Loki" issue, but because of cardinality you have to implement it yourself.
After this, I can see the metrics in Prometheus. I hope this helped.
@dorinand Your answer should be posted somewhere to the public. Finally got it working thanks to your reply! My Prometheus did not have the flag enabled.
Thanks!
@dorinand Also a big thank you from me for posting this info. Helped me get alerting to work. Was driving me nuts! :-)
I have a similar problem. I have been struggling for almost a month and there really is very little documentation. I saw you had the same problem; can you tell me how you solved it, especially the Prometheus and Loki configuration? I hope you can, because this is very difficult for me.
Hi @vadim415,
I posted the solution above and it has already helped a few people; did you read it?
Which part does not work for you? Can you elaborate a little bit?
Yes, it really helped me, but I still have an unresolved problem with Prometheus: I can't see the data that comes from Promtail. Can you help with that? Maybe I can share my config files?
Here are my two cents in addition to https://github.com/grafana/loki/issues/7589#issuecomment-1312473872's great configuration.
I configured rule_path: /tmp/ruler_rule_path. It should be a path different from the real rule folder (e.g. /rules). Without this, my /rules folder gets automagically emptied...
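The point about keeping the two paths separate can be illustrated with a local mock (no Loki involved, hypothetical temp paths): treat rule_path as scratch space the ruler may clean out, and keep the real store elsewhere.

```shell
# rule_path is scratch space the ruler may rewrite or empty;
# the real rule store (ruler.storage.local.directory) must be a different path.
STORE_DIR=$(mktemp -d)    # plays ruler.storage.local.directory
SCRATCH_DIR=$(mktemp -d)  # plays rule_path
mkdir -p "$STORE_DIR/fake"
echo 'groups: []' > "$STORE_DIR/fake/rules.yml"
# Simulate the ruler cleaning its scratch directory:
rm -rf "${SCRATCH_DIR:?}"/*
# The store survives only because the two paths are distinct:
[ -f "$STORE_DIR/fake/rules.yml" ] && echo "store intact"
```

If the two settings pointed at the same directory, that cleanup step would be exactly the "automagically emptied" /rules folder described above.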
@fzyzcjy thank you for your message. Are you sure about this configuration? I copy my config into my Loki container during the build process:
COPY "rules.yaml" /rules/fake/rules.yaml
This means rules are copied into the /rules directory inside the container.
Then I run it, mounting only the data directory:
services:
  loki:
    image: personal-dregistry/loki
    command: -config.file=/etc/loki/local-config.yaml
    volumes:
      - '/data/loki:/loki'
From the config endpoint <loki-url>/config you can see that the default value for rule_path is /loki/rules-temp (I did not change it). When I run Loki, I can see this structure in the /data/loki directory, which is mounted into the container:
tree rules-temp/
rules-temp/
`-- fake
`-- rules.yaml
1 directory, 1 file
This is exactly my rule file that was copied into /rules inside the container. It looks like Loki somehow copies rules from /rules to rule_path, which in this example is the /loki/rules-temp directory. You can see it from the ls -al output for both files:
-rwxr-xr-x 1 loki loki 182 Jun 16 13:26 /loki/rules-temp/fake/rules.yaml
-rw-rw-rw- 1 root root 170 Jun 16 13:24 /rules/fake/rules.yaml
I redeployed Loki on Jun 16; the owner of the data in rules-temp is loki.
@dorinand Thank you. I am sure I added that line and it works, while my case differs from yours in a few places, so I guess both of us are correct while in different scenarios ;)
More of my context:
- Loki 2.5.0
- Kubernetes / Helm
- /rules is a read-write volume mount
Indeed, I suspect the Loki helm chart I used added some configurations without me noticing it as well.
This point is critical.
rules.yml must be located in directory with tenant ID (default is fake). So proper location of file is: /loki/rules/fake/rules.yml
I was stuck for a long time due to the issue with the single-tenant rules storage /fake path.
Can you possibly share a CI/CD approach for updating the rules? I have a repository of all the rules and I usually have to do this manually.
@uncle-tee I just copy the files into the Docker image during build. This is my Dockerfile:
COPY "${ENV}-local-config.yaml" /etc/loki/local-config.yaml
COPY "rules.yaml" /rules/fake/rules.yaml
Then, as I posted before, I just run the container as a service:
services:
  loki:
    image: personal-dregistry/loki
    command: -config.file=/etc/loki/local-config.yaml
    volumes:
      - '/data/loki:/loki'
How do I configure all of this using the helm chart values for loki-stack? I am facing a similar issue: I installed loki-stack using the helm chart on my Rancher cluster, and I am not sure what to modify and where. I am not sure if the YAML needs to have the section below:
loki:
  rulerConfig:
    storage:
      type: local
      local:
        directory: /tmp/loki/rules
    rule_path: /etc/loki/rules/ruler-config.yaml
    alertmanager_url: http://alertmanager:9093
I am also not sure whether the Alertmanager URL should be http://localhost:9093 or something like http://loki-stack-prometheus-alertmanager.
I connected to the Pod and saw there is only one file, loki.yaml, present in the directory /etc/loki.
The contents of this file are as below:
auth_enabled: false
chunk_store_config:
  max_look_back_period: 0s
compactor:
  shared_store: filesystem
  working_directory: /data/loki/boltdb-shipper-compactor
ingester:
  chunk_block_size: 262144
  chunk_idle_period: 3m
  chunk_retain_period: 1m
  lifecycler:
    ring:
      replication_factor: 1
  max_transfer_retries: 0
  wal:
    dir: /data/loki/wal
limits_config:
  enforce_metric_name: false
  max_entries_limit_per_query: 5000
  reject_old_samples: true
  reject_old_samples_max_age: 168h
memberlist:
  join_members:
    - 'loki-stack-memberlist'
schema_config:
  configs:
    - from: "2020-10-24"
      index:
        period: 24h
        prefix: index_
      object_store: filesystem
      schema: v11
      store: boltdb-shipper
server:
  grpc_listen_port: 9095
  http_listen_port: 3100
storage_config:
  boltdb_shipper:
    active_index_directory: /data/loki/boltdb-shipper-active
    cache_location: /data/loki/boltdb-shipper-cache
    cache_ttl: 24h
    shared_store: filesystem
  filesystem:
    directory: /data/loki/chunks
table_manager:
  retention_deletes_enabled: false
  retention_period: 0s
Do I need to change these contents to include the ruler? And what happens if the Pod is restarted? Do I need to set this up every time, or can I use a ConfigMap?
Totally confused, as there is no documentation on setting everything up using the grafana/loki-stack helm chart.
My Grafana dashboard for alert rule settings always shows the message below:
Errors loading rules
Failed to load the data source configuration for Loki: Unable to fetch alert rules. Is the Loki data source properly configured?
my setup:
- loki 2.9.2
- grafana 10.1.5
For those who are getting the "Appender not ready" error in Grafana for Loki alerts in the GUI, this is a working solution:
loki:
  ...
  rulerConfig:
    wal:
      dir: /loki/ruler-wal
    storage:
      type: local
      local:
        directory: /rules
    rule_path: /rules/fake
    remote_write:
      enabled: true
      client:
        url: http://prometheus-prometheus.prometheus.svc:9090/api/v1/write
    alertmanager_url: http://prometheus-alertmanager.prometheus.svc:9093
    enable_api: true
    ring:
      kvstore:
        store: inmemory
    enable_alertmanager_v2: true
make sure to have set the following value:
loki.rulerConfig.wal.dir
If you install it via the grafana/loki helm chart, you can use the following sidecar container to collect ConfigMaps from the desired namespaces and mount them into the desired folder in the backend pods:
sidecar:
  rules:
    label: loki_rule
    labelValue: 'true'
    folder: /rules/fake
    searchNamespace: ALL
    resource: configmap
which is mounted into /rules/fake:
$ k ice volume loki-backend-0
CONTAINER VOLUME TYPE BACKING SIZE RO MOUNT-POINT
loki-sc-rules sc-rules-volume EmptyDir - - false /rules/fake
loki-sc-rules kube-api-access-j6gkt Projected kube-root-ca.crt - true /var/run/secrets/kubernetes.io/serviceaccount
loki-sc-rules aws-iam-token Projected - - true /var/run/secrets/eks.amazonaws.com/serviceaccount
loki config ConfigMap loki - false /etc/loki/config
loki runtime-config ConfigMap loki-runtime - false /etc/loki/runtime-config
loki tmp EmptyDir - - false /tmp
loki data PersistentVolumeClaim data-loki-backend-0 - false /var/loki
loki sc-rules-volume EmptyDir - - false /rules/fake
loki kube-api-access-j6gkt Projected kube-root-ca.crt - true /var/run/secrets/kubernetes.io/serviceaccount
loki aws-iam-token Projected - - true /var/run/secrets/eks.amazonaws.com/serviceaccount
It is now 2024 and Loki's docs are still a pain in the ass. I wish the documentation were more specific, especially for similar configurations.
Finally got a solution to the above issue; posting it here so it can be helpful to others as well.
auth_enabled: false
chunk_store_config:
  max_look_back_period: 0s
compactor:
  shared_store: filesystem
  working_directory: /data/loki/boltdb-shipper-compactor
ingester:
  chunk_block_size: 262144
  chunk_idle_period: 3m
  chunk_retain_period: 1m
  lifecycler:
    ring:
      replication_factor: 1
  max_transfer_retries: 0
  wal:
    dir: /data/loki/wal
limits_config:
  enforce_metric_name: false
  max_entries_limit_per_query: 5000
  reject_old_samples: true
  reject_old_samples_max_age: 168h
memberlist:
  join_members:
    - 'loki-stack-memberlist'
schema_config:
  configs:
    - from: "2020-10-24"
      index:
        period: 24h
        prefix: index_
      object_store: filesystem
      schema: v11
      store: boltdb-shipper
server:
  grpc_listen_port: 9095
  http_listen_port: 3100
storage_config:
  boltdb_shipper:
    active_index_directory: /data/loki/boltdb-shipper-active
    cache_location: /data/loki/boltdb-shipper-cache
    cache_ttl: 24h
    shared_store: filesystem
  filesystem:
    directory: /data/loki/chunks
table_manager:
  retention_deletes_enabled: false
  retention_period: 0s
config:
  ruler:
    storage:
      type: local
      local:
        directory: /etc/loki/rules
    ring:
      kvstore:
        store: memberlist
    rule_path: /tmp/loki/scratch
    alertmanager_url: http://prometheus-alertmanager:9093
    external_url: https://alertmanager-prod.grafana.10.10.20.245.nip.io/
    enable_alertmanager_discovery: true
    enable_alertmanager_v2: true
    enable_api: true
- SimpleScalable: Loki is deployed as 3 targets: read, write, and backend. Useful for medium installs, easier to manage than distributed, up to about 1 TB/day.
- The following applies only to the SimpleScalable deployment pattern
- Enable the sidecar (it is enabled by default) and set labelValue to true. This means Loki will read ConfigMap objects that have a loki_rule: "true" label. Loki's default tenant name is fake, so you need to set folder to /etc/loki/rules/fake in the sidecar, so that the sidecar writes the ConfigMap objects created in step 2 into /etc/loki/rules/fake/ as .yaml files.
sidecar:
  rules:
    # -- Whether or not to create a sidecar to ingest rules from specific ConfigMaps and/or Secrets.
    enabled: true
    # -- Label that the configmaps/secrets with rules will be marked with.
    label: loki_rule
    # -- Label value that the configmaps/secrets with rules will be set to.
    labelValue: "true"  # <- important
    # -- Folder into which the rules will be placed.
    folder: /etc/loki/rules/fake  # <- important
- Create a configmap and apply it
apiVersion: v1
kind: ConfigMap
metadata:
  name: loki-rule
  namespace: loki
  labels:
    loki_rule: "true"
data:
  my-alerts.yaml: |
    groups:
      - name: should_fire
        rules:
          - alert: HighPercentageError
            expr: |
              sum(rate({env="production"} |= "error" [5m])) by (job)
                /
              sum(rate({env="production"}[5m])) by (job)
                > 0.05
            for: 10m
            labels:
              severity: warning
            annotations:
              summary: High error rate
- Add Loki configuration information about ruler
loki:
  rulerConfig:
    external_url: http://<your grafana>/explore
    # alertmanager address
    alertmanager_url: http://<your alertmanager>:9093
    ring:
      kvstore:
        store: inmemory
    enable_alertmanager_v2: true
    enable_api: true
    enable_sharding: true
    rule_path: /tmp/loki
    storage:
      type: local
      local:
        directory: /etc/loki/rules  # <- tenant user basedir
    flush_period: 1m
- You should now find the my-alerts.yaml file under /rules in the Loki container.
- Now, in your Grafana alert rules screen, you can add the Loki data source to see the Loki alert rules.
If you can't see them, check the loki-backend component for the error message.
I am trying to have loki-backend (rather, the sidecar loki-sc-ruler) pick up ConfigMaps across ALL namespaces.
The following config for the ruler (as seen in /etc/loki/config, loaded from the loki ConfigMap) seems to get the loki-sc-ruler sidecar to pick up all the ConfigMaps and place them under the /rules directory (shared with the loki-backend container).
ruler:
  alertmanager_url: https://alertmanager-infra-stg-01-uswst4.kube.activision.com/
  storage:
    type: local
    local:
      directory: /rules
  rule_path: /tmp/rules
  ring:
    kvstore:
      store: inmemory
  enable_api: true
However, Loki expects them to be at /rules/<tenant-id> instead. Would anyone know off the top if / how the loki-sc-ruler can place the rules in the respective tenant-id directories (I am wondering if it can be configured in the ConfigMap somehow)?
To be precise, the response from the API /loki/api/v1/rules is unable to read rule dir /rules/fake: open /rules/fake: no such file or directory. As a test, placing a rule under /rules/fake does work.
The docs do need a round of updates!
Hi @dorinand,
I'm also facing a similar issue.
We have deployed Loki and Prometheus via Helm.
We are using S3 storage for the ruler.
Below is the sample rule we are passing via the Loki endpoint https://loki-test.abc.com/loki/api/v1/rules/my_namespace:
name: "testlogpushedrule-7thApril"
interval: 1m
rules:
  - record: testlog_7thApril
    expr: |
      count_over_time({job="test"}[1m])
We were able to get these rules via https://loki-test.abc.com/loki/api/v1/rules, and we can see the same in the Grafana UI.
Below is ruler configuration -
loki:
  ruler:
    storage:
      type: s3
      s3:
        bucketnames: loki-123-test-west-test
        region: us-west-2
        s3forcepathstyle: true
    rule_path: /loki/rules-temp
    wal:
      dir: /loki/ruler-wal
    evaluation_interval: 2m
    ring:
      kvstore:
        store: memberlist
    enable_api: true
    enable_sharding: true
    remote_write:
      enabled: true
      config_refresh_period: 10s
      add_org_id_header: true # true
      clients:
        - url: http://prometheus-test-server.prometheus-test.svc.cluster.local:80/api/v1/write # "https://prometheus-test.abc.com/api/v1/write"
          name: "prometheus-test-AVGexecutionTime"
          remote_timeout: 30s
          send_exemplars: false
          send_native_histograms: false
          round_robin_dns: false
          # write_relabel_configs:
          #   - source_labels: [__name__]
          #     target_label: source
          #     replacement: loki_recording_rules
          # queue_config:
          #   capacity: 10000 # 500
          #   max_shards: 50 # 100
          #   min_shards: 1
          #   max_samples_per_send: 10000
          #   batch_send_deadline: 5s
    # evaluation:
    #   mode: remote
    #   query_frontend:
    #     address: <QF_ADDRESS>
I'm able to see the rules file loaded under the path /var/loki/rules-temp/fake/my_namespace. No data at /loki/rules or /loki/rules-temp.
Queries:
- I'm unable to see the metric testlog_7thApril in Prometheus. It says NO DATA, which means either Loki is not sending the metrics or Prometheus is unable to receive them.
- I looked into all possible workarounds; nothing works.
- Unfortunately, I don't see any logs related to remote_write in either the Loki ruler or Prometheus.
Thanks for the support,
