CrashLoopBackOff with "recovered from panic opening boltdb file: runtime error: invalid memory address or nil pointer dereference"
Describe the bug: The service runs for a day or two, then goes into CrashLoopBackOff. It fails with the following logs:
2021-01-27 18:54:42.026761 I | proto: duplicate proto type registered: ingester.Series
level=info ts=2021-01-27T18:54:42.030318392Z caller=main.go:130 msg="Starting Loki" version="(version=k41-fd619b6, branch=k41, revision=fd619b69)"
level=info ts=2021-01-27T18:54:42.030614885Z caller=server.go:229 http=[::]:3100 grpc=[::]:9095 msg="server listening on addresses"
level=info ts=2021-01-27T18:54:42.03103619Z caller=table_manager.go:241 msg="loading table index_18652"
level=error ts=2021-01-27T18:54:42.031471165Z caller=log.go:150 msg="error running loki" err="recovered from panic opening boltdb file: runtime error: invalid memory address or nil pointer dereference\nerror creating index client\ngithub.com/cortexproject/cortex/pkg/chunk/storage.NewStore\n\t/src/loki/vendor/github.com/cortexproject/cortex/pkg/chunk/storage/factory.go:179\ngithub.com/grafana/loki/pkg/loki.(*Loki).initStore\n\t/src/loki/pkg/loki/modules.go:308\ngithub.com/cortexproject/cortex/pkg/util/modules.(*Manager).initModule\n\t/src/loki/vendor/github.com/cortexproject/cortex/pkg/util/modules/modules.go:103\ngithub.com/cortexproject/cortex/pkg/util/modules.(*Manager).InitModuleServices\n\t/src/loki/vendor/github.com/cortexproject/cortex/pkg/util/modules/modules.go:75\ngithub.com/grafana/loki/pkg/loki.(*Loki).Run\n\t/src/loki/pkg/loki/loki.go:209\nmain.main\n\t/src/loki/cmd/loki/main.go:132\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:204\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1374\nerror initialising module: store\ngithub.com/cortexproject/cortex/pkg/util/modules.(*Manager).initModule\n\t/src/loki/vendor/github.com/cortexproject/cortex/pkg/util/modules/modules.go:105\ngithub.com/cortexproject/cortex/pkg/util/modules.(*Manager).InitModuleServices\n\t/src/loki/vendor/github.com/cortexproject/cortex/pkg/util/modules/modules.go:75\ngithub.com/grafana/loki/pkg/loki.(*Loki).Run\n\t/src/loki/pkg/loki/loki.go:209\nmain.main\n\t/src/loki/cmd/loki/main.go:132\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:204\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1374"
To Reproduce: I don't know; I see different errors every week. Each time I clean up the disk and use a fresh installation, Loki runs fine for a day or two. The config is the same as shared here: https://github.com/grafana/loki/issues/3219
Expected behavior: It should come up as usual. If I clean up the disk, it comes up fine.
Environment: Infrastructure: Kubernetes; Deployment tool: Helm. Am I doing something wrong in the configuration?
This issue has been automatically marked as stale because it has not had any activity in the past 30 days. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.
I'm experiencing the same problem right now on a system that has been running for a few weeks.
Same issue here. We are using Loki on Azure AKS. The storageClass is azurefile.
Can you share the index file causing the error, please?
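In case it helps with that, one possible way to pull the index files off the volume for inspection is kubectl cp. This is only a sketch: the namespace, pod name, paths, and table directory below are placeholders, not known values from the reporter's setup.

# List the boltdb-shipper index tables on the Loki volume (namespace/pod are examples)
$ kubectl -n monitoring exec loki-0 -- ls -laR /data/loki/boltdb-shipper-active /data/loki/boltdb-shipper-cache
# Copy one table's index files locally so they can be attached to the issue
# (if the pod is crash-looping, mount the PVC from a temporary busybox pod instead)
$ kubectl -n monitoring cp loki-0:/data/loki/boltdb-shipper-active/index_18652 ./index_18652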
Same issue here. Loki had been running on Ceph for 30+ days without crashes before this one. Redeployed and the error is gone. I don't know what the possible cause is.
Same over here...
This should be fixed in the upcoming 2.3 release. Please let us know if it persists.
Same here !
Experienced the same issue when running out of disk space.
I tried running with -log.level=debug to get the name of the exact corrupted file; unfortunately, it added no extra logs.
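Since the panic happens while opening a boltdb file, another option might be to check the index files directly with the bbolt CLI, outside of Loki. A rough sketch; the paths are illustrative and depend on your boltdb-shipper storage_config, and the tool needs Go or a copy of the files on a machine that has it.

# Install the bbolt CLI (or run it wherever the index files have been copied)
$ go install go.etcd.io/bbolt/cmd/bbolt@latest
# Check each index file; a corrupted/truncated one should be flagged here,
# while Loki itself only reports the generic "recovered from panic" message
$ find /data/loki/boltdb-shipper-active /data/loki/boltdb-shipper-cache -type f | while read -r f; do
    echo "== $f"
    bbolt check "$f" || echo ">>> suspect file: $f"
  done
# Zero-byte files left behind by a full disk are also prime suspects
$ find /data/loki -path '*boltdb-shipper*' -type f -size 0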
@owen-d Could you please provide more details, such as when 2.3 should be out, or which PR/commit fixes it?
Ran into the same issue after running out of disk space. I resized the volume, removed and re-deployed the Helm install, and it's still crashing. I attached a busybox container to the volume and validated it had additional disk space. Mine fails with the following log lines; is there a file I can remove to restart the process? Or should I attach an additional volume to have tiered storage?
level=info ts=2021-08-05T18:44:20.92045892Z caller=table_manager.go:241 msg="loading table index_18831"
level=error ts=2021-08-05T18:44:20.921204424Z caller=log.go:106 msg="error running loki" err="recovered from panic opening boltdb file: runtime error: invalid memory address or nil pointer dereference\nerror creating index client\ngithub.com/cortexproject/cortex/pkg/chunk/storage.NewStore\n\t/src/loki/vendor/github.com/cortexproject/cortex/pkg/chunk/storage/factory.go:179\ngithub.com/grafana/loki/pkg/loki.(*Loki).initStore\n\t/src/loki/pkg/loki/modules.go:319\ngithub.com/cortexproject/cortex/pkg/util/modules.(*Manager).initModule\n\t/src/loki/vendor/github.com/cortexproject/cortex/pkg/util/modules/modules.go:103\ngithub.com/cortexproject/cortex/pkg/util/modules.(*Manager).InitModuleServices\n\t/src/loki/vendor/github.com/cortexproject/cortex/pkg/util/modules/modules.go:75\ngithub.com/grafana/loki/pkg/loki.(*Loki).Run\n\t/src/loki/pkg/loki/loki.go:220\nmain.main\n\t/src/loki/cmd/loki/main.go:132\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:204\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1374\nerror initialising module: store\ngithub.com/cortexproject/cortex/pkg/util/modules.(*Manager).initModule\n\t/src/loki/vendor/github.com/cortexproject/cortex/pkg/util/modules/modules.go:105\ngithub.com/cortexproject/cortex/pkg/util/modules.(*Manager).InitModuleServices\n\t/src/loki/vendor/github.com/cortexproject/cortex/pkg/util/modules/modules.go:75\ngithub.com/grafana/loki/pkg/loki.(*Loki).Run\n\t/src/loki/pkg/loki/loki.go:220\nmain.main\n\t/src/loki/cmd/loki/main.go:132\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:204\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1374"
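Not sure this is safe in every setup, but instead of wiping the whole volume, one option might be to clear only the boltdb-shipper directories so the index is rebuilt or re-downloaded from the shared store. A rough sketch; the paths follow the common /data/loki layout, the namespace and StatefulSet name are placeholders, and deleting the active directory drops any index that has not been shipped yet.

# Stop Loki so nothing holds the boltdb files open
$ kubectl -n monitoring scale statefulset loki --replicas=0
# From a temporary busybox pod mounted on the same PVC (as described above),
# remove only the shipper directories instead of the whole /data/loki tree
$ rm -rf /data/loki/boltdb-shipper-cache /data/loki/boltdb-shipper-compactor
$ rm -rf /data/loki/boltdb-shipper-active    # caution: drops index not yet shipped
# Start Loki again
$ kubectl -n monitoring scale statefulset loki --replicas=1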
@dpiessens This is not related to disk size IMO
Hi @owen-d, I was getting the same errors when I resized the volume and restarted the pod in AKS (Loki version 2.2.0). Then I upgraded to 2.3.0, but now new errors appeared: loki_error.log.
Any update here? I have the same issue with the same setup; it worked for some time and is crashing now. I raised an issue https://github.com/grafana/helm-charts/issues/1730 for the same problem. Has anyone been able to get past this?
I've been getting the same issue for a few days using grafana/loki:2.6.1.
Here is my config:
auth_enabled: false
chunk_store_config:
  max_look_back_period: 0s
compactor:
  retention_delete_delay: 2h
  retention_delete_worker_count: 150
  retention_enabled: true
  shared_store: filesystem
  working_directory: /data/loki/boltdb-shipper-compactor
ingester:
  chunk_block_size: 262144
  chunk_idle_period: 3m
  chunk_retain_period: 1m
  lifecycler:
    ring:
      replication_factor: 1
  max_transfer_retries: 0
  wal:
    dir: /data/loki/wal
limits_config:
  enforce_metric_name: false
  max_entries_limit_per_query: 5000
  per_stream_rate_limit: 8M
  per_stream_rate_limit_burst: 16M
  reject_old_samples: true
  reject_old_samples_max_age: 168h
  retention_period: 168h
memberlist:
  join_members:
    - 'loki-stack-memberlist'
schema_config:
  configs:
    - from: "2020-10-24"
      index:
        period: 24h
        prefix: index_
      object_store: filesystem
      schema: v11
      store: boltdb-shipper
server:
  grpc_listen_port: 9095
  http_listen_port: 3100
storage_config:
  boltdb_shipper:
    active_index_directory: /data/loki/boltdb-shipper-active
    cache_location: /data/loki/boltdb-shipper-cache
    cache_ttl: 24h
    shared_store: filesystem
  filesystem:
    directory: /data/loki/chunks
table_manager:
  retention_deletes_enabled: false
  retention_period: 0s
Here is the stack trace:
level=error ts=2022-08-30T07:00:51.123943705Z caller=index_set.go:265 table-name=index_19233 msg="sync failed, retrying it" err=timeout
level=error ts=2022-08-30T07:00:51.124066597Z caller=index_set.go:104 table-name=index_19233 msg="failed to initialize table index_19233, cleaning it up" err=timeout
timeout
error creating index client
github.com/grafana/loki/pkg/storage.(*store).storeForPeriod
/src/loki/pkg/storage/store.go:245
github.com/grafana/loki/pkg/storage.(*store).init
/src/loki/pkg/storage/store.go:162
github.com/grafana/loki/pkg/storage.NewStore
/src/loki/pkg/storage/store.go:145
github.com/grafana/loki/pkg/loki.(*Loki).initStore
/src/loki/pkg/loki/modules.go:522
github.com/grafana/dskit/modules.(*Manager).initModule
/src/loki/vendor/github.com/grafana/dskit/modules/modules.go:120
github.com/grafana/dskit/modules.(*Manager).InitModuleServices
/src/loki/vendor/github.com/grafana/dskit/modules/modules.go:92
github.com/grafana/loki/pkg/loki.(*Loki).Run
/src/loki/pkg/loki/loki.go:341
main.main
/src/loki/cmd/loki/main.go:105
runtime.main
/usr/local/go/src/runtime/proc.go:255
runtime.goexit
/usr/local/go/src/runtime/asm_amd64.s:1581
error initialising module: store
github.com/grafana/dskit/modules.(*Manager).initModule
/src/loki/vendor/github.com/grafana/dskit/modules/modules.go:122
github.com/grafana/dskit/modules.(*Manager).InitModuleServices
/src/loki/vendor/github.com/grafana/dskit/modules/modules.go:92
github.com/grafana/loki/pkg/loki.(*Loki).Run
/src/loki/pkg/loki/loki.go:341
main.main
/src/loki/cmd/loki/main.go:105
runtime.main
/usr/local/go/src/runtime/proc.go:255
runtime.goexit
/usr/local/go/src/runtime/asm_amd64.s:1581
level=error ts=2022-08-30T07:00:51.124647905Z caller=log.go:103 msg="error running loki" err="timeout\nerror creating index client\ngithub.com/grafana/loki/pkg/storage.(*store).storeForPeriod\n\t/src/loki/pkg/storage/store.go:245\ngithub.com/grafana/loki/pkg/storage.(*store).init\n\t/src/loki/pkg/storage/store.go:162\ngithub.com/grafana/loki/pkg/storage.NewStore\n\t/src/loki/pkg/storage/store.go:145\ngithub.com/grafana/loki/pkg/loki.(*Loki).initStore\n\t/src/loki/pkg/loki/modules.go:522\ngithub.com/grafana/dskit/modules.(*Manager).initModule\n\t/src/loki/vendor/github.com/grafana/dskit/modules/modules.go:120\ngithub.com/grafana/dskit/modules.(*Manager).InitModuleServices\n\t/src/loki/vendor/github.com/grafana/dskit/modules/modules.go:92\ngithub.com/grafana/loki/pkg/loki.(*Loki).Run\n\t/src/loki/pkg/loki/loki.go:341\nmain.main\n\t/src/loki/cmd/loki/main.go:105\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:255\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1581\nerror initialising module: store\ngithub.com/grafana/dskit/modules.(*Manager).initModule\n\t/src/loki/vendor/github.com/grafana/dskit/modules/modules.go:122\ngithub.com/grafana/dskit/modules.(*Manager).InitModuleServices\n\t/src/loki/vendor/github.com/grafana/dskit/modules/modules.go:92\ngithub.com/grafana/loki/pkg/loki.(*Loki).Run\n\t/src/loki/pkg/loki/loki.go:341\nmain.main\n\t/src/loki/cmd/loki/main.go:105\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:255\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1581"
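Since several of the reports in this thread coincide with the disk filling up, it may be worth checking whether any index files under the directories from the config above were left empty or truncated before the crash. A quick sketch of such a check, assuming shell access to the volume; the size threshold is only a heuristic (healthy boltdb files are typically at least a few pages).

# Look for empty or suspiciously small boltdb files left behind by a full disk;
# these are a plausible cause of "panic opening boltdb file" on startup
$ find /data/loki/boltdb-shipper-active \
       /data/loki/boltdb-shipper-cache \
       /data/loki/boltdb-shipper-compactor \
       -type f -size -16k -exec ls -la {} \;
# Double-check free space and inodes on the volume
$ df -h /data && df -i /data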
I am getting the same issue using grafana/loki-stack:2.8.2
here is the log error:
level=error ts=2022-09-15T03:54:52.007356924Z caller=index_set.go:265 table-name=index_19249 msg="sync failed, retrying it" err="recovered from panic opening boltdb file: runtime error: invalid memory address or nil pointer dereference"
level=error ts=2022-09-15T03:54:52.007405845Z caller=index_set.go:104 table-name=index_19249 msg="failed to initialize table index_19249, cleaning it up" err="recovered from panic opening boltdb file: runtime error: invalid memory address or nil pointer dereference"
recovered from panic opening boltdb file: runtime error: invalid memory address or nil pointer dereference
error creating index client
Hi, any progress on this? The same error here with chart loki-simple-scalable-1.8.11 :(
Same issue here on loki chart loki-3.3.1
Same issue here on Loki 2.4.2. Any workaround?
Loki's pod goes down after a while.
$ kubectl version -o yaml
clientVersion:
  buildDate: "2022-05-03T13:46:05Z"
  compiler: gc
  gitCommit: 4ce5a8954017644c5420bae81d72b09b735c21f0
  gitTreeState: clean
  gitVersion: v1.24.0
  goVersion: go1.18.1
  major: "1"
  minor: "24"
  platform: linux/amd64
kustomizeVersion: v4.5.4
serverVersion:
  buildDate: "2022-08-17T18:47:37Z"
  compiler: gc
  gitCommit: 95ee5ab382d64cfe6c28967f36b53970b8374491
  gitTreeState: clean
  gitVersion: v1.24.4
  goVersion: go1.18.5
  major: "1"
  minor: "24"
  platform: linux/amd64
$ helm version
version.BuildInfo{Version:"v3.10.1", GitCommit:"9f88ccb6aee40b9a0535fcc7efea6055e1ef72c9", GitTreeState:"clean", GoVersion:"go1.18.7"}
I added the Helm chart following the documentation at https://grafana.com/docs/loki/latest/installation/helm/install-scalable/ and https://github.com/grafana/helm-charts
$ helm repo add grafana https://grafana.github.io/helm-charts
$ helm repo update
Changed values:
loki:
  persistence:
    enabled: true
    storageClassName: nfs-client
    size: 30Gi
promtail:
  tolerations:
    - effect: NoExecute
      key: node-role.kubernetes.io/etcd
      operator: Exists
    - effect: NoSchedule
      key: node-role.kubernetes.io/controlplane
      operator: Exists
Installation:
$ helm install loki-stack grafana/loki-stack -n monitoring -f loki-values.yaml
Status:
$ helm ls
NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION
loki-stack monitoring 2 2023-01-10 10:59:07.569498213 +0500 +05 deployed loki-stack-2.8.9 v2.6.1
The pod is working:
$ k get pod
NAME READY STATUS RESTARTS AGE
loki-stack-0 1/1 Running 0 59m
loki-stack-promtail-68jnv 1/1 Running 0 55m
loki-stack-promtail-97c72 1/1 Running 0 55m
loki-stack-promtail-kp9g8 1/1 Running 0 67m
loki-stack-promtail-krzv6 1/1 Running 0 67m
loki-stack-promtail-xzrfg 1/1 Running 0 67m
$ k get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
storage-loki-stack-0 Bound pvc-a9daec92-b541-47a9-b59f-72996377ab02 30Gi RWO nfs-client 59m
$ k exec -it loki-stack-0 -- sh
$ df -h
Filesystem Size Used Available Use% Mounted on
overlay 97.3G 17.1G 76.1G 18% /
tmpfs 64.0M 0 64.0M 0% /dev
tmpfs 5.8G 0 5.8G 0% /sys/fs/cgroup
/dev/mapper/ubuntu--vg-ubuntu--lv
97.3G 17.1G 76.1G 18% /tmp
172.30.44.157:/mnt/nfs_k8s_dynamic/monitoring-storage-loki-stack-0-pvc-a9daec92-b541-47a9-b59f-72996377ab02
195.8G 5.1G 182.6G 3% /data
tmpfs 11.6G 4.0K 11.6G 0% /etc/loki
/dev/mapper/ubuntu--vg-ubuntu--lv
97.3G 17.1G 76.1G 18% /dev/termination-log
/dev/mapper/ubuntu--vg-ubuntu--lv
97.3G 17.1G 76.1G 18% /etc/resolv.conf
/dev/mapper/ubuntu--vg-ubuntu--lv
97.3G 17.1G 76.1G 18% /etc/hostname
/dev/mapper/ubuntu--vg-ubuntu--lv
97.3G 17.1G 76.1G 18% /etc/hosts
shm 64.0M 0 64.0M 0% /dev/shm
tmpfs 11.6G 12.0K 11.6G 0% /run/secrets/kubernetes.io/serviceaccount
tmpfs 5.8G 0 5.8G 0% /proc/acpi
tmpfs 64.0M 0 64.0M 0% /proc/kcore
tmpfs 64.0M 0 64.0M 0% /proc/keys
tmpfs 64.0M 0 64.0M 0% /proc/timer_list
tmpfs 64.0M 0 64.0M 0% /proc/sched_debug
tmpfs 5.8G 0 5.8G 0% /proc/scsi
tmpfs 5.8G 0 5.8G 0% /sys/firmware
After a while:
$ k describe pod loki-stack-0
Name: loki-stack-0
Namespace: monitoring
Priority: 0
Node: ds-worker1/172.30.44.165
Start Time: Tue, 10 Jan 2023 14:51:12 +0500
Labels: app=loki
controller-revision-hash=loki-stack-65cb46884c
name=loki-stack
release=loki-stack
statefulset.kubernetes.io/pod-name=loki-stack-0
Annotations: checksum/config: b7426dc30bbac01e54c43af7d1e183312a6d75e0203d448892a9ff0a3cd46f6e
kubectl.kubernetes.io/restartedAt: 2023-01-10T14:51:09+05:00
prometheus.io/port: http-metrics
prometheus.io/scrape: true
Status: Running
IP: 10.42.3.237
IPs:
IP: 10.42.3.237
Controlled By: StatefulSet/loki-stack
Containers:
loki:
Container ID: docker://c24efa4c9b21ba76cb26151da17dd7a7d48ee3e508452b97b5f2847b5c4e0dd7
Image: grafana/loki:2.6.1
Image ID: docker-pullable://grafana/loki@sha256:1ee60f980950b00e505bd564b40f720132a0653b110e993043bb5940673d060a
Ports: 3100/TCP, 9095/TCP, 7946/TCP
Host Ports: 0/TCP, 0/TCP, 0/TCP
Args:
-config.file=/etc/loki/loki.yaml
State: Running
Started: Tue, 10 Jan 2023 16:12:23 +0500
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Tue, 10 Jan 2023 16:11:23 +0500
Finished: Tue, 10 Jan 2023 16:12:22 +0500
Ready: True
Restart Count: 2
Liveness: http-get http://:http-metrics/ready delay=45s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:http-metrics/ready delay=45s timeout=1s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/data from storage (rw)
/etc/loki from config (rw)
/tmp from tmp (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-2xnh9 (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
storage:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: storage-loki-stack-0
ReadOnly: false
tmp:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
config:
Type: Secret (a volume populated by a Secret)
SecretName: loki-stack
Optional: false
kube-api-access-2xnh9:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning Unhealthy 43m kubelet Readiness probe failed: Get "http://10.42.3.237:3100/ready": dial tcp 10.42.3.237:3100: connect: connection refused
Warning Unhealthy 43m kubelet Liveness probe failed: Get "http://10.42.3.237:3100/ready": dial tcp 10.42.3.237:3100: connect: connection refused
Normal Killing 42m kubelet Container loki failed liveness probe, will be restarted
Normal Started 42m (x3 over 123m) kubelet Started container loki
Normal Pulled 42m (x3 over 123m) kubelet Container image "grafana/loki:2.6.1" already present on machine
Normal Created 42m (x3 over 123m) kubelet Created container loki
Warning Unhealthy 41m (x7 over 122m) kubelet Readiness probe failed: HTTP probe failed with statuscode: 503
Warning Unhealthy 41m (x6 over 122m) kubelet Liveness probe failed: HTTP probe failed with statuscode: 503
$ k logs loki-stack-0
level=error ts=2023-01-10T11:52:45.171326574Z caller=table.go:152 table-name=index_19367 msg="failed to remove working directory /data/loki/boltdb-shipper-compactor/index_19367" err="unlinkat /data/loki/boltdb-shipper-compactor/index_19367/.nfs00000000000a088e00003192: device or resource busy"
level=error ts=2023-01-10T11:52:45.171437058Z caller=compactor.go:492 msg="failed to compact files" table=index_19367 err="recovered from panic opening boltdb file: runtime error: invalid memory address or nil pointer dereference"
level=error ts=2023-01-10T11:52:45.171593177Z caller=compactor.go:432 msg="failed to run compaction" err="recovered from panic opening boltdb file: runtime error: invalid memory address or nil pointer dereference"
@S0b1t
$ k logs loki-stack-0
level=error ts=2023-01-10T11:52:45.171326574Z caller=table.go:152 table-name=index_19367 msg="failed to remove working directory /data/loki/boltdb-shipper-compactor/index_19367" err="unlinkat /data/loki/boltdb-shipper-compactor/index_19367/.nfs00000000000a088e00003192: device or resource busy" level=error ts=2023-01-10T11:52:45.171437058Z caller=compactor.go:492 msg="failed to compact files" table=index_19367 err="recovered from panic opening boltdb file: runtime error: invalid memory address or nil pointer dereference" level=error ts=2023-01-10T11:52:45.171593177Z caller=compactor.go:432 msg="failed to run compaction" err="recovered from panic opening boltdb file: runtime error: invalid memory address or nil pointer dereference"
This suggests that Loki is trying to delete a file which is still open by Loki itself. These logs also suggest that Loki is trying to access the ".nfs000xxxx" files, which seems weird. Loki shouldn't access any of these files at any point, as they are just a "reminder" that the file needs to be deleted once the process that opened it exits. So I would suggest that Loki not read any files that start with a dot or with ".nfs", if finding the place where the file handle is being leaked isn't an easy option.
The .nfs0000xxxx files are created by the NFS client when it tries to delete a file that is still open by another process. Since Loki runs as a StatefulSet within k8s, it has one volume per pod, so the only process that could keep the file open is Loki itself.
It seems the files in Loki are not closed properly before the delete operation runs. We have been using NFS as storage for other services without any issues in the past: two years of running PostgreSQL, MongoDB, and RabbitMQ persistence on it without ever having corruption or data loss, until we tried Loki.
My knowledge of Loki's structure is zero, as is my knowledge of Go, otherwise I'd dig deeper into the source and try to find the exact issue. But the error messages I received suggest the error may lie in storage.go, in the storeForPeriod method, or somewhere in the log.go file.
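To help confirm this theory, one could check which process still holds those silly-renamed files open. A sketch, assuming the volume is mounted at /data and that lsof/fuser are available (they are not in the stock Loki image, so a debug container or the node itself may be needed):

# Locate the leftover NFS silly-rename files on the volume
$ find /data/loki -name '.nfs*' -type f
# Show who still holds each one open; on NFS the holder must be a local
# process, i.e. Loki itself in a one-PVC-per-pod StatefulSet
$ find /data/loki -name '.nfs*' -type f | while read -r f; do
    echo "== $f"
    lsof "$f" 2>/dev/null || fuser -v "$f"
  done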
level=error ts=2023-02-09T16:39:55.043613981Z caller=index_set.go:285 table-name=loki_index_19395 msg="sync failed, retrying it" err=timeout
level=error ts=2023-02-09T16:39:55.043650239Z caller=index_set.go:103 table-name=loki_index_19395 msg="failed to initialize table loki_index_19395, cleaning it up" err=timeout
level=error ts=2023-02-09T16:39:55.04378379Z caller=log.go:171 msg="error running loki" err="timeout\nerror creating index client\ngithub.com/grafana/loki/pkg/storage.(*store).storeForPeriod\n\t/src/loki/pkg/storage/store.go:270\ngithub.com/grafana/loki/pkg/storage.(*store).init\n\t/src/loki/pkg/storage/store.go:164\ngithub.com/grafana/loki/pkg/storage.NewStore\n\t/src/loki/pkg/storage/store.go:147\ngithub.com/grafana/loki/pkg/loki.(*Loki).initStore\n\t/src/loki/pkg/loki/modules.go:635\ngithub.com/grafana/dskit/modules.(*Manager).initModule\n\t/src/loki/vendor/github.com/grafana/dskit/modules/modules.go:120\ngithub.com/grafana/dskit/modules.(*Manager).InitModuleServices\n\t/src/loki/vendor/github.com/grafana/dskit/modules/modules.go:92\ngithub.com/grafana/loki/pkg/loki.(*Loki).Run\n\t/src/loki/pkg/loki/loki.go:449\nmain.main\n\t/src/loki/cmd/loki/main.go:110\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1594\nerror initialising module: store\ngithub.com/grafana/dskit/modules.(*Manager).initModule\n\t/src/loki/vendor/github.com/grafana/dskit/modules/modules.go:122\ngithub.com/grafana/dskit/modules.(*Manager).InitModuleServices\n\t/src/loki/vendor/github.com/grafana/dskit/modules/modules.go:92\ngithub.com/grafana/loki/pkg/loki.(*Loki).Run\n\t/src/loki/pkg/loki/loki.go:449\nmain.main\n\t/src/loki/cmd/loki/main.go:110\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1594"
Could not flush logger <nil>
Resources on this topic:
- https://kb.netapp.com/Advice_and_Troubleshooting/Data_Storage_Software/ONTAP_OS/What_are_nfsXXXX_files_and_how_do_I_delete_them
- https://serverfault.com/questions/201294/nfsxxxx-files-appearing-what-are-those
@TsengSR Thanks for the answer! After reading these topics, I'm starting to understand the situation.
By the way, I have already set up the Loki stack another way: I installed Promtail using the Helm chart and set up Loki on a separate EC2 instance on the host.
I'm new to Kubernetes, so after running a few microservices I started getting other errors, like pods not running due to insufficient resources.
After reading up on pods not running due to insufficient resources, I figured something out: my NFS instance has 200 GB of storage, of which 190 GB was given to other microservices. Then I created a 30 GB PVC for the Loki stack. Perhaps that's why "device or resource busy" shows up in the log, but this is just a guess.
I have another k8s cluster; I will try the PLG stack with NFS again.
In our case it's not the disk space. Loki has been acting like this since I first tried to install it 3 months ago (we're at 60 GB out of 170 or so, with 10 Gi for Loki). In the first attempts we used Loki configured with the filesystem store, later against S3 (using MinIO). It persists properly in MinIO, but the PVs used for the write-ahead log (WAL) and cache are still on NFS and have the same issues with the .nfs000 files.
I have moved to loki distributed and since then everything is working without any issue.
@darox
I have moved to loki distributed and since then everything is working without any issue.
Mind posting your values YAML and the Helm chart/version used? I've tried the Bitnami Helm charts, the distributed chart (based on the 2.x Helm charts), and the new 3.x unified Helm chart. In all of them, the queriers/readers crash after some time.
I experience the same as @TsengSR. I feel like the real issue is that Loki tries to read these .nfs00... files and obviously cannot process them.
I am facing the same issue even today with Loki version 2.6.1, and no good workaround is available yet. After going through multiple GitHub issue pages, it seems this is an issue with Loki version 2.x. Can anyone confirm this? And please share if you have found a good solution to the issue.
it seems this is an issue with Loki version 2.x. Can anyone confirm this?
I think so too
it seems this is an issue with Loki version 2.x. Can anyone confirm this?
I think so too
Now do you have a better way to solve it?
In my case, I have deployed Loki with the Bitnami Helm chart.
I just deleted the cache at this location in the Loki querier pods: /bitnami/grafana-loki/loki/cache
And it works fine.
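A rough sketch of that workaround for anyone else on the Bitnami chart; the namespace and the querier label selector below are illustrative guesses, not taken from the chart docs, so check your own labels first.

# Clear the boltdb-shipper cache in every querier pod, then restart the pod so
# the index is re-downloaded cleanly from object storage
$ for pod in $(kubectl -n monitoring get pods -l app.kubernetes.io/component=querier -o name); do
    kubectl -n monitoring exec "$pod" -- sh -c 'rm -rf /bitnami/grafana-loki/loki/cache/*'
    kubectl -n monitoring delete "$pod"
  done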