
Err: Registry already stopped prevents harvester from starting on existing file

Open fdartayre opened this issue 2 years ago • 8 comments

Filebeat 7.16.3 with the following autodiscover configuration:

filebeat.autodiscover:
  providers:
    - type: kubernetes
      scope: node
      templates:
        - config:
            - type: container
              paths:
              - /var/log/containers/*-${data.kubernetes.container.id}.log  
              clean_inactive: 72h
              close_inactive: 15m
              close_timeout: 24h
              close_renamed: false
              close_removed: false
              clean_removed: false
              ignore_older: 48h
              harvester_buffer_size: 65536
              max_bytes: 1048576
              # End default_config
              add_resource_metadata:
                namespace:
                  enabled: true
            - type: log
              paths:
              - /var/log/messages
              clean_inactive: 72h
              close_inactive: 15m
              close_timeout: 24h
              close_renamed: false
              close_removed: false
              clean_removed: false
              ignore_older: 48h
              harvester_buffer_size: 65536
              max_bytes: 1048576

sporadically prints the error Harvester could not be started on existing file, Err: registry already stopped:

2022-05-13T06:20:42.163Z  INFO  [input] log/input.go:171  Configured paths: [/var/log/containers/*-c6355ba659b77c2cff06f7133102b0350a117249ec0627be2ac2d16b8a05621f.log]  {"input_id": "a1036cef-c8e6-52fd-832f-81ef84ab3e45"}
2022-05-13T06:20:42.164Z  ERROR [input] log/input.go:582  Harvester could not be started on existing file: /var/log/containers/thanos-addon-compactor-984564849-8zkb7_thanos_compactor-c6355ba659b77c2cff06f7133102b0350a117249ec0627be2ac2d16b8a05621f.log, Err: registry already stopped  {"input_id": "a1036cef-c8e6-52fd-832f-81ef84ab3e45", "source": "/var/log/containers/thanos-addon-compactor-984564849-8zkb7_thanos_compactor-c6355ba659b77c2cff06f7133102b0350a117249ec0627be2ac2d16b8a05621f.log", "state_id": "native::44397250-66305", "finished": false, "os_id": "44397250-66305", "old_source": "/var/log/containers/thanos-addon-compactor-984564849-8zkb7_thanos_compactor-c6355ba659b77c2cff06f7133102b0350a117249ec0627be2ac2d16b8a05621f.log", "old_finished": true, "old_os_id": "44397250-66305"}
2022-05-13T06:20:42.165Z  INFO  input/input.go:136  input ticker stopped
2022-05-13T06:20:42.206Z  INFO  [input] log/input.go:171  Configured paths: [/var/log/containers/*-c6355ba659b77c2cff06f7133102b0350a117249ec0627be2ac2d16b8a05621f.log]  {"input_id": "638df06e-d213-4732-8654-b6f9b9c69610"}
2022-05-13T06:20:42.247Z  INFO  [input] log/input.go:171  Configured paths: [/var/log/containers/*-c6355ba659b77c2cff06f7133102b0350a117249ec0627be2ac2d16b8a05621f.log]  {"input_id": "f75bb80a-3c0c-4b2c-a324-3836c4d6d590"}
2022-05-13T06:20:42.248Z  INFO  [input.harvester] log/harvester.go:309  Harvester started for file. {"input_id": "f75bb80a-3c0c-4b2c-a324-3836c4d6d590", "source": "/var/log/containers/thanos-addon-compactor-984564849-8zkb7_thanos_compactor-c6355ba659b77c2cff06f7133102b0350a117249ec0627be2ac2d16b8a05621f.log", "state_id": "native::44397250-66305", "finished": false, "os_id": "44397250-66305", "old_source": "/var/log/containers/thanos-addon-compactor-984564849-8zkb7_thanos_compactor-c6355ba659b77c2cff06f7133102b0350a117249ec0627be2ac2d16b8a05621f.log", "old_finished": true, "old_os_id": "44397250-66305", "harvester_id": "1ae57f03-6766-4fb0-a66a-6e0a2e4998fe"}
2022-05-13T06:20:47.066Z  INFO  input/input.go:136  input ticker stopped
2022-05-13T06:20:47.066Z  INFO  [input.harvester] log/harvester.go:336  Reader was closed. Closing. {"input_id": "f75bb80a-3c0c-4b2c-a324-3836c4d6d590", "source": "/var/log/containers/thanos-addon-compactor-984564849-8zkb7_thanos_compactor-c6355ba659b77c2cff06f7133102b0350a117249ec0627be2ac2d16b8a05621f.log", "state_id": "native::44397250-66305", "finished": false, "os_id": "44397250-66305", "old_source": "/var/log/containers/thanos-addon-compactor-984564849-8zkb7_thanos_compactor-c6355ba659b77c2cff06f7133102b0350a117249ec0627be2ac2d16b8a05621f.log", "old_finished": true, "old_os_id": "44397250-66305", "harvester_id": "1ae57f03-6766-4fb0-a66a-6e0a2e4998fe"}
2022-05-13T06:20:57.193Z  INFO  [input] log/input.go:171  Configured paths: [/var/log/containers/*-c6355ba659b77c2cff06f7133102b0350a117249ec0627be2ac2d16b8a05621f.log]  {"input_id": "a8f94e7e-da13-4dff-9565-8c31913207fa"}

This does not lead to data loss. @ChrsMark suggested creating a public GitHub issue so that someone from the team can investigate this further and see whether anything can be improved, most likely in the messaging/error handling, to avoid this kind of confusion in the future.
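
A minimal debug-logging sketch for capturing more context the next time the error fires (the selector names below are an assumption based on the log.logger values shown above, i.e. autodiscover, input and input.harvester):

logging.level: debug
# Restrict debug output to the components involved in this race;
# selector names are assumed to match the loggers seen in the output above.
logging.selectors: ["autodiscover", "input", "input.harvester"]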

fdartayre avatar May 24 '22 08:05 fdartayre

In my case the filebeat service stops after encountering these errors, and I have to start it again manually. Is there any solution or workaround for this?

bd67 avatar Jul 05 '22 06:07 bd67

I see a lot of these errors in my deployment. I suspect this particular error does cause data loss in my case, because the harvester fails to monitor some files!

I have been experimenting with the Filebeat configuration for quite some time to fix this, without any luck!

Version

  • ECK: 2.4.0
  • Elastic Stack: 8.4.1

Error Logs

{"log.level":"error","@timestamp":"2022-10-05T20:06:07.568Z","log.logger":"input","log.origin":{"file.name":"log/input.go","file.line":583},"message":"Harvester could not be started on existing file: /var/lib/docker/containers/6c8b07114a2fba7e87e31eab2c0f33e475f44371baf3fece3eeff9afe1e37de2/6c8b07114a2fba7e87e31eab2c0f33e475f44371baf3fece3eeff9afe1e37de2-json.log, Err: registry already stopped","service.name":"filebeat","input_id":"d13881e6-df14-458a-b5eb-ebb827fbad2b","source_file":"/var/lib/docker/containers/6c8b07114a2fba7e87e31eab2c0f33e475f44371baf3fece3eeff9afe1e37de2/6c8b07114a2fba7e87e31eab2c0f33e475f44371baf3fece3eeff9afe1e37de2-json.log","state_id":"native::2179301-2048","finished":false,"os_id":"2179301-2048","old_source":"/var/lib/docker/containers/6c8b07114a2fba7e87e31eab2c0f33e475f44371baf3fece3eeff9afe1e37de2/6c8b07114a2fba7e87e31eab2c0f33e475f44371baf3fece3eeff9afe1e37de2-json.log","old_finished":true,"old_os_id":"2179301-2048","ecs.version":"1.6.0"}
---
{"log.level":"error","@timestamp":"2022-10-05T19:53:48.469Z","log.logger":"input","log.origin":{"file.name":"log/input.go","file.line":558},"message":"Harvester could not be started on new file: /var/log/containers/elastic-operator-0_logging_manager-dc64f4b6972377cf0c4eb3296ead7de731dbff2a65eaf86f59e1821d435ea807.log, Err: registry already stopped","service.name":"filebeat","input_id":"8bf88eb0-3d81-4f79-b9df-4fa12104068d","source_file":"/var/log/containers/elastic-operator-0_logging_manager-dc64f4b6972377cf0c4eb3296ead7de731dbff2a65eaf86f59e1821d435ea807.log","state_id":"native::2565589-2048","finished":false,"os_id":"2565589-2048","ecs.version":"1.6.0"}

mhadidg avatar Oct 05 '22 20:10 mhadidg

We also have this issue: sometimes (interestingly, not always) when this error appears, the forwarding of specific pod logs stops, while logs of other pods processed by the same filebeat instance are still forwarded. In filebeat's harvester metrics we see a difference between the open_files and running counters.
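
For reference, one way to inspect those counters is the Beats HTTP monitoring endpoint; a minimal sketch (host and port below are only examples):

http.enabled: true
http.host: localhost
http.port: 5066
# Then query http://localhost:5066/stats and compare
# filebeat.harvester.open_files against filebeat.harvester.running.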

boernd avatar Oct 06 '22 13:10 boernd

I'm seeing this error using filebeat 8.4.2 / k8s 1.22. It happened on a pod that ran for only 2 seconds.

{
  "log.level": "error",
  "@timestamp": "2023-02-22T00:15:03.514Z",
  "log.logger": "input",
  "log.origin": {
    "file.name": "log/input.go",
    "file.line": 558
  },
  "message": "Harvester could not be started on new file: /var/log/containers/xxxxx-cj-xxxxx-27950415-kgd2s_xxxxyyyyy-035eb9009eccf1e53bd225023373b175f8fbea147bc964b6b7482c3ca3cb533f.log, Err: registry already stopped",
  "service.name": "filebeat",
  "input_id": "f586cc7c-c1bc-43c9-beaf-066fc480fce5",
  "source_file": "/var/log/containers/xxxxx-cj-xxxxx-27950415-kgd2s_xxxxyyyyy-035eb9009eccf1e53bd225023373b175f8fbea147bc964b6b7482c3ca3cb533f.log",
  "state_id": "native::30490683-66305",
  "finished": false,
  "os_id": "30490683-66305",
  "ecs.version": "1.6.0"
}

nuno-silva avatar Feb 22 '23 11:02 nuno-silva

Hi, I am receiving the same error with filebeat version 7.13.1. Is there any fix for this issue?

2023-03-24T00:36:37.446Z        INFO    log/input.go:157        Configured paths: [/var/log/containers/*13c60d48764517ea78ed929f8404a616d2b19e3d3e1c878e56c63ce1a2ebdbf2.log]
2023-03-24T00:36:44.090Z        ERROR   log/input.go:554        Harvester could not be started on existing file: /var/log/containers/usage-collector-parser-service-86cb7b6cc7-lq2ln_usage-collector-prod-au_usage-collector-parser-service-69dcac15057350e89a6cc13c4a5c2cd76813151bbf41e45b7e31a32137f0438b.log, Err: registry already stopped
2023-03-24T00:36:44.090Z        INFO    input/input.go:136      input ticker stopped
2023-03-24T00:36:44.090Z        ERROR   log/input.go:554        Harvester could not be started on existing file: /var/log/containers/usage-collector-parser-service-86cb7b6cc7-lq2ln_usage-collector-prod-au_istio-proxy-aa7b992875c1194f38933cd870e346c5843c0e98bbf7e36f93512d8ab4f79f52.log, Err: registry already stopped
2023-03-24T00:36:44.090Z        INFO    input/input.go:136      input ticker stopped
2023-03-24T00:36:44.090Z        INFO    input/input.go:136      input ticker stopped

rchaudhary24 avatar May 03 '23 08:05 rchaudhary24

I have the same issue, using docker.elastic.co/beats/filebeat:8.3.3

{"log":"{\"log.level\":\"error\",\"@timestamp\":\"2023-08-16T14:42:59.576Z\",\"log.logger\":\"input\",\"log.origin\":{\"file.name\":\"log/input.go\",\"file.line\":583},\"message\":\"Harvester could not be started on existing file: /var/lib/docker/containers/810632c43f0f078c864aeee76b575e2f63ec55f0b904e7c51db44138cab3d6b8/810632c43f0f078c864aeee76b575e2f63ec55f0b904e7c51db44138cab3d6b8-json.log, Err: registry already stopped\",\"service.name\":\"filebeat\",\"input_id\":\"4cbb1204-bfd7-48f5-9db3-82ea899ebb36\",\"source_file\":\"/var/lib/docker/containers/810632c43f0f078c864aeee76b575e2f63ec55f0b904e7c51db44138cab3d6b8/810632c43f0f078c864aeee76b575e2f63ec55f0b904e7c51db44138cab3d6b8-json.log\",\"state_id\":\"native::5767743-64768\",\"finished\":false,\"os_id\":\"5767743-64768\",\"old_source\":\"/var/lib/docker/containers/810632c43f0f078c864aeee76b575e2f63ec55f0b904e7c51db44138cab3d6b8/810632c43f0f078c864aeee76b575e2f63ec55f0b904e7c51db44138cab3d6b8-json.log\",\"old_finished\":true,\"old_os_id\":\"5767743-64768\",\"ecs.version\":\"1.6.0\"}\n","stream":"stderr","time":"2023-08-16T14:42:59.576491143Z"}

It causes logs to not be shipped, so they are lost.

@tetianakravchenko any update on that issue?

todeb avatar Aug 17 '23 12:08 todeb

Getting the exact same thing. Filebeat sporadically stops processing some pod logs, while others on the same node work fine. Restarting the filebeat DaemonSet solves the issue.
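
If the monitoring HTTP endpoint is enabled in the pod (http.enabled: true on port 5066, as sketched above), a liveness probe can at least automate the restart when the whole Filebeat process becomes unresponsive; note it will not fire when the process keeps running but individual harvesters have stalled, which still needs the manual restart. A hedged excerpt of the DaemonSet container spec (values are assumptions):

livenessProbe:
  httpGet:
    path: /stats
    port: 5066
  initialDelaySeconds: 60
  periodSeconds: 30
  failureThreshold: 3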

PatrikPM avatar May 20 '24 06:05 PatrikPM

After upgrading to Filebeat 8.11.3, this was solved, at least for me.

PatrikPM avatar May 21 '24 06:05 PatrikPM