
Loss of logs from logstash when rsyslog is restarted

Open Tejaswi43 opened this issue 2 years ago • 7 comments

Hi Team,

We are using Logstash v7.15.2 (as a 3PP) in our project. When the remote syslog server restarts, Logstash takes 1-60 minutes to reconnect, and during this time none of the logs are stored, so they are simply lost. After the rsyslog-Logstash connection is re-established everything works fine, but the logs sent around the time of reconnection are lost, even though they should have been stored in the Logstash persistent queue while the rsyslog output was unavailable. We tried the following scenarios, with these findings:
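
For context, the persistent-queue settings implied by the stats below would look roughly like this in pipelines.yml (the pipeline id matches the queue path in the stats; path.config is illustrative):

    - pipeline.id: syslog
      path.config: "/opt/logstash/pipeline/syslog.conf"   # illustrative path
      queue.type: persisted
      queue.max_bytes: 1gb   # matches max_queue_size_in_bytes 1073741824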

Case 1: Deployed Elasticsearch and Logstash (rsyslog output configured, but rsyslog itself not deployed)

Stats of the rsyslog pipeline:

    "outputs" : [ {
      "id" : "e6ad9c62172fd69b98c84ca8fada4ab96cb2e1b4bb5bc5248eead15a4b9f4012",
      "events" : {
        "out" : 0,
        "in" : 1,
        "duration_in_millis" : 86
      },
      "name" : "syslog"
    } ]
  },
  "reloads" : {
    "last_failure_timestamp" : null,
    "successes" : 0,
    "failures" : 0,
    "last_error" : null,
    "last_success_timestamp" : null
  },
  "queue" : {
    "type" : "persisted",
    "capacity" : {
      "max_unread_events" : 0,
      "max_queue_size_in_bytes" : 1073741824,
      "queue_size_in_bytes" : 7904,
      "page_capacity_in_bytes" : 67108864
    },
    "events" : 7,
    "data" : {
      "free_space_in_bytes" : 29855367168,
      "storage_type" : "ext4",
      "path" : "/opt/logstash/data/queue/syslog"
    },
    "events_count" : 7,
    "queue_size_in_bytes" : 7904,
    "max_queue_size_in_bytes" : 1073741824

We can see that the rsyslog pipeline's persistent queue stores events while rsyslog is not deployed, and once rsyslog is deployed it receives all the logs.

Case 2: Deployed Elasticsearch, Logstash, and rsyslog; restarted both Elasticsearch and rsyslog while sending logs

Pipeline stats for Elasticsearch:

   "name" : "elasticsearch"
    }, {
      "id" : "07aa8e0b7b6a3b03343369c6241012b28a5381ebbbb638b8ec25904c8e2f947b",
      "events" : {
        "out" : 0,
        "in" : 0,
        "duration_in_millis" : 68
      },
      "name" : "stdout"
    } ]
  },
  "reloads" : {
    "last_failure_timestamp" : null,
    "successes" : 0,
    "failures" : 0,
    "last_error" : null,
    "last_success_timestamp" : null
  },
  "queue" : {
    "type" : "persisted",
    "capacity" : {
      "max_unread_events" : 0,
      "max_queue_size_in_bytes" : 1073741824,
      "queue_size_in_bytes" : 29636,
      "page_capacity_in_bytes" : 67108864
    },
    "events" : 4,
    "data" : {
      "free_space_in_bytes" : 29845852160,
      "storage_type" : "ext4",
      "path" : "/opt/logstash/data/queue/elasticsearch"
    },
    "events_count" : 4,
    "queue_size_in_bytes" : 29636,
    "max_queue_size_in_bytes" : 1073741824
  },

Pipeline stats for rsyslog:

     "name" : "syslog"
    } ]
  },
  "reloads" : {
    "last_failure_timestamp" : null,
    "successes" : 0,
    "failures" : 0,
    "last_error" : null,
    "last_success_timestamp" : null
  },
  "queue" : {
    "type" : "persisted",
    "capacity" : {
      "max_unread_events" : 0,
      "max_queue_size_in_bytes" : 1073741824,
      "queue_size_in_bytes" : 29636,
      "page_capacity_in_bytes" : 67108864
    },
    "events" : 0,
    "data" : {
      "free_space_in_bytes" : 29845852160,
      "storage_type" : "ext4",
      "path" : "/opt/logstash/data/queue/syslog"
    },
    "events_count" : 0,
    "queue_size_in_bytes" : 29636,
    "max_queue_size_in_bytes" : 1073741824
  },

In this case events are not stored in the syslog pipeline's persistent queue, but they are stored in the Elasticsearch pipeline's persistent queue.

And once Elasticsearch and rsyslog are restarted, the logs sent while the outputs were disconnected arrive only in Elasticsearch, not in rsyslog.

Could you please share your comments on this? We think that Logstash does not detect the rsyslog restart, so the logs are not stored in the persistent queue and are lost.

Tejaswi43 avatar May 04 '22 13:05 Tejaswi43

Thanks for the report; this sounds like a plugin-specific issue. The syslog output uses UDP by default and seems to be opinionated about retrying exceptions when a write on the UDP socket fails.

Thus, if protocol => isn't set on the output (or is set to "udp"), you should change it to "tcp" so that socket write exceptions are retried.
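
A minimal output block along those lines (host and port are placeholders):

    output {
      syslog {
        host     => "rsyslog"
        port     => 514
        protocol => "tcp"
      }
    }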

kares avatar May 05 '22 10:05 kares

We have configured the protocol as tcp only:

    syslog {
      host => {{ .host | quote }}
      port => {{ .port }}
      protocol => tcp
    }

We are getting exceptions like the one below:

    [2022-04-28T01:13:34.034Z][WARN ][logstash.outputs.syslog ] syslog tcp output exception: closing, reconnecting and resending event
    {:host=>"rsyslog", :port=>514,
     :exception=>#<Errno::ECONNREFUSED: Connection refused - connect(2) for "rsyslog" port 514>,
     :backtrace=>[
       "org/jruby/ext/socket/RubyTCPSocket.java:134:in `initialize'",
       "org/jruby/RubyIO.java:876:in `new'",
       "/opt/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-output-syslog-3.0.5.E001/lib/logstash/outputs/syslog.rb:219:in `connect'",
       "/opt/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-output-syslog-3.0.5.E001/lib/logstash/outputs/syslog.rb:187:in `publish'",
       "/opt/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-codec-plain-3.1.0/lib/logstash/codecs/plain.rb:59:in `encode'",
       "/opt/logstash/logstash-core/lib/logstash/codecs/delegator.rb:48:in `block in encode'",
       "org/logstash/instrument/metrics/AbstractSimpleMetricExt.java:65:in `time'",
       "org/logstash/instrument/metrics/AbstractNamespacedMetricExt.java:64:in `time'",
       "/opt/logstash/logstash-core/lib/logstash/codecs/delegator.rb:47:in `encode'",
       "/opt/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-output-syslog-3.0.5.E001/lib/logstash/outputs/syslog.rb:147:in `receive'",
       "/opt/logstash/logstash-core/lib/logstash/outputs/base.rb:105:in `block in multi_receive'",
       "org/jruby/RubyArray.java:1820:in `each'",
       "/opt/logstash/logstash-core/lib/logstash/outputs/base.rb:105:in `multi_receive'",
       "org/logstash/config/ir/compiler/OutputStrategyExt.java:143:in `multi_receive'",
       "org/logstash/config/ir/compiler/AbstractOutputDelegatorExt.java:121:in `multi_receive'",
       "/opt/logstash/logstash-core/lib/logstash/java_pipeline.rb:295:in `block in start_workers'"
     ], :event=>#<LogStash::Event:0x7b9ad2bd>}

Tejaswi43 avatar May 06 '22 05:05 Tejaswi43

:exception=>#<Errno::ECONNREFUSED: Connection refused - connect(2) for "rsyslog" port 514>

Those exceptions are going to be retried by the output until a connection is possible - the event in that case is not ack-ed for the PQ and is expected to stay in the queue.
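
One way to verify this is to watch the pipeline's queue depth while the output is down, e.g. via the node stats API (assuming the default API port 9600 and the pipeline id "syslog" from the stats above):

    curl -s 'http://localhost:9600/_node/stats/pipelines/syslog?pretty'

The queue's events / events_count fields should keep growing while the ECONNREFUSED retries are happening.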

In this case events are not stored in the syslog pipeline's persistent queue, but they are stored in the Elasticsearch pipeline's persistent queue.

Why were these separate pipelines mentioned together? Are you using a single output and directing the same data to multiple pipelines (using a pipeline output)?

kares avatar May 10 '22 06:05 kares

Those exceptions are going to be retried by the output until a connection is possible - the event in that case is not ack-ed for the PQ and is expected to stay in the queue.

Yes, since the output is being retried the events are expected to be stored in the queue, but they aren't, and for this reason we are not able to get these events back after rsyslog reconnects.

Are you using a single output and directing the same data to multiple pipelines (using a pipeline output)?

Yes, actually I was just comparing by configuring two output pipelines, Elasticsearch and syslog, and found that the logs sent at the time of the restart reach only Elasticsearch and never reach rsyslog after reconnection.
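
For reference, a forked-path layout like the one described would look roughly like this in pipelines.yml, using the pipeline input/output plugins (the intake input and all ids/addresses are illustrative):

    - pipeline.id: intake
      config.string: |
        input { tcp { port => 5045 } }   # illustrative source
        output { pipeline { send_to => ["es", "sys"] } }

    - pipeline.id: elasticsearch
      queue.type: persisted
      config.string: |
        input { pipeline { address => "es" } }
        output { elasticsearch { hosts => ["http://elasticsearch:9200"] } }

    - pipeline.id: syslog
      queue.type: persisted
      config.string: |
        input { pipeline { address => "sys" } }
        output { syslog { host => "rsyslog" port => 514 protocol => "tcp" } }

With this layout each downstream pipeline acks events into its own persistent queue independently, which is consistent with the elasticsearch queue retaining events while the syslog queue does not.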

Tejaswi43 avatar May 12 '22 13:05 Tejaswi43

@kares Is any other info needed? I can see the label is set to needed-info.

Tejaswi43 avatar Jun 01 '22 13:06 Tejaswi43

@kares Is there any progress regarding this issue?

Tejaswi43 avatar Aug 22 '22 09:08 Tejaswi43

@kares any update regarding the issue?

Tejaswi43 avatar Sep 07 '22 05:09 Tejaswi43