logstash-logger icon indicating copy to clipboard operation
logstash-logger copied to clipboard

Duplicate massages

Open shamil opened this issue 7 years ago • 14 comments

Hi,

As I understand from docs, duplicate messages can happen for outputs that write in batches, like Kafka. I'm using UDP output, and I still see lot's of duplicates.

Might be related, when I see duplicates I also see the following error:

ERROR -- : [LogStashLogger::Device::UDP] Errno::EMSGSIZE - Message too large

What could be the reason?

shamil avatar May 11 '17 20:05 shamil

The UDP device also writes in batches because Ruby writes to sockets in batches. If you're seeing lots of duplicates, that means Ruby thought it couldn't write to the UDP socket and therefore threw an exception, and then LogStashLogger retried writing the whole batch. But in fact, the UDP packets were received on the listening side.

Errno::EMSGSIZE is a known issue with the UDP device which is documented in the troubleshooting section of the Readme

dwbutler avatar May 13 '17 09:05 dwbutler

I'm confused with your answer. First I didn't see any exceptions regarding Ruby can't write to UDP socket. Secondly, even when I enabled autoflush mode, which effectively disables batch mode, I still get duplicates. Third, according to code, the Errno::EMSGSIZE is rescued and considered as unrecoverable error, therefore ignored and not causing retries.

Am I missing something?

shamil avatar May 13 '17 11:05 shamil

Logging exceptions are written to STDERR by default. Are you checking there?

autoflush doesn't disable batching. It flushes the buffer after each log message is written. Log messages can still accumulate in the buffer if a flush takes a while. Anyway, even a single log message could be retried several times.

Errno::EMSGSIZE does indeed not cause retries. It's most likely a separate issue unrelated to the duplicate log messages.

I'll do some more research to see if it's normal/expected to get duplicate UDP packets on the receiving end even if the client only sent it once. I suspect that it is.

dwbutler avatar May 13 '17 11:05 dwbutler

Yes I'm checking STDERR. The only error I see there is Errno::EMSGSIZE.

The problem that I can get thousands duplicate messages. I mean one message can get written thousand times. I did tcpdump and indeed those duplicate messages are coming out from the ruby (rails) process.

I know Errno::EMSGSIZE should not cause duplicates, but most of the time when I see duplicates I see a huge amount of Errno::EMSGSIZE messages in STDERR

shamil avatar May 13 '17 12:05 shamil

That is indeed strange. It does appear that Errno::EMSGSIZE is not being correctly treated as unrecoverable. What version of LogStashLogger are you using? Dropping unrecoverable errors was released in 0.16.0.

dwbutler avatar May 13 '17 16:05 dwbutler

I use 0.22.1 version

shamil avatar May 13 '17 16:05 shamil

I tried a small test, and got the result I expected:

logger = LogStashLogger.new(type: :udp, port: 5228)
loop { 10.times { logger.info(Time.now); sleep(0.1) }; logger.info('1'*100_000); }
E, [2017-05-13T14:55:40.690990 #16663] ERROR -- : [LogStashLogger::Device::UDP] Errno::EMSGSIZE - Message too long
W, [2017-05-13T14:55:40.691039 #16663]  WARN -- : [LogStashLogger::Device::UDP] unrecoverable error, aborting write

I was not able to reproduce duplicate log messages being sent. Long messages were correctly discarded.

Can you post a few samples of the duplicate log messages?

dwbutler avatar May 13 '17 22:05 dwbutler

What version of Ruby are you using? In MRI 2.4.1, I get the error message Message too long, not Message too large

dwbutler avatar May 13 '17 22:05 dwbutler

Interesting... I just tried this test under JRuby and got E, [2017-05-13T15:12:57.713000 #17281] ERROR -- : [LogStashLogger::Device::UDP] SystemCallError - Unknown error - No message available. So it's not caught as an unrecoverable error and the whole batch is retried forever.

Looks like version-specific error handling will be required.

dwbutler avatar May 13 '17 22:05 dwbutler

I'm using MRI 2.3.3. And strangely just see the message with the error, no warning like in your output (WARN -- : [LogStashLogger::Device::UDP] unrecoverable error, aborting write)

shamil avatar May 13 '17 22:05 shamil

I can't reproduce this behavior in MRI 2.3.3. I suspect the behavior might be platform specific. What OS are you using? I test on a Macbook Pro.

dwbutler avatar May 13 '17 22:05 dwbutler

I run it in Docker container (Alpine Linux) on top of Debian distribution.

I ran the same test and I see that the message always get retried:

E, [2017-05-13T22:49:38.210640 #13] ERROR -- : [LogStashLogger::Device::UDP] Errno::EMSGSIZE - Message too large
W, [2017-05-13T22:49:38.210715 #13]  WARN -- : [LogStashLogger::Device::UDP] unrecoverable error, aborting write
E, [2017-05-13T22:49:38.210878 #13] ERROR -- : [LogStashLogger::Device::UDP] Errno::EMSGSIZE - Message too large
W, [2017-05-13T22:49:38.210946 #13]  WARN -- : [LogStashLogger::Device::UDP] unrecoverable error, aborting write
E, [2017-05-13T22:49:38.211094 #13] ERROR -- : [LogStashLogger::Device::UDP] Errno::EMSGSIZE - Message too large
W, [2017-05-13T22:49:38.211126 #13]  WARN -- : [LogStashLogger::Device::UDP] unrecoverable error, aborting write
W, [2017-05-13T22:49:42.780342 #13]  WARN -- : [LogStashLogger::Device::UDP] Buffer Full - {:pending=>50, :outgoing=>0}
W, [2017-05-13T22:49:47.804145 #13]  WARN -- : [LogStashLogger::Device::UDP] Buffer Full - {:pending=>50, :outgoing=>0}
W, [2017-05-13T22:49:52.825159 #13]  WARN -- : [LogStashLogger::Device::UDP] Buffer Full - {:pending=>50, :outgoing=>0}

shamil avatar May 13 '17 22:05 shamil

The test above writes multiple large messages. If you just write one message, you'll see that it fails and never retries again. The unrecoverable error, aborting write warning is the indicator that it will drop the message and not retry.

It looks to me like you're getting the expected behavior when running my small test, but unexpected behavior in your actual web app. Do you have any extra configuration, gems, etc. that might contribute to the issue?

dwbutler avatar May 13 '17 23:05 dwbutler

Here are the configuration I use:

config.logstash.max_message_size = 4096
config.logstash.buffer_max_items = 8192
config.logstash.buffer_max_interval = 1

Also, I'm using lograge

shamil avatar May 14 '17 00:05 shamil