
Filebeat stops working with error message: UDP package size too large

Open oerli opened this issue 2 years ago • 7 comments

Hello

I would like to report an issue with Filebeat running on Windows with a UDP input configured.

  • Version: 7.13.2
  • Operating System: Windows 2019 (1809)
  • Discuss Forum URL: https://discuss.elastic.co/t/filebeat-stops-working-with-error-message-udp-package-size-to-large/278151/3
  • Steps to Reproduce: We have the VMWare Air Watch Cloud Connector (19.12) application configured to send syslog data to the locally installed Filebeat. It works for some time, but then Filebeat logs the following error message and stops sending any data to Elasticsearch (attachments: filebeat.log, filebeat.yml.txt):

2021-07-07T18:55:06.233+0200 ERROR [udp] dgram/handler.go:77 Error reading from the socket read udp 127.0.0.1:9514: wsarecvfrom: A message sent on a datagram socket was larger than the internal message buffer or some other network limit, or the buffer used to receive a datagram into was smaller than the datagram itself. {"address": "localhost:9514"}
2021-07-07T18:55:06.303+0200 ERROR [UDP] logp/logger.go:218 Panic handling datagram. Recovering, but please report this. {panic 25 0 runtime error: invalid memory address or nil pointer dereference} {stack 15 0 github.com/elastic/beats/v7/libbeat/logp.(*Logger).Recover
	/go/src/github.com/elastic/beats/libbeat/logp/logger.go:218
runtime.gopanic
	/usr/local/go/src/runtime/panic.go:969
runtime.panicmem
	/usr/local/go/src/runtime/panic.go:212
runtime.sigpanic
	/usr/local/go/src/runtime/signal_windows.go:246
github.com/elastic/beats/v7/filebeat/input/udp.NewInput.func1
	/go/src/github.com/elastic/beats/filebeat/input/udp/input.go:77
github.com/elastic/beats/v7/filebeat/inputsource/common/dgram.DatagramReaderFactory.func1.1
	/go/src/github.com/elastic/beats/filebeat/inputsource/common/dgram/handler.go:82
github.com/elastic/beats/v7/filebeat/inputsource/common/dgram.(*Listener).connectAndRun
	/go/src/github.com/elastic/beats/filebeat/inputsource/common/dgram/server.go:122
github.com/elastic/beats/v7/filebeat/inputsource/common/dgram.(*Listener).Start.func1
	/go/src/github.com/elastic/beats/filebeat/inputsource/common/dgram/server.go:112
github.com/elastic/go-concert/unison.(*TaskGroup).Go.func1
	/go/pkg/mod/github.com/elastic/go-concert@…/unison/taskgroup.go:124 <nil>}

I hope this gives you all the information you need, but I'm happy to provide more if required.

I have removed the password from the configuration file, as well as the domain names of the servers from the log file.

Kind Regards Roland

oerli, Jul 09 '21

@narph @adriansr @andrewkroh I'm pinging you directly since this seems to be a Windows issue, but I'm not sure which team owns this specific codebase. Could you please triage this accordingly?

ChrsMark, Jul 26 '21

This is caused by a nil address (the source of the datagram) returned by the read routine UDPConn.ReadFrom.

After discussing this with @kvch, who was the last person to refactor this code, she pointed me to this comment by @ph, which indicates that this is the known behavior under Windows for truncated datagrams.

This is the code that caused the crash: https://github.com/elastic/beats/blob/877d8bcd176b2f5d4efd2a81846a481b94798b49/filebeat/input/udp/input.go#L77

There are other places that don't check for a nil address from UDP either, for example in the netflow input: https://github.com/elastic/beats/blob/877d8bcd176b2f5d4efd2a81846a481b94798b49/x-pack/filebeat/input/netflow/decoder/v9/v9.go#L74-L83

We could try to fix all of them, or for simplicity just propagate a zero address (0.0.0.0:0) when we receive a nil, but I don't like the idea of processing UDP packets for which we can't report the source.

@kvch @ph @andrewkroh WDYT about dropping these packets entirely, while still logging an error so that the user can adjust the read buffer?

adriansr, Jul 26 '21

This seems like a bug in Go (at least in its handling of WSAEMSGSIZE): it should return the raw socket address along with an error. https://github.com/golang/go/blob/849b7911293c3cb11d76ff2778ed560100f987d1/src/internal/poll/fd_windows.go#L590

For the syslog use case, as a user I would prefer to get the partial data that was received, along with an error in the event. I'd rather not see 0.0.0.0 in the event since it's not the real source address, but I don't mind it being there if it keeps the code simple, since it's obviously invalid.

andrewkroh, Jul 26 '21

Pinging @elastic/agent (Team:Agent)

elasticmachine, Aug 20 '21

Hello,

We are seeing this in Filebeat 7.15.0, using the Fortinet module with syslog on Windows Server 2019. Any update on this issue?

Doserdog, Oct 08 '21

We've found a workaround: use the TCP input and specify framing: rfc6587. We use this to receive Fortinet reliable syslog (v6.4) events, but we hope it helps others too!

Doserdog, Oct 08 '21

Hi! We just realized that we haven't looked into this issue in a while. We're sorry!

We're labeling this issue as Stale to make it hit our filters and make sure we get back to it as soon as possible. In the meantime, it'd be extremely helpful if you could take a look at it as well and confirm its relevance. A simple comment with a nice emoji will be enough :+1. Thank you for your contribution!

botelastic[bot], Oct 08 '22