loki icon indicating copy to clipboard operation
loki copied to clipboard

Promtail: syslog job over UDP is not working

Open xtavras opened this issue 2 years ago • 13 comments

Describe the bug Promtail will not deliver "Syslog UDP" logs to Loki, but "Syslog TCP" logs works fine.

To Reproduce Steps to reproduce the behavior:

  1. Enable "udp" as "listen_protocol" option for syslog config, e.g.
    syslog:
      idle_timeout: 60s
      listen_protocol: udp
      label_structured_data: true
      labels:
        job: syslog
      listen_address: 0.0.0.0:514

Try to send some logs using UDP, I've used this example or real hardware switches which only support UDP syslog https://gist.github.com/xtavras/4a01f7d1f94237a4abcdfb02074453c1

I can see in tcpdump that logs are coming to promtail machine, but nothing in Loki. I've tried to run promtail with debug option but don't see any additional errors. 2. Enable "tcp" as "listen_protocol" (it's default), e.g.

    syslog:
      idle_timeout: 60s
      listen_protocol: tcp
      label_structured_data: true
      labels:
        job: syslog
      listen_address: 0.0.0.0:514

Send logs using tcp, I've used this example https://gist.github.com/xtavras/be13760713e2a9ee1a8bdae2ed6d2565

Logs are visible in tcpdump and are coming and visible in Loki.

Expected behavior Logs sent to UDP syslogs should be delivered to Loki same way as TCP syslogs

Environment:

  • Loki: 2.5.0
  • Promtail: 2.6.1

Screenshots, Promtail config, or terminal output Promtail config

server:
  http_listen_port: 9080


positions:
  filename: /var/lib/promtail/positions.yml


clients:
  - basic_auth:
      password: xxxxxxxxxxxxxxxx
      username: loki
    tenant_id: 1
    url: https://loki.example.com/api/prom/push


scrape_configs:
  - file_sd_configs:
    - files:
      - /etc/promtail/file_sd/*.yml
      - /etc/promtail/file_sd/*.yaml
      - /etc/promtail/file_sd/*.json
    job_name: file_sd

  - job_name: syslog-logs
    relabel_configs:
    - source_labels:
      - __syslog_message_hostname
      target_label: host
    syslog:
      idle_timeout: 60s
      listen_protocol: udp
      label_structured_data: true
      labels:
        job: syslog
      listen_address: 0.0.0.0:514

xtavras avatar Jul 26 '22 13:07 xtavras

@xtavras How have you installed loki and promtail? One thing I noticed is that if I install using grafana/loki-stack chart the service will be created as tcp despite the listen_protocol: udp

jsalatiel avatar Jul 30 '22 20:07 jsalatiel

@jsalatiel I'm using this Helm chart for Loki https://github.com/grafana/helm-charts/tree/main/charts/loki-simple-scalable

Promtail is just official binary deployed with this ansible playbook https://github.com/patrickjahns/ansible-role-promtail which just use github releases, I've tried to build binary from "main" and have same problem.

I can see that Promtail is listeing on udp port using netstat

netmon:~# netstat -ulpen |grep 514
udp6       0      0 :::514                  :::*                                0          30732786   11061/promtail

xtavras avatar Aug 01 '22 09:08 xtavras

Can you check if the kubernetes service has been created with TCP or UDP?

jsalatiel avatar Aug 01 '22 12:08 jsalatiel

@jsalatiel what service do you mean? Isn't Promtail communicate with Loki just with HTTP calls, so they are always TCP?

xtavras avatar Aug 01 '22 12:08 xtavras

AFAIK when you enable syslog on promtail it creates a service so that external services can send syslog messages to promtail. Check kubectl get svc -o wide on your cluster in the namespace you have installed promtail.

jsalatiel avatar Aug 01 '22 12:08 jsalatiel

Sorry if I was not clear, promtail is running on VM just as binary, so no Docker or Kubernetes, only Loki itself is running on Kubernetes. And I see that port is udp locally on VM, if change it to tcp in config, it will be tcp in netstat output as well, so it looks good to me tbh.

xtavras avatar Aug 01 '22 12:08 xtavras

If there's nothing listening on UDP thats the reason you can not send udp messages to promtail =)

jsalatiel avatar Aug 01 '22 12:08 jsalatiel

But it is listening on UDP, I already posted my netstat ouput. Promtail send data to Loki always using HTTP/HTTPS so not sure why UDP is expected on Loki side, correct me if I'm wrong.

xtavras avatar Aug 01 '22 12:08 xtavras

@xtavras did you check syslog config , if there is udp / or tcp destination ?

saibug avatar Aug 04 '22 15:08 saibug

@saibug yes, our switches (Huawei) only support UDP on port 514, like I've said I see it in tcpdump in cleartext on promtail machine.

xtavras avatar Aug 05 '22 09:08 xtavras

Here is tcpdump example, as you see protocol is UDP: first is test script, second is real log from switch

netmon:~# tcpdump -i any port 514 -vvv
tcpdump: listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes


11:17:15.427290 IP (tos 0x0, ttl 63, id 22762, offset 0, flags [DF], proto UDP (17), length 112)
    10.1.176.55.37603 > 10.1.1.68.syslog: [udp sum ok] SYSLOG, length: 84
	Facility user (1), Severity warning (4)
	Msg: 1 2022-08-05T11:17:15.423205+02:00 t14 promtail 291547 1 - \0xef\0xbb\0xbfthis is a UDP test
	0x0000:  3c31 323e 3120 3230 3232 2d30 382d 3035
	0x0010:  5431 313a 3137 3a31 352e 3432 3332 3035
	0x0020:  2b30 323a 3030 2074 3134 2070 726f 6d74
	0x0030:  6169 6c20 3239 3135 3437 2031 202d 20ef
	0x0040:  bbbf 7468 6973 2069 7320 6120 5544 5020
	0x0050:  7465 7374
11:21:02.296315 IP (tos 0x0, ttl 254, id 51272, offset 0, flags [none], proto UDP (17), length 205)
    10.1.250.168.38514 > 10.1.1.68.syslog: [udp sum ok] SYSLOG, length: 177
	Facility local7 (23), Severity info (6)
	Msg: Aug  5 2022 09:27:44 ds28-293L.example.com %%01INFO/6/SUPPRESS_LOG(l)[43605]:Last message repeated 5 times.(InfoID=1092489232, ModuleName=MSTP, InfoAlias=RECEIVE_MSTITC)
	0x0000:  3c31 3930 3e41 7567 2020 3520 3230 3232
	0x0010:  2030 393a 3237 3a34 3420 6473 3238 2d32
	0x0020:  3933 4c2e 6265 726c 696e 2e6e 692e 7465
	0x0030:  616d 2025 2530 3149 4e46 4f2f 362f 5355
	0x0040:  5050 5245 5353 5f4c 4f47 286c 295b 3433

xtavras avatar Aug 05 '22 09:08 xtavras

Same thing for me. I have Unifi EdgeSwitch which supports only UDP.

Promtail configuration:

scrape_configs:
  # ...
  - job_name: syslog
    relabel_configs:
      - source_labels:
          - __syslog_message_hostname
        target_label: host
    syslog:
      idle_timeout: 60s
      label_structured_data: true
      labels:
        cluster: lab
        job: syslog
      listen_address: 0.0.0.0:514
      listen_protocol: udp

Started via systemd (on NixOS) on 514/udp. Attaching traffic dump

514.cap.zip

corpix avatar Aug 09 '22 03:08 corpix

In my case packets are handled by nontransparent parser. I've made a minimal server to debug this https://gist.github.com/corpix/d76a90678d24e412eb5f9347dc1b58c1

@xtavras example could be used to send UDP packets to this server https://gist.github.com/xtavras/4a01f7d1f94237a4abcdfb02074453c1

UDP is not connection oriented, so there is no EOF. Parser relies on EOF or delimiter (which is LF by default). In my case 1 packet contains exactly 1 syslog message, so there is no delimiter and ReadBytes never return.

corpix avatar Aug 09 '22 06:08 corpix

~~Hello!~~ ~~I just hit this issue with messages from my pfSense.~~ ~~Just would like to add that it appears it can send multiple messages in same udp packet, none with the delimiter.~~ ~~My dirty hack to add a \n in the end of every received packet failed.~~

~~I'll just put something in front for now.~~

Sorry. Something else was going on here, not sure what. Each message is one packet.

Daedaluz avatar Aug 19 '22 11:08 Daedaluz

I've been testing UDP using rsyslog and the following configuration:

module(load="omprog")
module(load="mmutf8fix")
action(type="mmutf8fix" replacementChar="?")
*.* action(type="omfwd" protocol="udp" target="<host>" port="514" Template="RSYSLOG_SyslogProtocol23Format")

With this configuration Promtail receives the logs and pushed them to Loki.

Unfortunately I don't have a hardware device where I could test it.

I've also tested your Python script, and by adding a newline character at the end of the log message, it worked:

#!/usr/bin/env python3

import socket
import logging
from random import random
from time import sleep
from rfc5424logging import Rfc5424SysLogHandler, transport

logger = logging.getLogger("promtail")
logger.setLevel(logging.DEBUG)

rfc5424Handler = Rfc5424SysLogHandler(address=("localhost", 514), hostname="localhost", appname="promtailudp.py")
rfc5424Handler.setLevel(logging.DEBUG)
logger.addHandler(rfc5424Handler)

for x in range(100):
    logger.warning("this is a UDP test\n", extra={"msgid":1})
    sleep(random())

Seems like this is required as delimiter.

chaudum avatar Aug 26 '22 08:08 chaudum

@chaudum thank you for looking into it, I can confirm that your script works indeed, removing newline breaks it. Unfortunately our switches do not add new line at the end (and looking at comments looks like some other vendors don't do it too). What do you think, would it be possible as workaround append newline to such log messages in promtail? I guess as another workaround we can use rsyslog as broker between promtail and switches, but I'm just wondering why TCP works fine even without newline and our past logging solutions like rsyslog or graylog worked with same hardware as well.

xtavras avatar Aug 26 '22 14:08 xtavras

@chaudum thank you for looking into it, I can confirm that your script works indeed, removing newline breaks it. Unfortunately our switches do not add new line at the end (and looking at comments looks like some other vendors don't do it too). What do you think, would it be possible as workaround append newline to such log messages in promtail?

@grafana/loki-team What do you think?

I guess as another workaround we can use rsyslog as broker between promtail and switches, but I'm just wondering why TCP works fine even without newline and our past logging solutions like rsyslog or graylog worked with same hardware as well.

Tbh, I have no idea why it works over TCP but not over UDP, since we use the same code path for parsing messages. I would need to investigate the syslog parser library we use. That would take a lot of time I guess.

chaudum avatar Aug 30 '22 18:08 chaudum

@chaudum I took a quick look. Seems you've added UDP support in https://github.com/grafana/loki/pull/5790. I'm wondering. Are we selecting the correct parser or does the parse simply not support messages without newlines?

jeschkies avatar Feb 02 '23 11:02 jeschkies

I noted similar behaviour with promtail:2.8.2. UDP logs from Synology DS920+ NAS do not appear in queries to Loki after sending but flush to Loki on restart of Promtail container. TCP works as expected.

Pengozoid avatar May 04 '23 17:05 Pengozoid

I dug a little deeper and found that the reader we send to the parser is coded to read until it sees the expected delimiter. github.com/leodido/ragel-machinery/parser. In promtail's case it is looking for a Line Feed so the reader will hang looking for it on any message not ending in a line feed.

erikbaranowski avatar May 22 '23 14:05 erikbaranowski

I noted similar behaviour with promtail:2.8.2. UDP logs from Synology DS920+ NAS do not appear in queries to Loki after sending but flush to Loki on restart of Promtail container. TCP works as expected.

Same here. On Synology NAS you can choose whether to send syslogs in TCP or UDP format. TCP works without any issues, UDP not. Funny thing is that syslog is implemented in a primitive way on most devices (e.g. switches), so it is definded as UDP by default and cannot be changed as on advanced devices (e.g. Synology NAS). That said, Promtail is mostly useless to me the way it is right now for syslog purposes, unfortunately. It would be a big step forward if UDP could be supported in the near future.

Grandma-Betty avatar Jun 01 '23 08:06 Grandma-Betty

Put in a PR to fix this here and in grafana agent - very simple workaround of making a bytes reader from the datagram so the parser gets the EOF

joshuapare avatar Sep 26 '23 02:09 joshuapare

Hi, (first interaction on Github so apologises for any missing information I may provide)

I think I may be experiencing this exact issue in our network. We have a few hundreds switches and routers which syslog back to two central archive logging servers using UDP 514, Debian bare-metal based running syslog-ng. After forwarding all syslog to Promtail via UDP, nothing is caught. Promtail does work via scraping the local logs sorted via syslog but this is not ideal for labelling.

We run a Grafana instance on one of these boxes and I would love to get Loki logs working with Promtail.

I will be keeping a close eye on this issue as I'm eager to find a solution, and reluctant to move our boxes to rsyslog.

lfmullen avatar Oct 27 '23 13:10 lfmullen

@joshuapare was your grafana agent PR merged? if so we can likely merge a similar fix here, we're just trying to avoid too much divergence since the agent team forked the promtail code and soon all logs collection work will be done within that repo.

cstyan avatar Nov 16 '23 02:11 cstyan

@joshuapare was your grafana agent PR merged? if so we can likely merge a similar fix here, we're just trying to avoid too much divergence since the agent team forked the promtail code and soon all logs collection work will be done within that repo.

Yep it was, and the matching PR for this repo is here as well: https://github.com/grafana/loki/pull/10708

joshuapare avatar Nov 16 '23 02:11 joshuapare

@cstyan : Referring to your comment of Promtail being in a "feature complete" status now (https://github.com/grafana/loki/pull/10256#issuecomment-2050758320), is there any chance for this be fixed as well? It seems that there have been multiple almost PRs for Promtail and for the Grafana Agent, but receiving syslog events via UDP is still not working. A lot of (hardware and software) appliances are still unable to use UDP as a transport protocol for syslog, fixing this would be greatly appreciated by a lot of people (including me and my colleagues)!

MarcBrendel avatar Apr 12 '24 07:04 MarcBrendel