openobserve icon indicating copy to clipboard operation
openobserve copied to clipboard

When openobserve as syslog server, tcp connection was closed repeatly almost every 60s

Open kmephistoh opened this issue 1 year ago β€’ 11 comments

Which OpenObserve functionalities are the source of the bug?

ingestion

Is this a regression?

Yes

Description

When openobserve as syslog server, tcp connection was closed repeatly almost every 60s, client print these annoying logs. Even so, logs can still be collected.

Jul 26 15:34:16 DX-C9 rsyslogd: omfwd: remote server at 10.11.10.24:5514 seems to have closed connection. This often happens when the remote peer (or an interim system like a load balancer or firewall) shuts down or aborts a connection. Rsyslog will re-open the connection if configured to do so (we saw a generic IO Error, which usually goes along with that behaviour). [v8.2406.0 try https://www.rsyslog.com/e/2027 ]
Jul 26 15:34:16 DX-C9 rsyslogd: action 'action-2-builtin:omfwd' suspended (module 'builtin:omfwd'), retry 0. There should be messages before this one giving the reason for suspension. [v8.2406.0 try https://www.rsyslog.com/e/2007 ]
Jul 26 15:34:16 DX-C9 rsyslogd: action 'action-2-builtin:omfwd' resumed (module 'builtin:omfwd') [v8.2406.0 try https://www.rsyslog.com/e/2359 ]
Jul 26 15:34:16 DX-C9 rsyslogd: omfwd: remote server at 10.11.10.24:5514 seems to have closed connection. This often happens when the remote peer (or an interim system like a load balancer or firewall) shuts down or aborts a connection. Rsyslog will re-open the connection if configured to do so (we saw a generic IO Error, which usually goes along with that behaviour). [v8.2406.0 try https://www.rsyslog.com/e/2027 ]
Jul 26 15:34:16 DX-C9 rsyslogd: action 'action-2-builtin:omfwd' suspended (module 'builtin:omfwd'), retry 0. There should be messages before this one giving the reason for suspension. [v8.2406.0 try https://www.rsyslog.com/e/2007 ]
Jul 26 15:34:16 DX-C9 rsyslogd: action 'action-2-builtin:omfwd' resumed (module 'builtin:omfwd') [v8.2406.0 try https://www.rsyslog.com/e/2359 ]
Jul 26 15:34:16 DX-C9 rsyslogd: omfwd: remote server at 10.11.10.24:5514 seems to have closed connection. This often happens when the remote peer (or an interim system like a load balancer or firewall) shuts down or aborts a connection. Rsyslog will re-open the connection if configured to do so (we saw a generic IO Error, which usually goes along with that behaviour). [v8.2406.0 try https://www.rsyslog.com/e/2027 ]
Jul 26 15:34:16 DX-C9 rsyslogd: action 'action-2-builtin:omfwd' suspended (module 'builtin:omfwd'), retry 0. There should be messages before this one giving the reason for suspension. [v8.2406.0 try https://www.rsyslog.com/e/2007 ]
Jul 26 15:34:16 DX-C9 rsyslogd: action 'action-2-builtin:omfwd' resumed (module 'builtin:omfwd') [v8.2406.0 try https://www.rsyslog.com/e/2359 ]
Jul 26 15:35:16 DX-C9 rsyslogd: omfwd: remote server at 10.11.10.24:5514 seems to have closed connection. This often happens when the remote peer (or an interim system like a load balancer or firewall) shuts down or aborts a connection. Rsyslog will re-open the connection if configured to do so (we saw a generic IO Error, which usually goes along with that behaviour). [v8.2406.0 try https://www.rsyslog.com/e/2027 ]
Jul 26 15:35:16 DX-C9 rsyslogd: action 'action-2-builtin:omfwd' suspended (module 'builtin:omfwd'), retry 0. There should be messages before this one giving the reason for suspension. [v8.2406.0 try https://www.rsyslog.com/e/2007 ]
Jul 26 15:35:16 DX-C9 rsyslogd: action 'action-2-builtin:omfwd' resumed (module 'builtin:omfwd') [v8.2406.0 try https://www.rsyslog.com/e/2359 ]
Jul 26 15:36:16 DX-C9 rsyslogd: omfwd: remote server at 10.11.10.24:5514 seems to have closed connection. This often happens when the remote peer (or an interim system like a load balancer or firewall) shuts down or aborts a connection. Rsyslog will re-open the connection if configured to do so (we saw a generic IO Error, which usually goes along with that behaviour). [v8.2406.0 try https://www.rsyslog.com/e/2027 ]
Jul 26 15:36:16 DX-C9 rsyslogd: action 'action-2-builtin:omfwd' suspended (module 'builtin:omfwd'), retry 0. There should be messages before this one giving the reason for suspension. [v8.2406.0 try https://www.rsyslog.com/e/2007 ]
Jul 26 15:36:16 DX-C9 rsyslogd: action 'action-2-builtin:omfwd' resumed (module 'builtin:omfwd') [v8.2406.0 try https://www.rsyslog.com/e/2359 ]
Jul 26 15:36:16 DX-C9 rsyslogd: omfwd: remote server at 10.11.10.24:5514 seems to have closed connection. This often happens when the remote peer (or an interim system like a load balancer or firewall) shuts down or aborts a connection. Rsyslog will re-open the connection if configured to do so (we saw a generic IO Error, which usually goes along with that behaviour). [v8.2406.0 try https://www.rsyslog.com/e/2027 ]
Jul 26 15:36:16 DX-C9 rsyslogd: action 'action-2-builtin:omfwd' suspended (module 'builtin:omfwd'), retry 0. There should be messages before this one giving the reason for suspension. [v8.2406.0 try https://www.rsyslog.com/e/2007 ]
Jul 26 15:36:16 DX-C9 rsyslogd: action 'action-2-builtin:omfwd' resumed (module 'builtin:omfwd') [v8.2406.0 try https://www.rsyslog.com/e/2359 ]
Jul 26 15:37:16 DX-C9 rsyslogd: omfwd: remote server at 10.11.10.24:5514 seems to have closed connection. This often happens when the remote peer (or an interim system like a load balancer or firewall) shuts down or aborts a connection. Rsyslog will re-open the connection if configured to do so (we saw a generic IO Error, which usually goes along with that behaviour). [v8.2406.0 try https://www.rsyslog.com/e/2027 ]
Jul 26 15:37:16 DX-C9 rsyslogd: action 'action-2-builtin:omfwd' suspended (module 'builtin:omfwd'), retry 0. There should be messages before this one giving the reason for suspension. [v8.2406.0 try https://www.rsyslog.com/e/2007 ]
Jul 26 15:37:16 DX-C9 rsyslogd: action 'action-2-builtin:omfwd' resumed (module 'builtin:omfwd') [v8.2406.0 try https://www.rsyslog.com/e/2359 ]
Jul 26 15:37:16 DX-C9 rsyslogd: omfwd: remote server at 10.11.10.24:5514 seems to have closed connection. This often happens when the remote peer (or an interim system like a load balancer or firewall) shuts down or aborts a connection. Rsyslog will re-open the connection if configured to do so (we saw a generic IO Error, which usually goes along with that behaviour). [v8.2406.0 try https://www.rsyslog.com/e/2027 ]
Jul 26 15:37:16 DX-C9 rsyslogd: action 'action-2-builtin:omfwd' suspended (module 'builtin:omfwd'), retry 0. There should be messages before this one giving the reason for suspension. [v8.2406.0 try https://www.rsyslog.com/e/2007 ]
Jul 26 15:37:16 DX-C9 rsyslogd: action 'action-2-builtin:omfwd' resumed (module 'builtin:omfwd') [v8.2406.0 try https://www.rsyslog.com/e/2359 ]

When i select rsyslog as a syslog server on the same server 10.11.10.24:514, everything is ok(no annoying logs).

image

tcpdump show me, the connection was RST, so i think the reason may was the server side. no firewall or load balancer.

Please provide a link to a minimal reproduction of the bug

No response

Please provide the exception or error you saw

No response

Please provide the version you discovered this bug in (check about page for version information)

openobserve v0.10.8-rc5 
os: Ubuntu 22.04.4 LTS

Anything else?

No response

kmephistoh avatar Jul 26 '24 07:07 kmephistoh

I'm pretty sure this has to do with TCP keepalive, are you sending syslog during those 60seconds?

gaby avatar Jul 26 '24 12:07 gaby

I'm pretty sure this has to do with TCP keepalive, are you sending syslog during those 60seconds?

Sorry to reply too late, sometime it has log during this 60, The result is still the same。

image

As can be seen from the above test, the same log is printed each time.

kmephistoh avatar Jul 29 '24 02:07 kmephistoh

I am getting similar issuse when logging to OpenObserver as a syslog server running under Unraid 6.12.11, except I'm getting more frequent errors:

Aug 22 07:52:47 unraid01 rsyslogd: omfwd: remote server at 192.168.3.172:1514 seems to have closed connection. This often happens when the remote peer (or an interim system like a load balancer or firewall) shuts down or aborts a connection. Rsyslog will re-open the connection if configured to do so (we saw a generic IO Error, which usually goes along with that behaviour). [v8.2102.0 try https://www.rsyslog.com/e/2027 ] Aug 22 07:52:47 unraid01 rsyslogd: action 'action-2-builtin:omfwd' suspended (module 'builtin:omfwd'), retry 0. There should be messages before this one giving the reason for suspension. [v8.2102.0 try https://www.rsyslog.com/e/2007 ] Aug 22 07:52:47 unraid01 rsyslogd: action 'action-2-builtin:omfwd' resumed (module 'builtin:omfwd') [v8.2102.0 try https://www.rsyslog.com/e/2359 ] Aug 22 07:52:49 unraid01 kernel: eth0: renamed from veth42b5b63 Aug 22 07:52:49 unraid01 rsyslogd: omfwd: remote server at 192.168.3.172:1514 seems to have closed connection. This often happens when the remote peer (or an interim system like a load balancer or firewall) shuts down or aborts a connection. Rsyslog will re-open the connection if configured to do so (we saw a generic IO Error, which usually goes along with that behaviour). [v8.2102.0 try https://www.rsyslog.com/e/2027 ] Aug 22 07:52:49 unraid01 rsyslogd: action 'action-2-builtin:omfwd' suspended (module 'builtin:omfwd'), retry 0. There should be messages before this one giving the reason for suspension. [v8.2102.0 try https://www.rsyslog.com/e/2007 ] Aug 22 07:52:49 unraid01 rsyslogd: action 'action-2-builtin:omfwd' resumed (module 'builtin:omfwd') [v8.2102.0 try https://www.rsyslog.com/e/2359 ] Aug 22 07:52:49 unraid01 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): vethb277e8f: link becomes ready Aug 22 07:52:49 unraid01 kernel: docker0: port 2(vethb277e8f) entered blocking state Aug 22 07:52:49 unraid01 kernel: docker0: port 2(vethb277e8f) entered forwarding state Aug 22 07:53:12 unraid01 kernel: veth42b5b63: renamed from eth0 Aug 22 07:53:12 unraid01 rsyslogd: omfwd: remote server at 192.168.3.172:1514 seems to have closed connection. This often happens when the remote peer (or an interim system like a load balancer or firewall) shuts down or aborts a connection. Rsyslog will re-open the connection if configured to do so (we saw a generic IO Error, which usually goes along with that behaviour). [v8.2102.0 try https://www.rsyslog.com/e/2027 ] Aug 22 07:53:12 unraid01 rsyslogd: action 'action-2-builtin:omfwd' suspended (module 'builtin:omfwd'), retry 0. There should be messages before this one giving the reason for suspension. [v8.2102.0 try https://www.rsyslog.com/e/2007 ] Aug 22 07:53:12 unraid01 rsyslogd: action 'action-2-builtin:omfwd' resumed (module 'builtin:omfwd') [v8.2102.0 try https://www.rsyslog.com/e/2359 ] Aug 22 07:53:12 unraid01 kernel: docker0: port 2(vethb277e8f) entered disabled state Aug 22 07:53:12 unraid01 kernel: docker0: port 2(vethb277e8f) entered disabled state Aug 22 07:53:12 unraid01 kernel: device vethb277e8f left promiscuous mode Aug 22 07:53:12 unraid01 kernel: docker0: port 2(vethb277e8f) entered disabled state Aug 22 07:53:12 unraid01 rsyslogd: omfwd: remote server at 192.168.3.172:1514 seems to have closed connection. This often happens when the remote peer (or an interim system like a load balancer or firewall) shuts down or aborts a connection. Rsyslog will re-open the connection if configured to do so (we saw a generic IO Error, which usually goes along with that behaviour). [v8.2102.0 try https://www.rsyslog.com/e/2027 ] Aug 22 07:53:12 unraid01 rsyslogd: action 'action-2-builtin:omfwd' suspended (module 'builtin:omfwd'), retry 0. There should be messages before this one giving the reason for suspension. [v8.2102.0 try https://www.rsyslog.com/e/2007 ] Aug 22 07:53:12 unraid01 rsyslogd: action 'action-2-builtin:omfwd' resumed (module 'builtin:omfwd') [v8.2102.0 try https://www.rsyslog.com/e/2359 ]

doychi avatar Aug 21 '24 22:08 doychi

@kmephistoh @doychi do you mind we give you a dev version for help testing the fix?

hengfeiyang avatar Aug 27 '24 05:08 hengfeiyang

Hi,

I'd love to, but I'll have to work out how to get it installed under Unraid.

Thanks, Justin

Get Outlook for Androidhttps://aka.ms/AAb9ysg


From: Hengfei Yang @.> Sent: Tuesday, August 27, 2024 3:22:45 PM To: openobserve/openobserve @.> Cc: Doychi @.>; Mention @.> Subject: Re: [openobserve/openobserve] When openobserve as syslog server, tcp connection was closed repeatly almost every 60s (Issue #4067)

@kmephistohhttps://github.com/kmephistoh @doychihttps://github.com/doychi do you mind we give you a dev version for help testing the fix?

β€” Reply to this email directly, view it on GitHubhttps://github.com/openobserve/openobserve/issues/4067#issuecomment-2311596141, or unsubscribehttps://github.com/notifications/unsubscribe-auth/ACT6YFV3FDZBG5EPRSFNG6TZTQEKLAVCNFSM6AAAAABLQAI3BCVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDGMJRGU4TMMJUGE. You are receiving this because you were mentioned.Message ID: @.***>

doychi avatar Aug 27 '24 05:08 doychi

@doychi we only have docker image for dev version. maybe let's wait for some days, we will release a binary version then i will back to you.

hengfeiyang avatar Aug 27 '24 10:08 hengfeiyang

@kmephistoh @doychi do you mind we give you a dev version for help testing the fix? ok, i'm waiting here.

kmephistoh avatar Aug 29 '24 11:08 kmephistoh

@kmephistoh can you try this version:

public.ecr.aws/zinclabs/openobserve-dev:v0.11.0-rc3-e7e01c5

hengfeiyang avatar Aug 29 '24 11:08 hengfeiyang

public.ecr.aws/zinclabs/openobserve-dev:v0.11.0-rc3-e7e01c5 image Can't download

kmephistoh avatar Aug 30 '24 02:08 kmephistoh

@kmephistoh this is docker image, you can access it by browser, you can use

docker pull public.ecr.aws/zinclabs/openobserve-dev:v0.11.0-rc3-684dd39

or directly run it:

docker run -tdi --name o2-test -e ZO_ROOT_USER_EMAIL="[email protected]" -e ZO_ROOT_USER_PASSWORD="Complexpass#123" -e ZO_DATA_DIR="/data" -p 5080:5080 -p 5514:5514 public.ecr.aws/zinclabs/openobserve-dev:v0.11.0-rc3-684dd39

Please try this version, we added some debug logs.

hengfeiyang avatar Aug 30 '24 10:08 hengfeiyang

I have installed the image and will keep an eye on it today and let you know how it goes.

doychi avatar Aug 30 '24 22:08 doychi

The new image seems to have resolved the issue for me.

doychi avatar Sep 01 '24 02:09 doychi

Can you let me know when this is likely to make it the latest stable release, so I can switch back to the stable line?

doychi avatar Sep 01 '24 08:09 doychi

sure, will make release next week

hengfeiyang avatar Sep 01 '24 09:09 hengfeiyang

@kmephistoh this is docker image, you can access it by browser, you can use

docker pull public.ecr.aws/zinclabs/openobserve-dev:v0.11.0-rc3-684dd39

or directly run it:

docker run -tdi --name o2-test -e ZO_ROOT_USER_EMAIL="[email protected]" -e ZO_ROOT_USER_PASSWORD="Complexpass#123" -e ZO_DATA_DIR="/data" -p 5080:5080 -p 5514:5514 public.ecr.aws/zinclabs/openobserve-dev:v0.11.0-rc3-684dd39

Please try this version, we added some debug logs.

I try this version, the issue's problem was resolved. But other normal log are loss sometime. Thank you for your help

kmephistoh avatar Sep 02 '24 10:09 kmephistoh

We released v0.11.0, both of you can test it now.

hengfeiyang avatar Sep 02 '24 10:09 hengfeiyang

We released v0.11.0, both of you can test it now. fc2df0a3-003d-4a92-b86c-81b4436cb7ff

The problem was resolved, thanks for your help.

kmephistoh avatar Sep 04 '24 08:09 kmephistoh