No retry on unwritten events to TCP-output
Setup
1 running openwec daemon that receiving events from Windows machine. Config:
[[collectors]]
listen_address = "0.0.0.0"
listen_port = 5985
[collectors.authentication]
type = "Tls"
ca_certificate = "/mnt/persist/openwec-certs/ca-cert.pem"
server_certificate = "/mnt/persist/openwec-certs/server-cert.pem"
server_private_key = "/mnt/persist/openwec-certs/server-key.pem"
[logging]
verbosity = "info"
server_logs = "stdout"
access_logs = "stdout"
[database]
type = "SQLite"
path = "/var/lib/openwec/db/openwec.sqlite"
[monitoring]
listen_address = "0.0.0.0"
listen_port = 9090
1 active subscription:
Subscription test-tcp-json
UUID: b199461f-1394-4e90-b03a-8b14d7e2d93d
Internal version: 06bc69ad-ae36-480f-a75c-7f30e4d8524a
Public version: d52e481e-723c-bfe8-d52e-481e723cbfe8
Revision: Not configured
URI: Not configured
Heartbeat interval: 3600s
Connection retry count: 5
Connection retry interval: 60s
Max time without heartbeat/events: 5s
Max events in a batch: Not configured
Max envelope size: 512000 bytes
Read existing events: false
Content format: Raw
Ignore channel error: true
Locale: Not configured
Data Locale: Not configured
Principal filter: Not configured
Outputs:
- 0: Enabled: true, Format: json, Driver: Tcp(TcpConfiguration { host: "localhost", port: 5000, tls_enabled: false, tls_certificate_authorities: [], tls_certificate: None, tls_key: None })
Enabled: true
Event filter query:
<QueryList><Query Id="0" Path="System"><Select Path="System">*</Select></Query></QueryList>
Receiving events on localhost with socat:
socat TCP-LISTEN:5000,reuseaddr,fork -
Problem
Loosing events that were sent while socat server is down.
How to reproduce
- Run socat
- Receive some events
- Ctlr+C socat
- Wait for Broken pipe message in openwec logs
WARN server::logic - Failed to process output and send event: Failed to write event to output format: Json, driver: Tcp(TcpConfiguration { host: "localhost", port: 5000, tls_enabled: false, tls_certificate_authorities: [], tls_certificate: None, tls_key: None })
Caused by:
Failed to write in TCP connection (localhost:5000): Broken pipe (os error 32)
- Run socat
- Receive some events, confirm gap
$ socat TCP-LISTEN:5000,reuseaddr,fork - | jq | grep EventRecordID
"EventRecordID": 40886,
"EventRecordID": 40887,
"EventRecordID": 40888,
"EventRecordID": 40889,
"EventRecordID": 40890,
"EventRecordID": 40891,
"EventRecordID": 40892,
"EventRecordID": 40893,
^C
$ socat TCP-LISTEN:5000,reuseaddr,fork - | jq | grep EventRecordID
"EventRecordID": 40902,
"EventRecordID": 40903,
"EventRecordID": 40904,
"EventRecordID": 40905,
"EventRecordID": 40906,
Hi! Thanks for bringing up the issue.
When an output fails, OpenWEC returns a status code 500 to clients and discards the newly received bookmark. I believe each client then drops the subscription and tries to create a new one. However, we know (#291) that there is an issue with the TLS authentication method that makes clients ignore the bookmark sent by the server during the subscription creation, resulting in events loss.
To confirm this theory, could you replay the scenario with the openwec log level set to trace?
Depending on how socat handles SIGINT, it is also possible to lose a few messages (the content of the kernel buffer) on plain TCP due to the lack of application-level acknowledgment.
To confirm this theory, could you replay the scenario with the openwec log level set to
trace?
Hi!
$ socat TCP-LISTEN:6789,reuseaddr,fork - | jq | grep EventRecordID
"EventRecordID": 42333,
"EventRecordID": 42334,
"EventRecordID": 42335,
"EventRecordID": 42336,
"EventRecordID": 42337,
^C
$ socat TCP-LISTEN:6789,reuseaddr,fork - | jq | grep EventRecordID
"EventRecordID": 42345,
"EventRecordID": 42346,
"EventRecordID": 42347,
"EventRecordID": 42348,
"EventRecordID": 42349,
"EventRecordID": 42350,
"EventRecordID": 42351,
"EventRecordID": 42352,
The trace confirms that openwec correctly received the missing events, so my previous assumption was incorrect.
- 2025-09-26T14:13:46.220951774+00:00: events 42333 -> 42337 have been received
- 2025-09-26T14:15:22.956897092+00:00: events 42338 -> 42344 have been received
- 2025-09-26T14:16:24.184468448+00:00: event 42345 has been received, but the output fails (
Failed to write in TCP connection (localhost:6789): Broken pipe (os error 32)), so openwec sends an error to the client - 2025-09-26T14:17:24.230420210+00:00: event 42345 has been received (again), but the output fails (
Failed to connect to localhost:6789: Failed to establish TCP connection), so openwec sends an error back to the client - 2025-09-26T14:18:24.243395486+00:00: same
- 2025-09-26T14:19:24.249624981+00:00: event 42345 has been received (again), and the output worked as expected
- 2025-09-26T14:19:24.262068876+00:00: events 42346 -> 42355 have been received
I guess you stopped socat between 14:13 and 14:15.
The missing events have been dropped at some point. OpenWEC writes directly to the TcpStream, which is a simple wrapper over write() without any buffering (so no need to call flush()). So the events have most likely reached the sender's kernel buffer. After that, everything depends on the kernel TCP stack. Currently, openwec has no way to guarantee that the events placed in the kernel sending buffer have effectively been received by the application on the other side. I don't know if this is possible without an application-level acknowledgment system, which is beyond the scope of the TCP driver.
If you need more guarantees (which you should), I strongly advise using a more robust queuing system such as Kafka.