openwec icon indicating copy to clipboard operation
openwec copied to clipboard

Loosing events on OpenWEC restart

Open semyonkozlov opened this issue 4 months ago • 14 comments

My setup: 1 windows machine (events source), 1 linux machine (events collector - OpenWEC is running here) Configured successfully and receiving events, but loosing events that were generated during openwec shutdown. Is it expected behavior?

Steps to reproduce:

  1. Creating subscription with filesystem output:
# /etc/openwec.d/03-filesystem.toml
uuid = "e8f232e7-78fa-4fbd-a5af-b8714316b840"
name = "filesystem"

query = """
<QueryList>
    <Query Id="0" Path="Application">
        <Select Path="System">*</Select>
    </Query>
</QueryList>
"""

[options]
read_existing_events = false
max_time = 5

# Subscription outputs
[[outputs]]
driver = "Files"
format = "Json"
config = { path = "/home/openwec/forwarded.log" }

OpenWEC conf:

[[collectors]]
hostname = "wec.server.local"
listen_address = "0.0.0.0"
listen_port = 5985

[collectors.authentication]
type = "Tls"
ca_certificate = "/etc/certs/ca-cert.pem"
server_certificate = "/etc/certs/server-cert.pem"
server_private_key = "/etc/certs/server-key.pem"

[database]
type = "SQLite"
path = "/var/lib/openwec/db/db.sqlite"

[server]
keytab = "/etc/openwec.keytab"

[monitoring]
listen_address = "0.0.0.0"
listen_port = 9090

[logging]
verbosity = "info"
access_logs = "stdout"
  1. Push some events
  2. Verify they are in the file - all good
  3. Shutdown openwecd
  4. Push 50 events from windows machine
for ($i = 0; $i -lt 50; $i++) {
    .\push-event.ps1 $i
}
  1. Launch ./openwecd
  2. Push some new events
  3. The bookmark has moved but no events in the file and there is Direction="backward" (what does it mean?)
$ ./openwec bookmarks show filesystem
winclient.domain.corp:<BookmarkList Direction="backward"><Bookmark Channel="System" RecordId="28251" IsCurrent="true"/></BookmarkList>
  1. Verify there is a gap in the events:
$ cat forwarded.log | jq | grep EventRecordID
  "EventRecordID": 28153,
  "EventRecordID": 28154,
  "EventRecordID": 28155,
  "EventRecordID": 28156,
  "EventRecordID": 28157,
  "EventRecordID": 28158,
  "EventRecordID": 28159,
  "EventRecordID": 28160,
  "EventRecordID": 28161,
  "EventRecordID": 28162,
  "EventRecordID": 28163,
  "EventRecordID": 28164,
  "EventRecordID": 28165,
  "EventRecordID": 28166,
  "EventRecordID": 28167,
  "EventRecordID": 28168,
  "EventRecordID": 28169,
  "EventRecordID": 28170,
  "EventRecordID": 28171,
  "EventRecordID": 28172,
  "EventRecordID": 28173,
  "EventRecordID": 28174,
  "EventRecordID": 28175,
  "EventRecordID": 28176,
  "EventRecordID": 28177,
  "EventRecordID": 28178,
  "EventRecordID": 28179,
  "EventRecordID": 28180,
  "EventRecordID": 28181,
  "EventRecordID": 28251,  <-------------
  "EventRecordID": 28252,
  "EventRecordID": 28253,
  "EventRecordID": 28254,

semyonkozlov avatar Aug 19 '25 11:08 semyonkozlov

Got the same result if read_existing_events = true

$ tail -f forwarded_existing.log | jq | grep EventRecordID
    "EventRecordID": 28357,
    "EventRecordID": 28358,
    "EventRecordID": 28359,
    "EventRecordID": 28360,
    "EventRecordID": 28361,
    "EventRecordID": 28362,
    "EventRecordID": 28363,
    "EventRecordID": 28364,
    "EventRecordID": 28449,  <----------------
    "EventRecordID": 28450,
    "EventRecordID": 28451,
    "EventRecordID": 28452,
    "EventRecordID": 28453,

semyonkozlov avatar Aug 19 '25 13:08 semyonkozlov

From the protocol.md

w:Bookmark: ensures we do not miss any event. The sender sends an updated bookmark each time it sends events. The collector needs to save the bookmark (kind of "cookie"). Then if something goes wrong (a network issue for example) the collector can tell the sender which is the last bookmark it received. The sender will then know which events were missed and need to be sent (DSP0226 10.2.6).

semyonkozlov avatar Aug 19 '25 14:08 semyonkozlov

Hi!

Could you check if the missing events exist locally on the Windows machine using Event Viewer?

Configured successfully and receiving events, but loosing events that were generated during openwec shutdown. Is it expected behavior?

No, you are not supposed to miss any events.

vruello avatar Aug 19 '25 15:08 vruello

Hi Yep, all events exist on Windows machine. Attaching screenshot showing missing event 28182 from the first example.

Image

semyonkozlov avatar Aug 19 '25 16:08 semyonkozlov

Just to note: after I shutdown openwec (Ctrl+C) and pushed events, I waited for some time (5-10 minutes) before restart. So it wasn't instant restart.

semyonkozlov avatar Aug 19 '25 16:08 semyonkozlov

Interesting! I'll try to reproduce this on my own.

In the meantime, could you try to reproduce the issue with the openwec log level set to trace and share openwec logs? You can tell openwec to write openwec logs to a file :

# openwec.conf.toml

[logging]
# Set server logging verbosity
verbosity = "trace"
server_logs = "<LOG FILE PATH>"

[!caution] trace mode is, as expected, very verbose. In this mode, openwec dumps all messages exchanged between openwec and the windows machine in clear text. Make sure there is nothing private in the log file before sharing it.

We should be able to understand what is happening with the openwec log file.

Just to note: after I shutdown openwec (Ctrl+C) and pushed events, I waited for some time (5-10 minutes) before restart. So it wasn't instant restart.

That should not change anything.

vruello avatar Aug 19 '25 16:08 vruello

Sure! Attaching two logs: before and after restart.

openwec-trace.log openwec-trace-after-restart.log

$ tail -f forwarded_existing.log | jq | grep EventRecordID
    "EventRecordID": 28817,
    "EventRecordID": 28818,
    "EventRecordID": 28819,
    "EventRecordID": 28820,
    "EventRecordID": 28821,
    "EventRecordID": 28822,
    "EventRecordID": 28898,  <--------------
    "EventRecordID": 28899,

semyonkozlov avatar Aug 20 '25 12:08 semyonkozlov

Thanks!

So, according to openwec-trace.log:

  • 11:32:55Z: The first EnumerateResponse sent by openwec contains the bookmark: <BookmarkList><Bookmark Channel=\"System\" RecordId=\"28814\" IsCurrent=\"true\"/></BookmarkList>.
  • 11:32:58Z: 2 events are received (28815 and 28816). Bookmark is set to <BookmarkList><Bookmark Channel=\"System\" RecordId=\"28816\" IsCurrent=\"true\"/></BookmarkList>.
  • 11:32:58Z: 3 events are received (28817, 27718, 28819). Bookmark is set to <BookmarkList><Bookmark Channel=\"System\" RecordId=\"28819\" IsCurrent=\"true\"/></BookmarkList>
  • 11:33:20Z: 3 events are received (28820, 28821, 28822). Bookmark is set to <BookmarkList><Bookmark Channel=\"System\" RecordId=\"28822\" IsCurrent=\"true\"/></BookmarkList>
  • 11:34:45Z: The last EnumerateResponse sent by openwec contains tbe bookmark: <BookmarkList><Bookmark Channel=\"System\" RecordId=\"28822\" IsCurrent=\"true\"/></BookmarkList>

For now, everything looks nice. In openwec-trace-after-restart.log:

  • 12:25:32Z: the first EnumerateResponse sent by openwec contains the bookmark: <Bookmark Channel="System" RecordId="28822" IsCurrent="true"/></BookmarkList></w:Bookmark>
  • 12:25:32Z : 1 event is received, from provider Microsoft-Windows-EventForwarder with EventID 111 and no EventRecordID. It has been created at 12:25:32.531Z. Bookmark is set to <BookmarkList Direction=\"backward\"><Bookmark Channel=\"System\" RecordId=\"28897\" IsCurrent=\"true\"/></BookmarkList>.
  • 12:25:54Z: an EnumerateResponse is sent by openwec. It contains the bookmark: <BookmarkList Direction="backward"><Bookmark Channel="System" RecordId="28897" IsCurrent="true"/></BookmarkList>
  • 12:26:16Z: another EnumerateResponse. Same bookmark.
  • 12:26:27Z: another EnumerateResponse. Same bookmark.
  • 12:26:49Z: another EnumerateResponse. Same bookmark.
  • 12:27:00Z: another EnumerateResponse. Same bookmark.
  • 12:27:11Z: another EnumerateResponse. Same bookmark.
  • 12:27:22Z: another EnumerateResponse. Same bookmark.
  • 12:27:33Z: another EnumerateResponse. Same bookmark.
  • 12:27:42Z: 2 events received (28898 and 28899). Bookmark is set to <BookmarkList><Bookmark Channel=\"System\" RecordId=\"28899\" IsCurrent=\"true\"/></BookmarkList>. Event 28898 has been created at 12:27:37.3561765Z'.
  • ...

There is something very interesting here. The bookmark sent with the 111 event is "wrong" and its format is slightly different than usual (the Direction="backward" that you noticed before). Unfortunately, I am not aware of any specifications for the bookmark format so I don't know what this means. In OpenWEC, we follow DSP0226 and treat the bookmark as an opaque text blob, without attempting to parse it.

I wonder if this special bookmark should be interpreted as "the last existing event record id in the corresponding channel" rather than the usual "last event sent". It could make sense for the forwarder to send the last existing record id in the channel so that the collector can evaluate its lag?

I'm thinking of two possible workarounds:

  • ignore the bookmark when the message only contains the special event 111
  • or ignore the bookmark if it contains the string Direction="backward"

Of course this requires further thought and testing.

I have a few questions for you, @semyonkozlov:

  • could you check in Event Viewer if you can find additional information about the event 111 (provider Microsoft-Windows-EventForwarder) generated at 12:25:32? I'm not even sure that it is a real event. I suggest to look for it in System and Applications and Services Logs\Microsoft\Windows\Eventlog-ForwardingPlugin channels.

  • could you check in Event Viewer the creation time of the event 28897? If it was created before (or at) 12:25:32Z, that would validate my theory.

vruello avatar Aug 20 '25 16:08 vruello

Could you also share the content of the following channels between 12:25:00Z and 12:30:00Z?

  • Applications and Services Logs\Microsoft\Windows\Eventlog-ForwardingPlugin
  • Applications and Services Logs\Microsoft\Windows\Windows Remote Management

vruello avatar Aug 20 '25 21:08 vruello

I was able to reproduce the issue in one of my labs, with a Windows 11 Pro client that authenticates to openwec using TLS.

However, the problem does not seem to appear in my AD lab where I collect logs from Windows 2022 servers. I'll try to add a Windows 11 machine to the AD lab.

What OS does your Windows machine use?

vruello avatar Aug 21 '25 09:08 vruello

I am using AMI Windows_Server-2025-English-Full-Base-2025.07.09 in AWS. The linux machine' AMI is ubuntu/images/hvm-ssd-gp3/ubuntu-noble-24.04-amd64-server-20250610 Let me know if you still need information you requested above. Also, great thanks for your effort with this issue!

semyonkozlov avatar Aug 21 '25 10:08 semyonkozlov

I have made some progress.

  • I could not reproduce the behavior in an AD environment. Everything seems to be working as expected, which is reassuring.

  • In the AD environment, openwec receives a 111 event the first time the subscription is created on the Windows client when no bookmark is sent (i.e. read_existing_event=false). It is the only event sent in the message, and its bookmark is:

<w:Bookmark><BookmarkList Direction=\"backward\"><Bookmark Channel=\"System\" RecordId=\"1407\" IsCurrent=\"true\"/></BookmarkList></w:Bookmark>

=> It has the same format seen in the TLS environment. The event content is also similar:

<Event xmlns='http://schemas.microsoft.com/win/2004/08/events/event'><System><Provider Name='Microsoft-Windows-EventForwarder'/><EventID>111</EventID><TimeCreated SystemTime='2025-08-22T16:32:36.329Z'/><Computer>WIN11.WINDOMAIN.LOCAL</Computer></System><SubscriptionBookmarkEvent><SubscriptionId></SubscriptionId></SubscriptionBookmarkEvent></Event>

The node SubscriptionBookmarkEvent is unusual. I found some documentation about it in a Microsoft github repository (https://github.com/microsoft/AaronLocker/blob/main/AaronLocker/Get-AppLockerEvents.ps1#L882C3-L897C13):

  Event ID 111 with $_.ProviderName -eq "Microsoft-Windows-EventForwarder" indicates an artificial
    event created on the Windows event collector when a client system creates a subscription.
    Use it to identify systems that were able to forward events but didn't.
    Example event XML:
    <?xml version="1.0"?>
    <Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
        <System>
        <Provider Name="Microsoft-Windows-EventForwarder"/>
        <EventID>111</EventID>
        <TimeCreated SystemTime="2018-03-02T21:04:42.797Z"/>
        <Computer>myworkstation.contoso.com</Computer>
        </System>
        <SubscriptionBookmarkEvent>
        <SubscriptionId/>
        </SubscriptionBookmarkEvent>
    </Event>
  • In the AD env, if read_existing_event=true, a "magic" bookmark is set in the EnumerateResponse message (instead of no bookmark):
<w:Bookmark>http://schemas.dmtf.org/wbem/wsman/1/wsman/bookmark/earliest</w:Bookmark>

In that case, openwec doesn't receive the 111 event.

  • In both labs, the emission of the 111 event seems to be correlated with the event Source: Eventlog-ForwardingPlugin; EventID: 100 (subscription created successfully) created in the channel Microsoft-Windows-Eventlog-ForwardingPlugin/Operational. However, in the TLS lab, the subscription is created "again" every time openwec is restarted, unlike in the AD lab where it is created only once.

  • The OpenWEC server sends almost the same XML regardless of the authentication method used. The only difference is the authentication policy.

  • I tried to ignore the bookmarks with a "Direction" attribute, but that blocked the events feed.

In conclusion, I think there is an issue with the EnumerateResponse that openwec sends to the client when TLS authentication is used. This issue causes the client to recreate the subscription, or at least ignore the bookmark. I couldn't find a meaningful event that would explain why this happens.

The next step is to set up a Windows WEC server in the TLS lab and check if the same behavior occurs.

vruello avatar Aug 22 '25 17:08 vruello

@vruello hello! We have setup with OpenWEC where TLS is required. Should we expect potential fix in the near future? Is there any additional info or help I can provide?

semyonkozlov avatar Aug 27 '25 16:08 semyonkozlov

I set up a lab with a Windows WEC server configured for TLS. It appears that the Windows implementation is functioning properly. Now, we have to find out why ;)

I'll let you know as soon as I make some progress.

vruello avatar Sep 01 '25 05:09 vruello