Updates needed for software framework.
These are mostly notes for me to remember what to fix, but I have seen a few issues lately with the software framework, particularly related to HTTP.
Azure versions
We have ignored_user_agents for browsers, but nothing equivalent for servers. There is a Microsoft cloud proxy that sets the version to the region/instance ID, like so:
ECAcc (mil/6C98)
ECAcc (mil/6C22)
ECAcc (mil/6C40)
ECAcc (mil/6C60)
ECAcc (mil/6C28)
ECAcc (mil/6C45)
ECAcc (mil/6C22)
ECAcc (mil/6C45)
See https://learn.microsoft.com/en-us/azure/cdn/cdn-verizon-http-headers
Example Via request header from that page:
Via: HTTP/1.1 ECD (dca/1A2B)
Because the instance ID keeps changing, almost every one of these requests triggers a new HTTP::SERVER software entry.
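One way to handle this would be a server-side counterpart to ignored_user_agents. The following is only a sketch: ignored_servers and should_report_server are hypothetical names that do not exist in the stock scripts, and the pattern is a guess based on the sample values above rather than on any documented format.

```zeek
# Hypothetical option modelled on HTTP::ignored_user_agents. It is not
# part of the stock scripts, and the pattern is inferred only from the
# sample Server values listed above.
const ignored_servers = /ECAcc \([a-z]+\/[0-9A-F]+\)/ &redef;

# Returns T when a Server header value should still be handed to the
# software framework, F when it matches the ignore pattern.
function should_report_server(value: string): bool
	{
	return ignored_servers !in value;
	}

event zeek_init()
	{
	# Quick manual check with one sample value and one ordinary server.
	print should_report_server("ECAcc (mil/6C98)");
	print should_report_server("nginx/1.24.0");
	}
```

Ignoring these values outright loses the fact that the host sits behind the proxy at all. An alternative would be to strip the parenthesized instance ID before reporting, so that all of the values collapse into a single entry.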
Proxy load
In a change I made a while ago, I moved the version parsing to the proxies, which reduced the worker load quite a bit, but the software framework's found function still sends every found software up to the proxies. Something like this in found could help:
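    if ( info?$unparsed_version )
        {
        # This exact version string was already reported for this host,
        # so skip the round trip to the proxies.
        if ( [info$host, info$unparsed_version] in found_cache )
            return T;

        add found_cache[info$host, info$unparsed_version];
        }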
where found_cache is a set[addr, string] with &create_expire set to something reasonable. It would be great if that could sync up with the existing table:
global tracked: table[addr] of SoftwareSet &create_expire=1day;
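To make the idea concrete, here is a minimal standalone sketch of the cache, assuming a 1day expiry chosen only to match tracked. The helper name already_reported is hypothetical, and in the real framework the check would live inside found rather than in a separate function.

```zeek
# Hypothetical worker-local cache. The 1day expiry mirrors the expiry on
# Software::tracked and is an assumption, not a tuned value.
global found_cache: set[addr, string] &create_expire=1day;

# Returns T when this (host, raw version string) pair is already in the
# cache. Otherwise records the pair and returns F.
function already_reported(host: addr, unparsed_version: string): bool
	{
	if ( [host, unparsed_version] in found_cache )
		return T;

	add found_cache[host, unparsed_version];
	return F;
	}

event zeek_init()
	{
	# The first sighting is new, the repeat is cached, and a different
	# version string on the same host is new again.
	print already_reported(192.0.2.10, "nginx/1.24.0");
	print already_reported(192.0.2.10, "nginx/1.24.0");
	print already_reported(192.0.2.10, "Apache/2.4.57");
	}
```

Since the cache is keyed on the unparsed string, it only helps once the changing values from the Azure section are ignored or normalized. Otherwise each new instance ID is a cache miss.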
Multiple browsers
The software framework assumes that for each software type, a host has one and only one version of that software type. This makes sense for things like an SSH server, but now with things like Electron apps and Chrome/Edge/Safari it's not uncommon for a single host to make multiple concurrent HTTP requests with alternating user agents. Or a host could be running two different HTTP servers for two different API services. Every time the host flip-flops between versions, it triggers new software log entries that don't actually contain new information.
Looking at one network, 70% of the last 1,000,000 software log entries are duplicates.
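One possible direction is to remember every version string seen for a host instead of only the latest one, so that flipping back to a known version is not treated as a change. The sketch below assumes exactly that. The names seen_versions and is_new_version are hypothetical, a plain string stands in for the software type or name, and this is not how the framework currently stores its state.

```zeek
# Hypothetical state: all raw version strings seen per (host, kind),
# where "kind" is a stand-in for the software type or name. The expiry
# is an assumption and applies from when the (host, kind) entry is created.
global seen_versions: table[addr, string] of set[string] &create_expire=1day;

# Returns T only the first time a version string is seen for this
# (host, kind) pair. Alternating between known versions returns F.
function is_new_version(host: addr, kind: string, unparsed_version: string): bool
	{
	if ( [host, kind] !in seen_versions )
		seen_versions[host, kind] = set();

	if ( unparsed_version in seen_versions[host, kind] )
		return F;

	add seen_versions[host, kind][unparsed_version];
	return T;
	}

event zeek_init()
	{
	local h = 192.0.2.10;

	# Two browsers on one host: each is new once, then the flip-flop
	# back to the first one is no longer reported as new.
	print is_new_version(h, "browser", "Chrome/120.0");
	print is_new_version(h, "browser", "Edge/120.0");
	print is_new_version(h, "browser", "Chrome/120.0");
	}
```

The trade-off is memory: the set grows with every distinct version string per host, so it would need the changing server values from the Azure section to be filtered out first.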