zeek-community-id

Feature Request: Add community_id to all network log types

Open • dcode opened this issue 5 years ago • 12 comments

If any log has 5-tuple information, it should contain the community_id field for correlation across data types. As it stands today, one lookup is needed to find the conn entry and another to find the related logs.
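A minimal Python sketch of that two-lookup pivot, assuming logs parsed into dicts, a conn.log that already carries `community_id` (as this plugin adds), and records from another data source that also carry `community_id`. The function name and sample data are hypothetical.

```python
def pivot(proto_rec: dict, conn_log: list, other_records: list) -> list:
    """Two lookups: protocol log -> conn.log (by uid) -> other data (by community_id)."""
    # Lookup 1: find this record's conn.log entry via the shared Zeek uid.
    conn = next((c for c in conn_log if c.get("uid") == proto_rec.get("uid")), None)
    if conn is None or "community_id" not in conn:
        return []
    # Lookup 2: search the other data source for the same community_id.
    cid = conn["community_id"]
    return [r for r in other_records if r.get("community_id") == cid]


# Hypothetical sample data (IDs are placeholders, not real hashes).
conn_log = [{"uid": "C1", "community_id": "1:AAAA"},
            {"uid": "C2", "community_id": "1:BBBB"}]
http_rec = {"uid": "C2", "host": "example.com"}
alerts = [{"community_id": "1:BBBB", "source": "suricata"},
          {"community_id": "1:AAAA", "source": "suricata"}]
matches = pivot(http_rec, conn_log, alerts)
```

If http.log carried community_id itself, the first lookup would disappear.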

dcode, Oct 17 '18

Sorry for the late response here @dcode — yes, that is true. This was actually our original idea, until we noticed that Zeek currently lacks a good way to generalize log modifications to a set of applicable logs. You'd currently need to redef each applicable log's Info record. That's certainly doable but not very elegant.

Out of curiosity, in which log would you most like to have this?

ckreibich, Nov 28 '18

That's not entirely true. Zeek's logging framework does have a generalized mechanism. The current limitation is that when the mechanism is used to extend all logs, you can't inspect the content of the log in question in order to decide what you put in the extension field.

If we solved that problem, this would be easily doable.

sethhall, Nov 30 '18

What mechanism do you mean? In filters or the writer?

ckreibich, Nov 30 '18

There is a feature in the logging framework for creating log extension fields that are applied globally across all logs. Common uses are adding the worker that wrote the log line and the timestamp when the line was written. The unfortunate part, and what mostly limits its utility, is that you can't access the record being logged in the callback function through which this feature is implemented. If you could access the logged record in some way, you could inspect it to see whether it has an "id" field, check its type, and then do some extended informational logging (like writing the community_id).

Here is the function prototype for the log extension mechanism: https://github.com/bro/bro/blob/master/scripts/base/frameworks/logging/main.bro#L150

Here is a short example that applies it globally:

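```zeek
# Extension fields appended to every log stream.
type LogExtension: record {
    path:        string &log;
    system_name: string &log;
    write_ts:    time   &log;
};

# The extension callback receives only the log path,
# not the record being logged.
function add_log_extension(path: string): LogExtension
    {
    return LogExtension($path        = path,
                        $system_name = peer_description,
                        $write_ts    = network_time());
    }

# Prefix extension column names with "_" (e.g. "_write_ts").
redef Log::default_ext_prefix = "_";
redef Log::default_ext_func = add_log_extension;
```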

Ideally, the Log::default_ext_func function would have a second argument that is an anonymous record and Zeek would give you the ability to inspect anonymous records.
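To make the proposal concrete, here is a hypothetical Python model (not Zeek script, and not an existing Zeek API) contrasting today's one-argument callback with the proposed two-argument one. Records are plain dicts, and `community_id_fn` is a placeholder for whatever computes the ID; both are assumptions for illustration.

```python
import time
from typing import Callable, Optional

# Fields of Zeek's conn_id record, as they appear inside a record's "id".
CONN_ID_FIELDS = ("orig_h", "orig_p", "resp_h", "resp_p")


def ext_today(path: str) -> dict:
    """Today's callback shape: only the log path is available."""
    return {"path": path, "write_ts": time.time()}


def make_ext_with_record(
    community_id_fn: Callable[[dict], Optional[str]],
) -> Callable[[str, dict], dict]:
    """Build a callback with the hypothetical (path, rec) signature."""

    def ext(path: str, rec: dict) -> dict:
        out = ext_today(path)
        conn_id = rec.get("id")
        # Only records carrying a conn_id-like "id" field get the extra column.
        if isinstance(conn_id, dict) and all(f in conn_id for f in CONN_ID_FIELDS):
            cid = community_id_fn(conn_id)
            if cid is not None:
                out["community_id"] = cid
        return out

    return ext


ext = make_ext_with_record(lambda conn_id: "1:placeholder")
http_ext = ext("http", {"id": {"orig_h": "10.0.0.1", "orig_p": 50000,
                               "resp_h": "10.0.0.2", "resp_p": 80}})
```

Because the decision is made per record, logs without an `id` field simply get no community_id column.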

sethhall, Dec 05 '18

@sethhall, I've used that log extension before and that's the first thing I thought of, but without the log record, it doesn't fit this use case.

dcode, Dec 05 '18

Ah, right! I now remember reading over this and going "huh", but it was too early in my logging framework career. :smile: Thanks for the cluebatting! This seems to have a deficiency in that there's no particularly graceful way of handling the presence of multiple such global functions, but that will be easy to fix. It's a pity though that this moves pretty far from the type-oriented extension mechanism (via redef) that we have elsewhere ... ideally I'd want a conditional redef for adding to the Info record, depending on what else has been added to it. The fact that this is about logging-related records would then become secondary. We tried if ( record_fields(x) ) — it's nearly there but doesn't quite work right atm. Definitely fun stuff!

ckreibich, Dec 05 '18

@dcode Yeah, the original intent of the log extension mechanism was to figure out a way to get access to the log record but we couldn't do it at the time. We might be able to revisit that now with some new features that have been added to Zeek.

sethhall, Dec 11 '18

Hello there!

Was wondering what the status of this issue is? Any progress?

defensivedepth, Mar 30 '20

I'd still like to have a way to do this in a controlled yet general way in the logging framework. But others have put in the elbow grease to do it manually for all of Zeek's logs, see here if you prefer that: https://github.com/DynamiteAI/publish-community_id

ckreibich, May 14 '20

@ckreibich Has anything changed since your last comment in 2020? Thanks!

dougburks, Jul 12 '24

I'm afraid not. There are two potential approaches, and we've not found the time for either:

  • Make conn_id extensible so the ID could become part of it, and thus automatically appear in the logs.
  • Give the ext_func approach more context about the log it is extending, so it can tuck the ID on where needed.
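The first option works "automatically" because Zeek's log writers flatten nested record fields into dotted column names (the separator is `Log::default_scope_sep`, "." by default), which is why the connection tuple appears as `id.orig_h` and so on. A small Python model of that flattening, with records as plain dicts and a hypothetical `community_id` field added to conn_id, shows the new column surfacing in every log that embeds `id`:

```python
def flatten(rec: dict, prefix: str = "", sep: str = ".") -> dict:
    """Flatten nested dicts into dotted keys, mimicking log column names."""
    out = {}
    for key, val in rec.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(val, dict):
            out.update(flatten(val, name, sep))
        else:
            out[name] = val
    return out


# conn_id with a hypothetical community_id field added to it.
conn_id = {
    "orig_h": "10.0.0.1", "orig_p": 50000,
    "resp_h": "10.0.0.2", "resp_p": 80,
    "community_id": "1:<computed value>",
}

# Two different logs embedding the same conn_id as "id".
http_cols = flatten({"uid": "C1", "id": conn_id, "method": "GET"})
dns_cols = flatten({"uid": "C2", "id": conn_id, "query": "example.com"})
```

Logs that do not embed a conn_id would be unaffected under this model.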

Duly noted though that folks still want to see this capability. :+1:

ckreibich, Jul 12 '24

@dougburks would piping the logs through an external tool that adds the extra column be an option?

mavam, Jul 13 '24

@mavam For logs that don't already have community_id, we can enrich them using Elastic's community_id processor. However, in some cases that processor doesn't find all of the information it's looking for (such as the network transport protocol), and so it doesn't calculate a community_id value. So we're going back to first principles and examining our entire pipeline to see if there are improvements we can make to Zeek or Elastic to improve our community_id coverage.

dougburks, Jul 15 '24

Okay, so the community_id processor from Elastic would do the trick if it could use the right protocol (tcp, udp, icmp, etc.)? It sounds like this information is only available in conn.log, and that otherwise you'd have to guess it from the log type (e.g., tcp for http.log, because the others don't make sense). I'm not sure you can express this log-type dispatching in Elastic, though.
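A minimal sketch of such an external enrichment step in Python, following the Community ID v1 scheme (SHA-1 over a 2-byte seed, the ordered addresses, protocol, a pad byte, and the ordered ports, base64-encoded with a `1:` prefix). It handles only TCP and UDP; ICMP needs the spec's type/code mapping and is left out. Zeek's `port` values carry the transport protocol internally, but the logs write ports as plain numbers, so the tool has to infer the protocol from the log type. `PROTO_BY_LOG` and `enrich` are hypothetical and deliberately incomplete.

```python
import base64
import hashlib
import ipaddress
import struct

PROTO_TCP = 6   # IANA protocol numbers
PROTO_UDP = 17

# Hypothetical mapping from Zeek log path to the transport protocol its
# analyzer implies. Logs like dns.log (TCP or UDP) are left out on purpose.
PROTO_BY_LOG = {"http": PROTO_TCP, "ftp": PROTO_TCP,
                "smtp": PROTO_TCP, "ssh": PROTO_TCP}


def community_id_v1(saddr: str, daddr: str, sport: int, dport: int,
                    proto: int, seed: int = 0) -> str:
    """Community ID v1 for TCP/UDP flows (ICMP handling omitted)."""
    src = ipaddress.ip_address(saddr).packed
    dst = ipaddress.ip_address(daddr).packed
    # Order the endpoints so both directions of a flow hash the same.
    if (src, sport) > (dst, dport):
        src, dst, sport, dport = dst, src, dport, sport
    data = (struct.pack("!H", seed) + src + dst
            + struct.pack("!BBHH", proto, 0, sport, dport))  # proto, pad, ports
    return "1:" + base64.b64encode(hashlib.sha1(data).digest()).decode("ascii")


def enrich(path: str, rec: dict) -> dict:
    """Add community_id to a flattened Zeek log record if the protocol is implied."""
    proto = PROTO_BY_LOG.get(path)
    if proto is None or "community_id" in rec:
        return rec
    try:
        cid = community_id_v1(rec["id.orig_h"], rec["id.resp_h"],
                              rec["id.orig_p"], rec["id.resp_p"], proto)
    except KeyError:
        return rec  # record lacks the 5-tuple fields
    return {**rec, "community_id": cid}
```

The hard part is the mapping table, not the hash: any log whose protocol is ambiguous from its type still needs conn.log, which is the argument for doing this inside Zeek.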

mavam, Jul 15 '24

@mavam Yes, that's correct. We've considered updating our Elastic ingest parsers to set protocol where necessary, but from an overall architecture perspective it feels like all of this really should happen at the Zeek level.

@ckreibich Given the two potential approaches you outlined above, would it be possible for us to sponsor your time to make this happen? If so, who would I talk to about that?

dougburks, Jul 16 '24

Doug, I don't think you can really sponsor my time for this, but you also don't have to: we're about to plan content for the 7.1 release, and knowing that this is so desirable for you certainly matters.

I'd also like to understand this a bit better. Is the main reason convenience (i.e., doing away with the need to pivot via conn.log), or is there more? Convenience is certainly valid, but I'm trying to understand whether there's a use case where the ID has to be in more logs.

ckreibich, Jul 19 '24

@ckreibich That's good news! Thanks for being willing to look into this!

Avoiding the pivot via conn.log is definitely a valid concern, although personally I'd classify that as more than just a convenience. As a threat hunter or incident responder, if I'm constantly having to do two pivots all day long, that naturally slows me down and limits the number of bad guys I can catch.

Another point to consider is that we give our users the option to use either Zeek or Suricata for metadata. When using Suricata for metadata, all of the metadata logs automatically contain community_id (without pivoting to another log). For folks comparing Zeek and Suricata to determine which they want to use for metadata, it may be somewhat surprising that Zeek doesn't have any easy option for this today...especially considering who developed community_id. :smiley: Implementing this feature would help to level that playing field.

Thanks again for your consideration!

dougburks, Jul 19 '24

That's also why my users have requested this: cutting out that extra pivoting step, both between Zeek logs and between Zeek logs and other tools' logs. For example, pivoting today from Zeek's http.log to the corresponding Arkime session goes http.log -> conn.log -> Arkime session, when we could cut out that middle step. Not every user would want to add community_id to all the logs, but I certainly know some who would.

mmguero, Jul 19 '24

Thanks folks, got it!

ckreibich, Jul 19 '24