Add new tables for threat hunting and detection & response

Open noahtalerman opened this issue 1 year ago • 13 comments

Goal

User story
As a security engineer using Fleet for threat hunting and detection & response,
I want to see recent network traffic, files that have been opened recently, and patterns in recently changed files on Windows workstations
so that I can identify malware and search for other malicious activity.

Key result

Expand data (via new tables) for detection & response and threat hunting

Original requests

  • #20946

Context

  • Product designer: @noahtalerman

Changes

Product

  • [ ] UI changes: No changes
  • [ ] CLI (fleetctl) usage changes: No changes
  • [ ] YAML changes: No changes
  • [ ] REST API changes: No changes
  • [ ] Fleet's agent (fleetd) changes: No changes
  • [ ] Osquery change:
    • Add the following tables:
      • dns_lookup_events
      • http_events
        • TODO: @zwass: not yet implemented
      • windows_recent_files
        • TODO: @zwass: not yet implemented
    • Add Windows support for yara_events
  • [ ] Activity changes: No changes
  • [ ] Permissions changes: No changes
  • [ ] Changes to paid features or tiers: Fleet Free and Fleet Premium
  • [ ] Other reference documentation changes: Document these tables in https://fleetdm.com/tables
  • [ ] Once shipped, requester has been notified
  • [ ] Once shipped, dogfooding issue has been filed

Engineering

  • [ ] Feature guide changes:
    • @jmwatts: the guide has this section: How do I turn on an evented table?. Where is this flag set?
      • Set it in agent options (command line flags) in Fleet, or manually configure the flag when running osqueryd directly
    • @jmwatts: The guide also says "and create a file that matches the pattern" but I don't know what pattern I'm looking for. I don't know how to create a YARA rule. How do I create a YARA rule? What are some examples?
  • [ ] Database schema migrations: No changes (osquery core)
  • [ ] Load testing: @zwass, can you think of how these changes may affect the server with regard to incoming results?
    • Not likely to cause issues on the Fleet server because we already handle high loads of query results from evented tables
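
To make the answer about where the flag is set more concrete, here is a minimal sketch of agent options that set disable_events to false under command_line_flags and add a local YARA configuration in the layout osquery documents (file_paths plus yara.signatures and yara.file_paths). The category name yara_test, the group name test_rules, and both Windows paths are placeholders invented for illustration. Other flags may be needed depending on the table and platform, so treat this as a starting point rather than a verified configuration.

```python
import json


def build_agent_options(monitored_paths, rule_files):
    """Return an agent options dict that enables events and wires up YARA.

    Assumption: "yara_test" and "test_rules" are placeholder names chosen
    for this sketch; any names work as long as they are used consistently.
    """
    return {
        "command_line_flags": {
            # The flag discussed above; False keeps evented tables active.
            "disable_events": False,
        },
        "config": {
            # osquery file_paths: category name -> list of path patterns.
            "file_paths": {"yara_test": list(monitored_paths)},
            "yara": {
                # Signature group name -> rule files present on the host.
                "signatures": {"test_rules": list(rule_files)},
                # Category name -> signature groups used to scan it.
                "file_paths": {"yara_test": ["test_rules"]},
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical paths; "%%" is osquery's recursive wildcard.
    options = build_agent_options(
        [r"C:\yara-test\%%"],
        [r"C:\yara-rules\always_match.yar"],
    )
    print(json.dumps(options, indent=2))
```

The printed JSON can be compared against the agent options already in use before anything is applied to real hosts.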

ℹ️  Please read this issue carefully and understand it. Pay special attention to UI wireframes, especially "dev notes".

QA

Risk assessment

@xpkoala, please go through this QA section and approve or raise questions.

  • Requires load testing: I would think this change has to be tested under stress. @zwass any recommendation how to do it? We typically test the server using the osquery test, but not the osquery itself.
  • Risk level: High
  • Risk description: Changes in areas of real-time event collection are typically high risk. @zwass, please add/correct.

Manual testing steps

  • [ ] Follow this guide to turn on evented tables. Generate network traffic, run a live query against dns_lookup_events and http_events, and verify that the traffic shows up.
  • [ ] Open a file on a Windows workstation and run a live query against windows_recent_files. Verify that the file opened shows up.
  • [ ] Configure YARA rules (learn how here), and create a file that matches the pattern. Then run a query against yara_events to verify that the file shows up. Create a file that doesn't match the pattern and verify that this file doesn't show up. See testing documentation in the PR
  • [ ] Agent (osqueryd) performance should be evaluated. For dns_lookup_events, generate lots of DNS requests. A web browser can sometimes do this, but browsers often cache results or use their own DNS implementations, so their lookups don't get picked up; the ping command seems to always generate DNS requests. For yara_events, create/modify lots of files in monitored directories.
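
One way to generate many DNS requests without relying on a browser is a small script that resolves hostnames with random labels, so cached answers cannot satisfy them. This is a sketch only: it assumes that lookups made through the operating system resolver are the ones dns_lookup_events records, and example.com is a placeholder domain.

```python
import socket
import uuid


def unique_hostnames(count, domain="example.com"):
    """Build hostnames with random labels so resolver caches cannot answer them."""
    return [f"{uuid.uuid4().hex[:12]}.{domain}" for _ in range(count)]


def generate_lookups(hostnames):
    """Resolve each hostname through the OS resolver; return how many resolved."""
    resolved = 0
    for name in hostnames:
        try:
            socket.getaddrinfo(name, None)
            resolved += 1
        except socket.gaierror:
            # Failures are expected for random labels; the lookup was still attempted.
            pass
    return resolved


if __name__ == "__main__":
    names = unique_hostnames(500)
    print(f"resolved {generate_lookups(names)} of {len(names)} names")
```

Raising the count, or running the script in a loop, increases the load while osqueryd CPU and memory are observed.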

Testing notes

Confirmation

  1. [ ] Engineer (@____): Added comment to user story confirming successful completion of QA.
  2. [ ] QA (@____): Added comment to user story confirming successful completion of QA.

noahtalerman avatar Nov 26 '24 23:11 noahtalerman

@noahtalerman I read through the guide mentioned in the test plan and see this section: How do I turn on an evented table? But it's not clear to me WHERE I set --disable_events=false Where is this flag set?

Also the link you included for YARA rules doesn't have enough info to set up YARA rules and use them, should I be using https://fleetdm.com/guides/remote-yara-rules instead? It also says "and create a file that matches the pattern" but I don't know what pattern I'm looking for. I don't know how to create a YARA rule. Are there specific files or YARA rules I should be testing with?

jmwatts avatar Feb 06 '25 18:02 jmwatts

@noahtalerman I read through the guide mentioned in the test plan and see this section: How do I turn on an evented table? But it's not clear to me WHERE I set --disable_events=false Where is this flag set?

@jmwatts really good find! Agreed it's unclear. I added this to the "Feature guide" section in the "Engineering" section.

Also the link you included for YARA rules doesn't have enough info to set up YARA rules and use them, should I be using https://fleetdm.com/guides/remote-yara-rules instead? It also says "and create a file that matches the pattern" but I don't know what pattern I'm looking for. I don't know how to create a YARA rule. Are there specific files or YARA rules I should be testing with?

Agreed, the guide is a better set of instructions. The guide links to the YAML reference (the link I included).

I also don't know how to create a YARA rule! Would be great to add that to the guide. I added this to the "Feature guide" section in the "Engineering" section.

cc @xpkoala @sharon-fdm

noahtalerman avatar Feb 07 '25 16:02 noahtalerman

@jmwatts , @noahtalerman , This short video shows how to configure FIM event. (Will not necessarily be the same for other events)

sharon-fdm avatar Feb 07 '25 16:02 sharon-fdm

  • [ ] Osquery change:
    • Add the following tables:
      • dns_lookup_events
        • TODO: @zwass: Add in columns here
      • http_events
        • TODO: @zwass: Add in columns here
      • windows_recent_files
        • TODO: @zwass: Add in columns here

Hey @zwass do we know what the columns will look like yet?

noahtalerman avatar Feb 13 '25 17:02 noahtalerman

  • [ ] Osquery change:
    • Add the following tables:
      • dns_lookup_events
        • TODO: @zwass: Add in columns here
      • http_events
        • TODO: @zwass: Add in columns here
      • windows_recent_files
        • TODO: @zwass: Add in columns here

Hey @zwass just following up, do we know what the columns will look like yet?

noahtalerman avatar Feb 19 '25 18:02 noahtalerman

  • [ ] Osquery change:
    • Add the following tables:
      • dns_lookup_events
        • TODO: @zwass: Add in columns here
      • http_events
        • TODO: @zwass: Add in columns here
      • windows_recent_files
        • TODO: @zwass: Add in columns here

Hey @zwass just giving you another ping! Do we know what the columns will look like yet?

noahtalerman avatar Mar 04 '25 22:03 noahtalerman

@sharon-fdm moved this to the release board. Can you help make sure this goes through QA once @zwass is done building these tables?

rachaelshaw avatar Mar 20 '25 14:03 rachaelshaw

@rachaelshaw, Sure!

@xpkoala, headsup that this will be in Awaiting QA soon.

sharon-fdm avatar Mar 20 '25 14:03 sharon-fdm

Hey @zwass @zayhanlon just checking, are we targeting shipping these tables in the next osquery release?

noahtalerman avatar Mar 26 '25 17:03 noahtalerman

@noahtalerman it should be - zach posted an async update today that osquery committee agreed to a release in the next 2 weeks ish.

zayhanlon avatar Mar 26 '25 17:03 zayhanlon

@xpkoala, I moved this to Awaiting QA. Not sure about where the PR is. Let's sync about this...

sharon-fdm avatar Mar 27 '25 11:03 sharon-fdm

@zwass, I added several questions in the main body regarding how to test this. Mainly:

  • How will stress affect osquery itself on the host?
  • How will stressed hosts affect the server with regard to incoming results?

sharon-fdm avatar Mar 27 '25 12:03 sharon-fdm

yara_events and dns_lookup_events are both in PR and will ideally ship in osquery 5.17 in the next week or two.

@sharon-fdm can you please assign an engineer from Fleet to help with code review for each of those PRs? I discussed in osquery office hours and other folks in the community don't have time to review them currently. We should be able to get someone with the necessary permissions to make the approval if we've shown that we have done review.

zwass avatar Mar 27 '25 17:03 zwass

@sharon-fdm yara_events and dns_lookup_events are shipped in osquery 5.17.0 which is now on the edge channel. Can we please start QA on this?

zwass avatar Apr 16 '25 22:04 zwass

@sharon-fdm @xpkoala what is the status of QA on this? We are ready to declare osquery stable but I am holding off to hear your go-ahead.

zwass avatar Apr 29 '25 12:04 zwass

@zwass, Osquery 5.17 is now in QA priority after 4.67.0/1 are done (Status: 2 of 4 for 5.17 are signed off at this moment) @xpkoala, any ETA for 5.17 (and 24198 with it)?

sharon-fdm avatar Apr 29 '25 13:04 sharon-fdm

@zwass I'm focusing on testing this issue today. I'm planning on having it wrapped up by end of day today.

xpkoala avatar Apr 29 '25 16:04 xpkoala

@zwass A quick update. I met with @lucasmrod and we were able to make some progress on this but were not able to complete testing. Lucas and I are planning on meeting tomorrow morning to finish testing.

I also removed the items relating to http_events and windows_recent_files from the testing criteria.

xpkoala avatar Apr 29 '25 22:04 xpkoala

For yara_events, create/modify lots of files in monitored directories.

@zwass Here's our plan to test performance for this issue:

Off the top of my head, I can think of the following plan:

  • Configure ~5 different paths to monitor (file_paths).
  • Each path contains ~10 executables (average size) scattered into many sub-directories.
  • Configure ~10 yara files with different rules from https://github.com/Yara-Rules/rules.
  • Create a scheduled query that does SELECT * FROM yara_events; that runs every ~1 minute.
  • Enroll a Windows 10 device and a Windows 11 device.

Run the above for ~5 hours and make sure the osquery watchdog did not kill the worker (that would indicate performance is reasonable).

I'm all ears if we want to adjust any numbers. (It may be too intensive or too light a test...)

lucasmrod avatar Apr 30 '25 19:04 lucasmrod

A few notes:

  • It shouldn't matter if the files are actually executables. Random data should be fine. Ideally randomly distributed file sizes from small to 100+MB
  • During the test we need to be triggering writes or modifications so that events are generated.
  • We should do some manual observation of osquery CPU and memory while the events are generated

Remember that with event-based tables, the heavy lifting is usually not when the query actually runs (and the events are pulled out of the rocksdb store), but when the events are ingested and processed.
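
The notes above (random data, mixed file sizes, ongoing writes during the test) could be scripted roughly as follows. This is a sketch rather than the harness used for this issue: the root directory, the directory layout, and the timing values are arbitrary assumptions.

```python
import os
import random
import time
from pathlib import Path


def write_random(path, size, chunk=1024 * 1024):
    """Write `size` bytes of random data to `path` in bounded chunks."""
    with open(path, "wb") as handle:
        remaining = size
        while remaining > 0:
            n = min(chunk, remaining)
            handle.write(os.urandom(n))
            remaining -= n


def create_test_files(root, count=10, max_bytes=100 * 1024 * 1024, seed=None):
    """Create `count` random-data files of varied size under nested sub-directories."""
    rng = random.Random(seed)
    paths = []
    for i in range(count):
        subdir = Path(root) / f"dir{rng.randrange(5)}" / f"sub{rng.randrange(5)}"
        subdir.mkdir(parents=True, exist_ok=True)
        path = subdir / f"file{i}.bin"
        write_random(path, rng.randint(1, max_bytes))
        paths.append(path)
    return paths


def modify_files(paths, rounds=1, pause_seconds=0.0):
    """Append 64 random bytes to every file once per round to trigger write events."""
    for _ in range(rounds):
        for path in paths:
            with open(path, "ab") as handle:
                handle.write(os.urandom(64))
        time.sleep(pause_seconds)


if __name__ == "__main__":
    # Hypothetical monitored directory; it must match a configured file_paths entry.
    files = create_test_files(r"C:\yara-test")
    modify_files(files, rounds=60, pause_seconds=60)
```

Because the cost is in event ingestion, the modification loop matters more than the initial file creation; shortening pause_seconds makes the test heavier.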

zwass avatar Apr 30 '25 20:04 zwass

Thanks @zwass!

Here's the revised plan:

Setup

  • Configure ~5 different paths to monitor (file_paths).
  • Each path contains ~10 files (from small to 100 MB) scattered into many sub-directories (they don't need to be executables).
  • Configure ~10 yara files with different rules from https://github.com/Yara-Rules/rules.
  • Create a scheduled query that does SELECT * FROM yara_events; that runs every ~5 minutes.

Execute

  • Run the following for ~5 hours.
  • Enroll a Windows 10 device and a Windows 11 device.
  • During the test trigger writes or modifications so that events are generated.

Measure

  • Manual observation of osquery CPU and memory while the events are generated. I was able to configure the built-in Windows "Performance Monitor" application to collect "Private Bytes" and "% Processor Time" for the two osqueryd processes, worker and watcher. [*]
  • Make sure the osquery watchdog process does not kill the worker process (that would indicate performance is reasonable). Check this by grepping orbit-osquery.log for "Memory limits exceeded" and "Maximum sustainable CPU utilization limit".

[*]

Image
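
On Windows hosts without grep, the log check in the last bullet can be approximated with a short script. The two marker strings are the ones named above; the log path is passed on the command line because its location is not stated in this thread.

```python
import sys

# Substrings that indicate the watchdog stopped the worker, as named in the plan.
WATCHDOG_MARKERS = (
    "Memory limits exceeded",
    "Maximum sustainable CPU utilization limit",
)


def find_watchdog_hits(lines):
    """Return (line_number, text) for every line containing a watchdog marker."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if any(marker in line for marker in WATCHDOG_MARKERS):
            hits.append((number, line.rstrip("\n")))
    return hits


def scan_log(path):
    """Scan a log file, tolerating undecodable bytes."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return find_watchdog_hits(handle)


if __name__ == "__main__":
    # Usage: python scan_watchdog.py <path to orbit-osquery.log>
    found = scan_log(sys.argv[1])
    for number, text in found:
        print(f"{number}: {text}")
    sys.exit(1 if found else 0)
```

A non-zero exit code means at least one marker was found, which makes the check easy to repeat over a multi-hour run.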

lucasmrod avatar May 01 '25 13:05 lucasmrod

Sounds great! One more thing I might add would be to make sure that you have at least one YARA rule that actually matches on some of the files. Potentially you could do that by making a YARA rule that matches on some piece of data that's likely to occur in the files. Or you can even intentionally add the data to some of the files. If you look at my pull request, I showed how to make a YARA rule that always matches and one that never matches. Those might be a good place to start.
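
For readers who have not written YARA before, the three kinds of rule mentioned here can be as small as the ones below. These are generic examples written for illustration, not necessarily the rules from the pull request; the rule names, file names, and the marker string FLEET_YARA_TEST_MARKER are all made up. The helper writes the rule files and can plant the marker in a test file so that the third rule has something to match.

```python
from pathlib import Path

# Hypothetical marker; any byte sequence unlikely to occur by chance works.
MARKER = "FLEET_YARA_TEST_MARKER"

RULES = {
    # Matches every scanned file.
    "always_match.yar": """rule AlwaysMatch
{
    condition:
        true
}
""",
    # Matches no file.
    "never_match.yar": """rule NeverMatch
{
    condition:
        false
}
""",
    # Matches only files that contain the marker text.
    "contains_marker.yar": """rule ContainsMarker
{
    strings:
        $marker = "FLEET_YARA_TEST_MARKER"
    condition:
        $marker
}
""",
}


def write_rules(directory):
    """Write each example rule to its own file and return the created paths."""
    target = Path(directory)
    target.mkdir(parents=True, exist_ok=True)
    created = []
    for name, text in RULES.items():
        path = target / name
        path.write_text(text, encoding="ascii")
        created.append(path)
    return created


def plant_marker(path):
    """Append the marker to a file so that ContainsMarker matches it."""
    with open(path, "ab") as handle:
        handle.write(MARKER.encode("ascii"))
```

With these in place, a file with the marker should appear in yara_events for ContainsMarker and AlwaysMatch, while NeverMatch should never appear.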

zwass avatar May 01 '25 13:05 zwass

The above scenario was run on Windows 10 and Windows 11 for roughly two days over the weekend. There were 0 watchdog crashes to report and no other abnormal behavior was seen.

Giving a 👍 on this one for release.

xpkoala avatar May 05 '25 16:05 xpkoala

@xpkoala, how do we typically close osquery tickets? This one should be closed.

sharon-fdm avatar May 22 '25 14:05 sharon-fdm

@sharon-fdm Typically tickets are closed when a milestone is complete. It sounds like we aren't doing that with our osquery milestones. I can set this to closed, but we should consider using the same workflow as fleetd and fleet ticket closing flows.

xpkoala avatar May 22 '25 14:05 xpkoala

Right. Milestone closed.

sharon-fdm avatar May 22 '25 14:05 sharon-fdm

New tables bloom, guide
Security engineers' sight,
Malware flees the light.

fleet-release avatar May 22 '25 15:05 fleet-release

We haven't completed all the tables yet, so reopening this issue.

zwass avatar May 22 '25 16:05 zwass

Since this was re-opened I'm moving back to In Progress

lukeheath avatar May 22 '25 19:05 lukeheath

@zwass @sharon-fdm Did any part of this ship with osquery 5.17.0? If so, we should re-spec this to cover what did ship, keep it tagged to the 5.17.0 milestone, and close it. Then, create a new issue for the remaining work. That way we're tracking what made it into stable.

If nothing for this story was shipped with osquery 5.17.0, then we can just change the milestone to osquery-5.18.0. Up to @sharon-fdm.

lukeheath avatar May 22 '25 19:05 lukeheath