Host vitals: add softwares' hash (SHA-256) for macOS apps
Goal
| User story |
|---|
| As a Security Engineer, |
| I want to see the SHA-256 hash for each application executable on a macOS host |
| so that I can match against, and in some cases block, that application by hash in Santa. |
Key result
Customer request
Original requests
- #24210
Context
- Product designer: @noahtalerman
Changes
Product
- [ ] UI changes: Figma here
- [ ] CLI (fleetctl) usage changes: No changes
- [ ] YAML changes: No changes
- [ ] REST API changes: PR here
- [ ] Fleet's agent (fleetd) changes: Add a `cdhash_sha256` column to the `codesign` fleetd table. Looks like we would switch the `codesign` call from `--verbose` to `-vvv` when hash is requested, and revise parsing to extract both team identifier and hash.
- [ ] Activity changes: No changes
- [ ] Permissions changes: No changes
- [ ] Changes to paid features or tiers: Fleet Free and Fleet Premium
- [ ] Transparency changes: No changes
- [x] First draft of test plan added
- [ ] Other reference documentation changes: No changes
- [ ] Once shipped, requester has been notified
- [ ] Once shipped, dogfooding issue has been filed
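The parsing change described in the fleetd item above can be sketched as follows. This is a minimal illustration, not the fleetd implementation: the sample output is made up, and it assumes the SHA-256 value is read from the `CandidateCDHashFull sha256=` line and that unsigned or ad-hoc signed apps report `TeamIdentifier=not set`. The function name `parse_codesign_output` is hypothetical.

```python
# Hypothetical sample of verbose codesign output; all values are made up.
FAKE_SHA256 = "0123456789abcdef" * 4  # 64 hex characters
SAMPLE_OUTPUT = f"""\
Executable=/Applications/Example.app/Contents/MacOS/Example
Identifier=com.example.Example
Hash type=sha256 size=32
CandidateCDHashFull sha256={FAKE_SHA256}
TeamIdentifier=ABCDE12345
"""


def parse_codesign_output(text):
    """Extract team identifier and SHA-256 code directory hash from codesign output.

    Missing values are returned as empty strings so callers can treat
    "no hash available" the same way for every host.
    """
    result = {"team_identifier": "", "cdhash_sha256": ""}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("TeamIdentifier="):
            value = line.split("=", 1)[1]
            # Assumption: apps without a team report the literal "not set".
            if value != "not set":
                result["team_identifier"] = value
        elif line.startswith("CandidateCDHashFull sha256="):
            result["cdhash_sha256"] = line.split("=", 1)[1]
    return result


if __name__ == "__main__":
    print(parse_codesign_output(SAMPLE_OUTPUT))
```

Returning empty strings rather than raising keeps the table usable for apps that are unsigned or signed ad hoc.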
Engineering
- [ ] Add another variation to the macOS software vitals query that, if the column exists in `codesign`, includes it. Otherwise fall back to the other two variants (one with `codesign` available, one without).
- [ ] Confirm that the above software vitals query doesn't get denylisted, even on machines with a bunch of software.
- [ ] Test plan is finalized
- [ ] Database schema migrations: Add `executable_sha256` column to `host_software_installed_paths`
- [ ] Load testing: Yes. Need to confirm that software ingestion load isn't significantly affected by inclusion of hash, including when we're backfilling hashes (similar to the effects when we backfilled team identifier).
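The three-way fallback between query variants could look roughly like this. It is a sketch only: the variant names and the `pick_software_query_variant` function are hypothetical, and the real selection would be driven by discovery queries on the host rather than a Python set.

```python
def pick_software_query_variant(codesign_columns):
    """Pick which macOS software vitals query variant to run.

    codesign_columns is the set of column names exposed by the codesign
    table on the host, or None when the table is not available at all
    (for example, vanilla osquery without the fleetd tables extension).
    """
    if codesign_columns is None:
        # No codesign table: plain apps query, no team identifier or hash.
        return "apps_only"
    if "cdhash_sha256" in codesign_columns:
        # Newer fleetd: include both team identifier and hash.
        return "apps_with_team_identifier_and_hash"
    # Older fleetd: codesign exists but has no hash column yet.
    return "apps_with_team_identifier"


if __name__ == "__main__":
    print(pick_software_query_variant({"path", "team_identifier", "cdhash_sha256"}))
```

Checking for the column before the table-only case keeps older fleetd versions on the existing query, which is the behavior the risk assessment asks for.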
ℹ️ Please read this issue carefully and understand it. Pay special attention to UI wireframes, especially "dev notes".
QA
Risk assessment
- Requires load testing: Yes
- Risk level: High
- Risk description: Need to make sure that vitals continue to work (without hashes) on vanilla osquery and older versions of fleetd/fleetd-tables, and need to make sure vitals queries don't get denylisted
Test plan
- [ ] UI: Head to a macOS host's Host details page and, on the Software tab, select Actions > Show details for a macOS app (`"source": "apps"` in the `GET /hosts/:id/software` API). Verify that the hash is presented for each version of the app installed. Confirm that the hash for the associated install path matches what we get from `codesign`.
- [ ] API: Hit the `GET /hosts/:id/software` API and verify that the new `hash_sha256` is included under the `installed_versions.signature_information` array for each macOS app (`"source": "apps"`)
- [ ] Confirm that software vitals work properly for a vanilla osquery host (with no hash)
- [ ] Confirm that software vitals work properly for non-cask Homebrew packages (with no hash)
- [ ] Confirm that software vitals work properly for Linux and Windows hosts (with no hash)
- [ ] Confirm that software vitals work properly for a vanilla osquery host with the fleetd tables extension (WITH hash for apps)
- [ ] Confirm that software vitals work properly for older fleetd on macOS (with no hash)
Testing notes
Confirmation
- [ ] Engineer: Added comment to user story confirming successful completion of test plan.
- [ ] QA: Added comment to user story confirming successful completion of test plan.
@noahtalerman any chance we can still get these designs done in this sprint?
FYI @zayhanlon removed P2 from this because we want to reserve priority labels for urgent business needs.
In this case, we'll bring this story through expedited drafting, get it estimated, and plan on bringing it into the next sprint.
I think I suggested putting a P2 on it. Whoops from me.
Yup lol, ok no worries @noahtalerman
Just did some research on this and my guess is that the current framing of this ticket expects the hash value to be pulled from the hash table, which includes various hash types including sha256. Catch there is that that table will hash each file on the fly, which would basically guarantee low performance on the host while inventorying, let alone getting the query denylisted.
By contrast, the signature table is lighter-weight from my understanding, and with osquery 5.15 (specifically https://github.com/osquery/osquery/pull/8471) we can have that table only pull executable hash for better (potentially acceptable) performance, which I believe is sufficient here. The apps part of that query would look something like this:
```sql
SELECT
  name AS name,
  cdhash AS hash,
  COALESCE(NULLIF(bundle_short_version, ''), bundle_version) AS version,
  bundle_identifier AS bundle_identifier,
  '' AS extension_id,
  '' AS browser,
  'apps' AS source,
  '' AS vendor,
  last_opened_time AS last_opened_at,
  apps.path AS installed_path
FROM apps
LEFT JOIN signature ON signature.path = apps.path AND signature.hash_executable = FALSE AND signature.hash_resources = FALSE;
```
(thx @allenhouchins for your machine being the target of the above live query, and @lucasmrod for the correction on the query)
The catch is that this hash is sha1 rather than sha256, so wouldn't work on this as spec'd. If we wanted the beefier hash we'd need to either modify osquery again or add columns to the codesign table that effectively port the hashing logic from C++ to Go, but with sha256 rather than sha1. So if sha1 is sufficient for the customer (e.g. if they're already using that table internally and just want it rolled up into software inventory) we should use that.
More context: https://github.com/fleetdm/confidential/issues/8750#issuecomment-2528257909
If cached sha1 is acceptable here, performance will be acceptable with that table, and we'll add another discovery query to make sure we're only pulling hash when we can do it efficiently. Given that this has been in osquery for a couple releases, I don't think we have to backport a more expensive version of this.
Confirmed with the customer that they need sha256 rather than sha1. Updated the story to take this into account. Actually looks like the codesign command supports this, and we already use it in the codesign table, so as long as performance is reasonable this shouldn't be as large of a lift as I thought for changes that will require fleetd work.
Sending out estimates to backend folks for this as a full-stack task since the frontend changes for this are trivial.
Hey team! Please add your planning poker estimate with Zenhub @jahzielv @ksykulev
How do we plan to test this at scale, since osquery-perf doesn't actually run queries?
@jahzielv We'll need to tweak osquery-perf to include hashes on software vitals for the apps table, potentially behind a flag so we can simulate fleetd or vanilla. We don't need to test the real queries at scale, but would need to test on a few real machines to make sure swapping codesign usages doesn't get us denylisted.
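To illustrate the flag idea, here is a rough sketch of generating simulated app rows with or without a hash. It is not osquery-perf code: the `fake_app_rows` function and its `include_hash` flag are hypothetical, and the hash is just a deterministic SHA-256 of the fake path so that repeated runs report stable values.

```python
import hashlib


def fake_app_rows(count, include_hash):
    """Build simulated software rows for the apps source.

    include_hash=True mimics a fleetd host that reports cdhash_sha256;
    include_hash=False mimics vanilla osquery or older fleetd.
    """
    rows = []
    for i in range(1, count + 1):
        path = f"/Applications/FakeApp{i}.app"
        row = {"name": f"FakeApp{i}.app", "source": "apps", "installed_path": path}
        if include_hash:
            # Deterministic placeholder, not a real code directory hash.
            row["cdhash_sha256"] = hashlib.sha256(path.encode()).hexdigest()
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in fake_app_rows(2, include_hash=True):
        print(row)
```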
sounds like we may need to test this query (via live query?) in order to determine if the software query will get denylisted. If there are problems, this estimate will likely increase greatly to accommodate for finding alternatives.
@mostlikelee Yes, once the fleetd-tables changes (which look quick) are live. I'm cautiously optimistic, but if we can't do this efficiently with the codesign utility, the remaining points will wind up being consumed by a research task figuring out what we could use instead.
pushing this to the 4.69.0 release in order to increase test surface. We want to have a higher confidence that the sha queries do not get denylisted resulting in missing software hashes.
holding until after 1.42 is released
osquery uses a watchdog style system to dynamically add queries to the denylist. Here are the default limits at the time of posting this comment
https://github.com/osquery/osquery/blob/master/osquery/core/watcher.cpp#L74-L93
```cpp
const WatchdogLimitMap kWatchdogLimits = {
    // Maximum MB worker can privately allocate.
    {WatchdogLimitType::MEMORY_LIMIT, {200, 100, 10000}},
    // % of (User + System) CPU time worker can utilize
    // for LATENCY_LIMIT seconds.
    {WatchdogLimitType::UTILIZATION_LIMIT, {10, 5, 100}},
    // Number of seconds the worker should run, else consider the exit fatal.
    {WatchdogLimitType::RESPAWN_LIMIT, {4, 4, 1000}},
    // If the worker respawns too quickly, backoff on creating additional.
    {WatchdogLimitType::RESPAWN_DELAY, {5, 5, 1}},
    // Seconds of tolerable UTILIZATION_LIMIT sustained latency.
    {WatchdogLimitType::LATENCY_LIMIT, {12, 6, 1000}},
    // How often to poll for performance limit violations.
    {WatchdogLimitType::INTERVAL, {3, 3, 3}},
};
```
I don't know if our clients override these or not, but my guess is we can probably test this with the default values.
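As a rough way to reason about those numbers, the sketch below models "CPU above the utilization limit, sustained for the latency limit" using samples taken once per polling interval. This is a simplified model and not osquery's actual watchdog logic. It assumes the first value in each row above is the default level, and `exceeds_sustained_limit` is a hypothetical helper.

```python
UTILIZATION_LIMIT = 10  # percent CPU, first value from the table above
LATENCY_LIMIT = 12      # seconds of tolerated sustained overuse
INTERVAL = 3            # seconds between polls


def exceeds_sustained_limit(samples, limit=UTILIZATION_LIMIT,
                            latency=LATENCY_LIMIT, interval=INTERVAL):
    """Return True if CPU samples stay above the limit for `latency` seconds.

    samples holds one CPU utilization percentage per polling interval.
    Any sample at or below the limit resets the sustained counter.
    """
    sustained = 0
    for sample in samples:
        if sample > limit:
            sustained += interval
            if sustained >= latency:
                return True
        else:
            sustained = 0
    return False


if __name__ == "__main__":
    print(exceeds_sustained_limit([5, 15, 15, 15, 15]))
```

Under this model, a query has to keep the worker over 10% CPU for four consecutive polls before it would count as a violation, which gives a sense of how long a hashing query could run hot during testing.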
I did some performance testing and validation in the original PR (https://github.com/fleetdm/fleet/pull/28574#issuecomment-2887288711 & https://github.com/fleetdm/fleet/pull/28574#issuecomment-2887817339). Will open a new PR this week to get the changes in.
For future changes to codesign and performance testing, here is the script I used to generate 1000 apps in /Applications
```bash
#!/bin/bash

# Number of fake apps to generate
NUM_APPS=1000
TARGET_DIR="/Applications"

if [[ $EUID -ne 0 ]]; then
  echo "Please run as root: sudo $0"
  exit 1
fi

echo "Generating and signing $NUM_APPS fake applications in $TARGET_DIR..."

for i in $(seq 1 $NUM_APPS); do
  APP_NAME="FakeApp${i}.app"
  APP_PATH="${TARGET_DIR}/${APP_NAME}"
  BIN_PATH="${APP_PATH}/Contents/MacOS/FakeApp${i}"

  # App bundle structure
  mkdir -p "${APP_PATH}/Contents/MacOS"
  mkdir -p "${APP_PATH}/Contents/Resources"

  # Dummy binary
  echo -e "#!/bin/bash\necho \"Running FakeApp${i}\"" > "$BIN_PATH"
  chmod +x "$BIN_PATH"

  # Info.plist (heredoc body and terminator must stay unindented)
  cat > "${APP_PATH}/Contents/Info.plist" <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
"http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>CFBundleName</key>
<string>FakeApp${i}</string>
<key>CFBundleIdentifier</key>
<string>com.fake.FakeApp${i}</string>
<key>CFBundleExecutable</key>
<string>FakeApp${i}</string>
<key>CFBundlePackageType</key>
<string>APPL</string>
<key>CFBundleVersion</key>
<string>1.0</string>
</dict>
</plist>
EOF

  # Sign the app with an ad-hoc identity
  codesign --force --sign - --deep --timestamp=none "${APP_PATH}" > /dev/null 2>&1
done

echo "Done. $NUM_APPS signed fake apps created in $TARGET_DIR."
```
@ksykulev I am following the test steps but I'm not seeing the hash value for this software:
Am I looking in the wrong place?
Note: I am using fleetd 1.42
@jmwatts You'll need to use local TUF for this and build a new fleetd.
@iansltx should this be tagged with fleetd 1.43.0 or is there a separate ticket for that?
@jmwatts This should be split into detail query (4.69) and fleetd (1.43) subtasks.
@iansltx I used local TUF to build a new fleetd but I'm still not seeing it show up in the Host >> Software >> Show details.
I can see the new column when I run a query on the codesign table:
So it seems like I am able to access that value, but it's not populating in the UI:
@jmwatts Do we get the response for this in the API? Trying to figure out what all we still need to build here; may be missing a piece here.
Yep, it's in the API response
QA Notes
- UI: Head to a macOS host's Host details page and, on the Software tab, select Actions > Show details for a macOS app ("source": "apps" in the GET /hosts/:id/software API). Verify that the hash is presented for each version of the app installed.
  - [x] Confirm that the hash for the associated install path matches what we get from codesign.
- [x] API: Hit the GET /hosts/:id/software API and verify that the new hash_sha256 is included under the installed_versions.signature_information array for each macOS app ("source": "apps")
- [x] Confirm that software vitals work properly for a vanilla osquery host (with no hash)
- [x] Confirm that software vitals work properly for non-cask Homebrew packages (with no hash)
- [x] Confirm that software vitals work properly for Linux and Windows hosts (with no hash)
- [x] Confirm that software vitals work properly for a vanilla osquery host with the fleetd tables extension (WITH hash for apps)
- [x] Confirm that software vitals work properly for 1.42.0 fleetd on macOS (with no hash)
NOTE: Vitals refetch is working for the above scenarios, but viewing the software details is broken #29513
@noahtalerman @eugkuo @lukeheath @zayhanlon Found an issue: we did not account for multiple install paths. Moving this to expedited drafting to redesign the UI to account for this.
@mostlikelee can you outline the issue in the ticket and what design needs are required so that we can address them?
cc @RachelElysia
@eugkuo Short version: Hash is per installed path, not per version; each installed path can have a different hash. Admins will want to be able to copy this hash so they can paste it into other security tooling.
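To make the "per installed path, not per version" point concrete, here is a small sketch of how rows could be grouped so that one version carries several path and hash pairs. The `hash_sha256` and `signature_information` names come from the test plan; `installed_path` as a key inside each entry, the input row shape, and the `group_by_version` function are assumptions for illustration.

```python
from collections import defaultdict


def group_by_version(rows):
    """Group per-path rows into one entry per version.

    Each input row describes a single installed path. The output keeps
    every path's hash separately under signature_information, since two
    installs of the same version can have different hashes.
    """
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["version"]].append(
            {"installed_path": row["installed_path"], "hash_sha256": row["hash_sha256"]}
        )
    return [
        {"version": version, "signature_information": infos}
        for version, infos in grouped.items()
    ]


if __name__ == "__main__":
    rows = [
        {"version": "1.0", "installed_path": "/Applications/Example.app", "hash_sha256": "aaa"},
        {"version": "1.0", "installed_path": "/Users/me/Example.app", "hash_sha256": "bbb"},
    ]
    print(group_by_version(rows))
```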
@iansltx @RachelElysia @mostlikelee
Okee. Talked this over with @mostlikelee and have updated the design to reflect multiple paths and shas within a single version.
@jmwatts since this is going through expedited drafting, can you review the design changes to make sure it makes sense?
@mostlikelee Aesthetically: The new design looks... cluttered to me. I think it's because the values are in-line with the headers. It's also missing an example of software with associated install info. Screenshots for comparison:
Now:
New design:
Now with tabs, includes install details:
Functionally: Will it scroll within the modal if somehow there are tons of installed versions? Or will it just be a modal that gets longer the more versions there are? Other than that, it "makes sense".