Feature: NDJSON Support for Hawk Log Output
What problem would this feature solve?
Currently, Hawk outputs JSON logs in standard JSON format, which is human-readable but not optimized for SIEM ingestion. NDJSON (Newline Delimited JSON) offers performance and efficiency benefits, including:
- Faster write speeds due to streaming each log entry as a separate JSON object.
- Smaller file sizes due to reduced whitespace and formatting overhead.
- Optimized ingestion for SIEMs like Splunk, Elasticsearch, Microsoft Sentinel, and others that prefer NDJSON for bulk ingestion.
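To make the difference concrete, here is a small sketch that serializes the same two records both ways. The records and their field names (User, Operation) are made up for illustration and are not Hawk's actual log schema.

```powershell
# Two hypothetical log records (field names are illustrative, not Hawk's schema).
$records = @(
    [pscustomobject]@{ User = 'alice@contoso.com'; Operation = 'MailboxLogin' }
    [pscustomobject]@{ User = 'bob@contoso.com';   Operation = 'New-InboxRule' }
)

# Standard JSON: a single indented array that spans multiple lines.
$json = $records | ConvertTo-Json

# NDJSON: one compact JSON object per line, with no enclosing array.
$ndjson = ($records | ForEach-Object { $_ | ConvertTo-Json -Compress }) -join "`n"

# $ndjson now holds:
# {"User":"alice@contoso.com","Operation":"MailboxLogin"}
# {"User":"bob@contoso.com","Operation":"New-InboxRule"}
```

Because every line is a complete JSON document, a consumer can process the file line by line without loading the whole array first.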
The primary decision point is whether NDJSON formatting should be handled within Hawk or delegated to HawkEye, which is responsible for SIEM ingestion. This ticket tracks the discussion and potential implementation within Hawk.
Proposed Solution
Introduce NDJSON support in Hawk's logging mechanism as an optional feature. This allows users to choose between traditional JSON and NDJSON without forcing a format change.
Options for Implementation:
1. Hawk generates NDJSON by default
   - All JSON logs are formatted as NDJSON.
   - Users must convert back to standard JSON if needed.
   - Simplifies ingestion for SIEMs but may require updates for users parsing logs manually.
2. Hawk provides an option for output type, accepting one or more of JSON, NDJSON, and CSV
   - Introduce a command-line switch (-OutputType) for Start-HawkUserInvestigation, Start-HawkTenantInvestigation, and all public Tenant / User functions.
   - Users can choose the output type on a per-investigation / per-function basis.
   - Backward-compatible and allows gradual adoption.
3. Hawk produces both JSON and NDJSON
   - Hawk will output both JSON and CSV as usual, and also add NDJSON output.
4. Keep NDJSON conversion in HawkEye
   - Hawk continues outputting standard JSON, and HawkEye transforms it into NDJSON for ingestion.
   - Reduces complexity in Hawk but offloads work to HawkEye.
The team should discuss which approach aligns best with Hawk’s long-term vision.
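If the team goes with the -OutputType switch, the parameter could look roughly like this. It is only a sketch: Invoke-ExampleInvestigation is a hypothetical stand-in for the real investigation functions, and defaulting to JSON plus CSV assumes today's output should stay the default.

```powershell
# Hypothetical stand-in; not Hawk's actual function or signature.
function Invoke-ExampleInvestigation {
    [CmdletBinding()]
    param(
        # One or more output formats. The default mirrors the current
        # JSON + CSV behavior (an assumption about the desired default).
        [ValidateSet('JSON', 'NDJSON', 'CSV')]
        [string[]]$OutputType = @('JSON', 'CSV')
    )

    foreach ($type in $OutputType) {
        # A real implementation would dispatch to the matching writer here.
        "Would write $type output"
    }
}

# Request NDJSON and CSV for a single run.
Invoke-ExampleInvestigation -OutputType NDJSON, CSV
```

ValidateSet rejects unknown formats at parameter binding time, so a typo such as -OutputType XML fails before any data is collected.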
⚙️ Developer Section (For Hawk Team Members Only)
Technical Requirements
- Modify Out-MultipleFileType.ps1 to support NDJSON output.
- Ensure UTF-8 encoding is maintained.
- Decide on appropriate file extension for NDJSON (.ndjson or .jsonl).
- Maintain backward compatibility with existing scripts that process JSON.
Implementation Approach
- Introduce a new parameter (-ndjson) to toggle NDJSON output.
- Use ConvertTo-Json -Compress and write each object as a separate line.
- Validate that Splunk, ELK, and other SIEMs ingest the NDJSON format correctly.
- Update documentation to reflect changes.
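The approach above could be sketched as follows. Write-ExampleNdjson is a hypothetical helper, not the existing code in Out-MultipleFileType.ps1. Writing UTF-8 without a byte-order mark and raising -Depth to 10 are assumptions; the issue itself only asks for UTF-8.

```powershell
# Hypothetical helper; the name and parameters are illustrative only.
function Write-ExampleNdjson {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory, ValueFromPipeline)]
        [object]$InputObject,

        [Parameter(Mandatory)]
        [string]$Path,

        # ConvertTo-Json truncates nested objects beyond -Depth (default 2),
        # so a larger value is assumed here for nested audit records.
        [int]$Depth = 10
    )

    begin {
        # Resolve relative paths against the PowerShell location, because
        # .NET's current directory can differ from it.
        $fullPath = $ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($Path)

        # UTF8Encoding($false) writes UTF-8 without a byte-order mark.
        $encoding = [System.Text.UTF8Encoding]::new($false)
        $writer = [System.IO.StreamWriter]::new($fullPath, $false, $encoding)
        $writer.NewLine = "`n"
    }
    process {
        # One compact JSON object per line.
        $line = ConvertTo-Json -InputObject $InputObject -Compress -Depth $Depth
        $writer.WriteLine($line)
    }
    end {
        $writer.Dispose()
    }
}

# Usage: stream objects straight to a file, one JSON object per line.
# $results | Write-ExampleNdjson -Path .\output.ndjson
```

Streaming through a StreamWriter avoids holding the whole result set in memory as one string. The sketch does not dispose the writer if serialization fails midway, which a real implementation would need to handle.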
Acceptance Criteria
- Hawk successfully writes logs in NDJSON format when enabled.
- NDJSON files are smaller and more efficiently ingested into SIEMs.
- Users can still output traditional JSON if needed.
- Performance benchmarks show improved write speeds and reduced memory overhead.
- No breaking changes to existing functionality.
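A simple way to check the first criterion is to confirm that every non-empty line of the output parses as JSON on its own. The helper below is a hypothetical sketch for testing, not part of Hawk.

```powershell
# Hypothetical check; not part of Hawk.
function Test-ExampleNdjson {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory)]
        [string]$Path
    )

    $lineNumber = 0
    foreach ($line in (Get-Content -LiteralPath $Path)) {
        $lineNumber++

        # Tolerate blank lines, such as a trailing newline at end of file.
        if ([string]::IsNullOrWhiteSpace($line)) { continue }

        try {
            # Each line must be a complete JSON document by itself.
            $null = $line | ConvertFrom-Json -ErrorAction Stop
        }
        catch {
            Write-Warning "Line $lineNumber is not valid JSON"
            return $false
        }
    }
    return $true
}
```

This only verifies syntax. Whether a given SIEM accepts the fields is still something to validate against Splunk, ELK, and the other targets.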
This ticket will remain open for discussion until the team reaches consensus on whether NDJSON should be implemented in Hawk or left to HawkEye.
It's a super useful feature. We were planning to use Hawk to ingest into our ELK stack, but we have to make some foo to convert to NDJSON. It would be cool to have it natively.
@Guzzy711 thank you for your feedback! In terms of implementation options, would you prefer option 1, 2, or 3? Based upon your feedback and a discussion with some of the Hawk contributors, we will look to implement this in our next minor release.
@Guzzy711, we would also be interested in hearing any pain points, suggestions, and any feedback in general as you begin ingesting the Hawk data into ELK. Thanks again for your feedback on this ticket!
I think option 2 would be preferable for the wider community; however, it probably also requires a bit more work. :-)
For sure! Will definitely let you know. 👍🏽
Maybe you can get inspired by the following to do the conversion: https://www.blackhillsinfosec.com/wrangling-the-m365-ual-part-3-of-3/
@Guzzy711, we are rolling with option 2. Starting some work on it this weekend!