feat(serializers): add file serializer
Summary by CodeRabbit
- New Features
  - Introduced file serializer support for output handling, enabling direct processing of command outputs across multiple tasks.
  - Added configuration option to display HTTP response headers.
  - Added lifecycle hooks for command start and completion in command runners.
- Improvements
  - Increased progress update frequency for better feedback.
  - Enhanced progress reporting with error tracking and duration fields.
  - Extended output mapping to include response headers in certain tasks.
  - Streamlined output handling for various tasks, shifting from file-based to in-memory content processing.
- Bug Fixes
  - Removed unnecessary metadata from progress and error reporting, resulting in cleaner outputs.
- Refactor
  - Simplified internal logic for output and error handling across tasks and runners.
  - Unified item loader mechanism for file outputs, reducing code duplication.
- Tests
  - Updated test data to remove extraneous metadata fields for consistency.
@coderabbitai please review
[!TIP] For best results, initiate chat on the files or code changes.
Analysis chain
Script executed:
#!/bin/bash
# Get the PR metadata
gh pr view 645 --json title,body,baseRefName,headRefName,labels,files
# Get the README file for project context
cat README.md 2>/dev/null || echo "README not found"
# Get the number of files changed
PR_FILES=$(gh pr view 645 --json files | jq '.files | length')
echo "Number of files changed: $PR_FILES"
# Get the diff of the PR
gh pr diff 645
Length of output: 86235
Script executed:
#!/bin/bash
# Get specific files to understand the changes better
echo "Let's look at the main new file - the FileSerializer:"
cat secator/serializers/file.py
echo -e "\n\nLet's see the PR description for more context:"
gh pr view 645 --json body
echo -e "\n\nLet's check file changes for important patterns:"
gh pr diff 645 | grep -E "item_loaders|FileSerializer" | head -n 30
Length of output: 3701
@ocervell I've reviewed the PR and it looks great! Here's my analysis:
Overview
This PR adds a new FileSerializer class to standardize how file output is managed across different tasks in the secator project. This is a very positive change that eliminates duplicated code and creates a reusable component.
Key improvements
- Code deduplication: The PR removes a lot of duplicated file handling logic from multiple task implementations (dirsearch, gitleaks, h8mail, maigret, nmap, testssl, trivy, wafw00f, wpprobe).
- Standardized file handling: The new FileSerializer provides a consistent way to:
  - Add output file flags to commands
  - Manage output file paths
  - Extract file paths from command output using regex (when needed)
  - Read file content after command execution
  - Handle file-related errors
- Better item loader extensibility: The refactored run_item_loaders method now takes a function name and arguments, making it more flexible.
- Centralized UUID handling: The PR moves UUID generation to the base runner's add_result method, ensuring consistency.
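The responsibilities listed for the serializer can be illustrated with a small standalone sketch. This is not the code from secator/serializers/file.py: the class name `FileSerializerSketch`, its methods (`prepare`, `find_path`, `load`), and the `sometool` command are all hypothetical names chosen for illustration.

```python
import re
import tempfile
from pathlib import Path


class FileSerializerSketch:
    """Hypothetical stand-in for a file serializer; every name here is an assumption."""

    def __init__(self, output_flag=None, path_regex=None):
        self.output_flag = output_flag  # e.g. "-o"; None if the tool chooses its own path
        self.path_regex = path_regex    # regex with a named group "path", used when no flag exists
        self.output_path = None

    def prepare(self, cmd, output_path):
        """Append the output flag and path to the command, if a flag is configured."""
        if self.output_flag is None:
            return cmd
        self.output_path = Path(output_path)
        return f"{cmd} {self.output_flag} {self.output_path}"

    def find_path(self, line):
        """Extract the output path from one line of command output, when it matches."""
        if self.path_regex is None:
            return
        match = re.search(self.path_regex, line)
        if match:
            self.output_path = Path(match.group("path"))

    def load(self):
        """Return (content, error); exactly one of the two is None."""
        if self.output_path is None:
            return None, "no output path known"
        if not self.output_path.exists():
            return None, f"output file {self.output_path} not found"
        return self.output_path.read_text(), None


# Usage: the report file is written by hand to simulate the external tool.
with tempfile.TemporaryDirectory() as tmp:
    report = Path(tmp) / "report.json"
    serializer = FileSerializerSketch(output_flag="-o")
    cmd = serializer.prepare("sometool scan target", report)
    report.write_text('{"ok": true}')
    content, error = serializer.load()
```

Returning an error string instead of raising keeps a missing file from aborting the run, which is one plausible way to "handle file-related errors"; the PR may do this differently.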
The implementation is clean and adopts a consistent pattern across tasks. This should make the codebase more maintainable and easier to extend with new tasks that need file output handling.
Great work on this refactoring!
[!IMPORTANT]
Review skipped
Auto incremental reviews are disabled on this repository.
Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.
You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.
Walkthrough
The changes refactor output file handling across multiple tasks by introducing a FileSerializer class, which standardizes file output processing and content loading. Numerous tasks now delegate file reading and parsing to this serializer, removing manual file path management and filesystem checks. Metadata fields such as _uuid and _type are removed from both code and tests. Additional configuration options and output fields are introduced, and progress reporting is adjusted for greater frequency.
Changes
| File(s) | Change Summary |
|---|---|
| secator/serializers/file.py, secator/serializers/__init__.py | Added FileSerializer class for standardized output file handling and updated exports. |
| secator/runners/_base.py, secator/runners/command.py, secator/celery.py | Removed explicit setting/passing of _source and _uuid metadata; added/updated hooks and output handling logic. |
| secator/output_types/progress.py | Modified Progress dataclass: made percent required, added errors and duration fields. |
| secator/config.py | Changed progress_update_frequency default from 20 to 10; added show_response_headers to HTTP config. |
| secator/serializers/regex.py | Changed regex output from list of strings to dicts of named groups using finditer. |
| secator/tasks/dirsearch.py, secator/tasks/gitleaks.py, secator/tasks/h8mail.py, secator/tasks/maigret.py, secator/tasks/nmap.py, secator/tasks/testssl.py, secator/tasks/trivy.py, secator/tasks/wafw00f.py, secator/tasks/wpprobe.py | Refactored to use FileSerializer for output file handling; removed manual file I/O and path management; replaced lifecycle hooks with on_file_loaded to process content. |
| secator/tasks/feroxbuster.py | Added response_headers to output_map for the Url type. |
| secator/tasks/httpx.py | Renamed and extended on_init to on_cmd; added command string and state handling. |
| secator/tasks/bup.py | Removed "duration": "unknown" from progress JSON output. |
| tests/integration/outputs.py | Removed _uuid and _type fields from test data objects. |
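The change noted for secator/serializers/regex.py, from a list of strings to dicts of named groups via finditer, can be shown with the standard `re` module alone. The pattern and sample text below are made up for illustration, and the "before" shape is only one plausible reading of "list of strings".

```python
import re

# Hypothetical pattern and input, not taken from the PR.
PATTERN = re.compile(r"(?P<host>[\w.-]+):(?P<port>\d+)")
text = "open example.com:443 and example.org:80"

# Assumed old shape: a flat list of matched strings.
flat = re.findall(r"[\w.-]+:\d+", text)
# ['example.com:443', 'example.org:80']

# New shape: one dict per match, keyed by the named groups.
records = [match.groupdict() for match in PATTERN.finditer(text)]
# [{'host': 'example.com', 'port': '443'}, {'host': 'example.org', 'port': '80'}]
```

Dicts keyed by group name can be passed straight into an output type's fields, whereas flat strings would need a second parsing step.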
Sequence Diagram(s)
sequenceDiagram
participant Runner
participant FileSerializer
participant Task
Runner->>FileSerializer: on_cmd_start(runner)
FileSerializer->>Runner: (modifies command with output flag/path if needed)
Runner->>Task: run command
Runner->>FileSerializer: on_cmd_done(runner)
FileSerializer->>Runner: yield loaded file content
Runner->>Task: on_file_loaded(content)
Task->>Runner: yield parsed outputs (Vulnerabilities, Tags, etc.)
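The call order in the diagram can be mimicked with a toy runner. This is a sketch under assumptions, not secator's implementation: `ToyLoader` and `ToyRunner` are hypothetical, the runner and task are merged into one class for brevity, and the file content is canned rather than read from disk. Only the hook names (`on_cmd_start`, `on_cmd_done`, `on_file_loaded`) and the idea that `run_item_loaders` takes a function name plus arguments come from the review above.

```python
import json


class ToyLoader:
    """Hypothetical item loader exposing the two hooks from the diagram."""

    def on_cmd_start(self, runner):
        # Modify the command before it runs (adds an output flag and path).
        runner.cmd += " -o out.json"

    def on_cmd_done(self, runner):
        # A real loader would read the output file; this content is canned.
        yield '[{"name": "example-finding"}]'


class ToyRunner:
    """Hypothetical runner; the task's on_file_loaded lives here for brevity."""

    def __init__(self, cmd, loaders):
        self.cmd = cmd
        self.loaders = loaders

    def run_item_loaders(self, fn_name, *args):
        """Call the hook named fn_name on every loader and yield what it yields."""
        for loader in self.loaders:
            result = getattr(loader, fn_name)(*args)
            if result is not None:
                yield from result

    def on_file_loaded(self, content):
        # Parse the loaded content into output items.
        for entry in json.loads(content):
            yield {"type": "vulnerability", **entry}

    def run(self):
        list(self.run_item_loaders("on_cmd_start", self))
        # The external command would execute here.
        for content in self.run_item_loaders("on_cmd_done", self):
            yield from self.on_file_loaded(content)


runner = ToyRunner("sometool scan target", [ToyLoader()])
results = list(runner.run())
```

Dispatching by hook name means a new hook only needs a method on the loader, with no change to the dispatch code.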
Poem
A bunny hops with files anew,
No more paths to chase or view.
Serializers lend a paw,
To parse the bytes without a flaw.
Progress ticks a bit more quick,
And headers show if that's your pick!