secator feat(serializers): add file serializer

Summary by CodeRabbit

New Features
- Introduced file serializer support for output handling, enabling direct processing of command outputs across multiple tasks.
- Added configuration option to display HTTP response headers.
- Added lifecycle hooks for command start and completion in command runners.
Improvements
- Increased progress update frequency for better feedback.
- Enhanced progress reporting with error tracking and duration fields.
- Extended output mapping to include response headers in certain tasks.
- Streamlined output handling for various tasks, shifting from file-based to in-memory content processing.
Bug Fixes
- Removed unnecessary metadata from progress and error reporting, resulting in cleaner outputs.
Refactor
- Simplified internal logic for output and error handling across tasks and runners.
- Unified item loader mechanism for file outputs, reducing code duplication.
Tests
- Updated test data to remove extraneous metadata fields for consistency.
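
As a rough illustration of the progress changes listed above, a record with a required `percent` plus added `errors` and `duration` fields might look like the sketch below. The class name `ProgressSketch`, the field types, and the defaults are assumptions made for illustration and are not taken from the project's code.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProgressSketch:
    """Hypothetical stand-in for a progress record; not secator's real class."""
    percent: int                                      # required: no default value
    duration: str = "unknown"                         # assumed type and default
    errors: List[str] = field(default_factory=list)   # assumed: collected error messages

    def __post_init__(self):
        # Clamp to 0..100 so consumers can render a progress bar safely.
        self.percent = max(0, min(100, int(self.percent)))


p = ProgressSketch(percent=42, duration="3s")
p.errors.append("timeout on host 10.0.0.1")
```

Because `percent` has no default, constructing the record without it raises a `TypeError`, which is one way to read "required" here.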

May 15 '25 19:05 ocervell

@coderabbitai please review

May 21 '25 18:05 ocervell

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Get the PR metadata
gh pr view 645 --json title,body,baseRefName,headRefName,labels,files

# Get the README file for project context
cat README.md 2>/dev/null || echo "README not found"

# Get the number of files changed
PR_FILES=$(gh pr view 645 --json files | jq '.files | length')
echo "Number of files changed: $PR_FILES"

# Get the diff of the PR
gh pr diff 645

Length of output: 86235

🏁 Script executed:

#!/bin/bash
# Get specific files to understand the changes better
echo "Let's look at the main new file - the FileSerializer:"
cat secator/serializers/file.py

echo -e "\n\nLet's see the PR description for more context:"
gh pr view 645 --json body

echo -e "\n\nLet's check file changes for important patterns:"
gh pr diff 645 | grep -E "item_loaders|FileSerializer" | head -n 30

Length of output: 3701

@ocervell I've reviewed the PR and it looks great! Here's my analysis:

Overview

This PR adds a new FileSerializer class to standardize how file output is managed across different tasks in the secator project. This is a very positive change that eliminates duplicated code and creates a reusable component.

Key improvements

- Code deduplication: The PR removes a lot of duplicated file handling logic from multiple task implementations (dirsearch, gitleaks, h8mail, maigret, nmap, testssl, trivy, wafw00f, wpprobe).
- Standardized file handling: The new FileSerializer provides a consistent way to:
  - Add output file flags to commands
  - Manage output file paths
  - Extract file paths from command output using regex (when needed)
  - Read file content after command execution
  - Handle file-related errors
- Better item loader extensibility: The refactored run_item_loaders method now takes a function name and arguments, making it more flexible.
- Centralized UUID handling: The PR moves UUID generation to the base runner's add_result method, ensuring consistency.

The implementation is clean and adopts a consistent pattern across tasks. This should make the codebase more maintainable and easier to extend with new tasks that need file output handling.
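
To make the responsibilities listed above concrete, here is a minimal sketch of a serializer that appends an output flag, tracks the output path, optionally extracts the path from command output with a regex, and reads the file afterwards. Everything in it (the class name `FileSerializerSketch`, its method names, the `-o` flag, and the error format) is hypothetical and is not the class added in this PR.

```python
import re
import tempfile
from pathlib import Path
from typing import Iterator, Optional


class FileSerializerSketch:
    """Hypothetical illustration only; names and behavior are assumptions."""

    def __init__(self, output_flag: Optional[str] = None, path_regex: Optional[str] = None):
        self.output_flag = output_flag  # e.g. "-o", appended to the command
        self.path_regex = re.compile(path_regex) if path_regex else None
        self.output_path: Optional[Path] = None

    def prepare(self, cmd: str, output_path: Path) -> str:
        """Remember the output path and append flag + path when a flag is set."""
        self.output_path = output_path
        if self.output_flag:
            return f"{cmd} {self.output_flag} {output_path}"
        return cmd

    def find_path(self, stdout: str) -> None:
        """Extract the output path from the command's own output, if a regex is set."""
        if self.path_regex:
            match = self.path_regex.search(stdout)
            if match:
                self.output_path = Path(match.group("path"))

    def load(self) -> Iterator[str]:
        """Yield the file content, or an error message if the file is missing."""
        if self.output_path is None or not self.output_path.exists():
            yield f"error: output file not found: {self.output_path}"
            return
        yield self.output_path.read_text()


with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "report.json"
    serializer = FileSerializerSketch(output_flag="-o")
    cmd = serializer.prepare("scanner example.com", out)
    out.write_text('{"ok": true}')  # stands in for the command writing its report
    content = next(serializer.load())
```

Keeping path handling and file reading in one object is what lets each task drop its own copy of that logic and deal only with the loaded content.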

Great work on this refactoring!

May 21 '25 18:05 coderabbitai[bot]

[!IMPORTANT]

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

Walkthrough

The changes refactor output file handling across multiple tasks by introducing a FileSerializer class, which standardizes file output processing and content loading. Numerous tasks now delegate file reading to this serializer and process the loaded content in an on_file_loaded hook, removing manual file path management and filesystem checks. Explicit setting of the _source and _uuid metadata is removed from the runners, and the _uuid and _type fields are removed from the integration test data. The HTTP configuration gains a show_response_headers option, the feroxbuster output map gains a response_headers field, and the progress_update_frequency default changes from 20 to 10 so that progress updates arrive more often.
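
As an illustration of a task working on loaded content instead of reading files itself, the sketch below parses a report string that has already been read into memory. Only the hook name `on_file_loaded` comes from this PR; the function signature, the report layout (a `findings` list), and the yielded dict shape are assumptions.

```python
import json
from typing import Any, Dict, Iterator


def on_file_loaded(content: str) -> Iterator[Dict[str, Any]]:
    """Parse report content that was already read into memory (hypothetical format)."""
    try:
        report = json.loads(content)
    except json.JSONDecodeError as exc:
        yield {"type": "error", "message": f"invalid JSON report: {exc.msg}"}
        return
    if not isinstance(report, dict):
        yield {"type": "error", "message": "unexpected report layout"}
        return
    # The "findings" key is an invented layout used only for this example.
    for finding in report.get("findings", []):
        yield {
            "type": "vulnerability",
            "name": finding["name"],
            "severity": finding.get("severity", "unknown"),
        }


items = list(on_file_loaded('{"findings": [{"name": "weak-cipher", "severity": "low"}]}'))
```

The hook never touches the filesystem, so it can be exercised in tests with plain strings.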

Changes

| File(s) | Change Summary |
|---|---|
| `secator/serializers/file.py`, `secator/serializers/__init__.py` | Added `FileSerializer` class for standardized output file handling and updated exports. |
| `secator/runners/_base.py`, `secator/runners/command.py`, `secator/celery.py` | Removed explicit setting/passing of `_source` and `_uuid` metadata; added/updated hooks and output handling logic. |
| `secator/output_types/progress.py` | Modified `Progress` dataclass: made `percent` required, added `errors` and `duration` fields. |
| `secator/config.py` | Changed `progress_update_frequency` default from 20 to 10; added `show_response_headers` to HTTP config. |
| `secator/serializers/regex.py` | Changed regex output from list of strings to dicts of named groups using `finditer`. |
| `secator/tasks/dirsearch.py`, `secator/tasks/gitleaks.py`, `secator/tasks/h8mail.py`, `secator/tasks/maigret.py`, `secator/tasks/nmap.py`, `secator/tasks/testssl.py`, `secator/tasks/trivy.py`, `secator/tasks/wafw00f.py`, `secator/tasks/wpprobe.py` | Refactored to use `FileSerializer` for output file handling; removed manual file I/O and path management; replaced lifecycle hooks with `on_file_loaded` to process content. |
| `secator/tasks/feroxbuster.py` | Added `response_headers` to `output_map` for the `Url` type. |
| `secator/tasks/httpx.py` | Renamed and extended `on_init` to `on_cmd`; added command string and state handling. |
| `secator/tasks/bup.py` | Removed `"duration": "unknown"` from progress JSON output. |
| `tests/integration/outputs.py` | Removed `_uuid` and `_type` fields from test data objects. |
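
The regex serializer row describes moving from a list of strings to dicts of named groups via `finditer`. A minimal sketch of that technique with the standard `re` module follows; the helper name `iter_named_groups`, the pattern, and the sample line are invented for illustration and are not taken from `secator/serializers/regex.py`.

```python
import re
from typing import Dict, Iterator, Optional


def iter_named_groups(pattern: str, text: str) -> Iterator[Dict[str, Optional[str]]]:
    """Yield one dict of named groups per match instead of a list of raw strings."""
    for match in re.finditer(pattern, text):
        # groupdict() maps each (?P<name>...) group to the text it captured.
        yield match.groupdict()


line = "open 22/tcp ssh, open 443/tcp https"
pattern = r"(?P<port>\d+)/(?P<proto>\w+) (?P<service>\w+)"
results = list(iter_named_groups(pattern, line))
# results[0] == {"port": "22", "proto": "tcp", "service": "ssh"}
```

Named groups let downstream code address fields by key rather than by position, which is usually less fragile when a pattern gains or loses a group.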

Sequence Diagram(s)

sequenceDiagram
    participant Runner
    participant FileSerializer
    participant Task

    Runner->>FileSerializer: on_cmd_start(runner)
    FileSerializer->>Runner: (modifies command with output flag/path if needed)
    Runner->>Task: run command
    Runner->>FileSerializer: on_cmd_done(runner)
    FileSerializer->>Runner: yield loaded file content
    Runner->>Task: on_file_loaded(content)
    Task->>Runner: yield parsed outputs (Vulnerabilities, Tags, etc.)
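
The flow in the diagram above can be sketched in plain Python as follows. The hook names `on_cmd_start`, `on_cmd_done`, and `on_file_loaded`, and the idea that `run_item_loaders` takes a function name and arguments, come from this PR; the classes `RunnerSketch` and `LoaderSketch`, the `--output` flag, and the sample content are hypothetical, and no command is actually executed.

```python
from typing import Any, Callable, Dict, Iterator, List


class LoaderSketch:
    """Hypothetical item loader exposing the two hooks named in the diagram."""

    def on_cmd_start(self, runner: "RunnerSketch") -> None:
        # Modify the command before it runs (assumed flag and path).
        runner.cmd += " --output report.txt"

    def on_cmd_done(self, runner: "RunnerSketch") -> Iterator[str]:
        # A real loader would read the output file; fixed content stands in for it.
        yield "finding-a\nfinding-b"


class RunnerSketch:
    def __init__(self, cmd: str, loaders: List[Any],
                 on_file_loaded: Callable[[str], Iterator[Dict[str, str]]]):
        self.cmd = cmd
        self.loaders = loaders
        self.on_file_loaded = on_file_loaded

    def run_item_loaders(self, fn_name: str, *args: Any) -> Iterator[Any]:
        """Call the named hook on every loader that defines it; yield what it produces."""
        for loader in self.loaders:
            fn = getattr(loader, fn_name, None)
            if fn is None:
                continue
            result = fn(self, *args)
            if result is not None:
                yield from result

    def run(self) -> Iterator[Dict[str, str]]:
        list(self.run_item_loaders("on_cmd_start"))  # consume so the hooks execute
        # Command execution would happen here; it is skipped in this sketch.
        for content in self.run_item_loaders("on_cmd_done"):
            yield from self.on_file_loaded(content)


def split_findings(content: str) -> Iterator[Dict[str, str]]:
    """Stand-in for a task's on_file_loaded hook: one item per line."""
    for line in content.splitlines():
        yield {"type": "tag", "name": line}


runner = RunnerSketch("scanner example.com", [LoaderSketch()], split_findings)
outputs = list(runner.run())
```

Dispatching hooks by name keeps the runner unaware of which loaders exist, so a loader that lacks a given hook is simply skipped.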

Poem

A bunny hops with files anew,
No more paths to chase or view.
Serializers lend a paw,
To parse the bytes without a flaw.
Progress ticks a bit more quick,
And headers show if that’s your pick!
🐇✨

May 25 '25 17:05 coderabbitai[bot]