feat(serializers): add file serializer

Open ocervell opened this issue 7 months ago • 3 comments

Summary by CodeRabbit

  • New Features

    • Introduced file serializer support for output handling, enabling direct processing of command outputs across multiple tasks.
    • Added configuration option to display HTTP response headers.
    • Added lifecycle hooks for command start and completion in command runners.
  • Improvements

    • Increased progress update frequency for better feedback.
    • Enhanced progress reporting with error tracking and duration fields.
    • Extended output mapping to include response headers in certain tasks.
    • Streamlined output handling for various tasks, shifting from file-based to in-memory content processing.
  • Bug Fixes

    • Removed unnecessary metadata from progress and error reporting, resulting in cleaner outputs.
  • Refactor

    • Simplified internal logic for output and error handling across tasks and runners.
    • Unified item loader mechanism for file outputs, reducing code duplication.
  • Tests

    • Updated test data to remove extraneous metadata fields for consistency.

ocervell, May 15 '25 19:05

@coderabbitai please review

ocervell, May 21 '25 18:05

[!TIP] For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Get the PR metadata
gh pr view 645 --json title,body,baseRefName,headRefName,labels,files

# Get the README file for project context
cat README.md 2>/dev/null || echo "README not found"

# Get the number of files changed
PR_FILES=$(gh pr view 645 --json files | jq '.files | length')
echo "Number of files changed: $PR_FILES"

# Get the diff of the PR
gh pr diff 645

Length of output: 86235


🏁 Script executed:

#!/bin/bash
# Get specific files to understand the changes better
echo "Let's look at the main new file - the FileSerializer:"
cat secator/serializers/file.py

echo -e "\n\nLet's see the PR description for more context:"
gh pr view 645 --json body

echo -e "\n\nLet's check file changes for important patterns:"
gh pr diff 645 | grep -E "item_loaders|FileSerializer" | head -n 30

Length of output: 3701


@ocervell I've reviewed the PR and it looks great! Here's my analysis:

Overview

This PR adds a new FileSerializer class to standardize how file output is managed across different tasks in the secator project. This is a very positive change that eliminates duplicated code and creates a reusable component.

Key improvements

  1. Code deduplication: The PR removes a lot of duplicated file handling logic from multiple task implementations (dirsearch, gitleaks, h8mail, maigret, nmap, testssl, trivy, wafw00f, wpprobe).

  2. Standardized file handling: The new FileSerializer provides a consistent way to:

    • Add output file flags to commands
    • Manage output file paths
    • Extract file paths from command output using regex (when needed)
    • Read file content after command execution
    • Handle file-related errors
  3. Better item loader extensibility: The refactored run_item_loaders method now takes a function name and arguments, making it more flexible.

  4. Centralized UUID handling: The PR moves UUID generation to the base runner's add_result method, ensuring consistency.

The implementation is clean and adopts a consistent pattern across tasks. This should make the codebase more maintainable and easier to extend with new tasks that need file output handling.
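
To make the responsibilities listed above concrete, here is a minimal sketch of what a file serializer of this kind could look like. It is not the code in secator/serializers/file.py: the class name `FileSerializerSketch`, its constructor parameters, the method names, and the shape of the yielded records are all assumptions made for illustration.

```python
import re
from pathlib import Path
from typing import Iterator, Optional


class FileSerializerSketch:
    """Illustrative only: NOT the actual secator FileSerializer."""

    def __init__(self, output_flag: Optional[str] = None, path_regex: Optional[str] = None):
        # Hypothetical parameters: a CLI flag such as "-o", and a regex with a
        # named group "path" for tools that print where they saved their report.
        self.output_flag = output_flag
        self.path_regex = path_regex

    def build_command(self, cmd: str, output_path: str) -> str:
        """Append the output flag and path to the command, if a flag is configured."""
        if self.output_flag is None:
            return cmd
        return f"{cmd} {self.output_flag} {output_path}"

    def find_path(self, stdout: str) -> Optional[str]:
        """Extract the output file path from command output using the regex, if any."""
        if self.path_regex is None:
            return None
        match = re.search(self.path_regex, stdout)
        return match.group("path") if match else None

    def load(self, output_path: str) -> Iterator[dict]:
        """Yield the file content, or an error record if the file does not exist."""
        path = Path(output_path)
        if not path.exists():
            yield {"type": "error", "message": f"Output file not found: {output_path}"}
            return
        yield {"type": "content", "content": path.read_text()}
```

Keeping command building, path discovery, and loading in one object is what lets each task drop its own copy of that logic and only parse the content it receives.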

Great work on this refactoring!

coderabbitai[bot], May 21 '25 18:05

[!IMPORTANT]

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

The changes refactor output file handling across multiple tasks by introducing a FileSerializer class, which standardizes file output processing and content loading. Numerous tasks now delegate file reading and parsing to this serializer, removing manual file path management and filesystem checks. Metadata fields such as _uuid and _type are removed from both code and tests. Additional configuration options and output fields are introduced, and progress reporting is adjusted for greater frequency.

Changes

| File(s) | Change Summary |
|---|---|
| secator/serializers/file.py, secator/serializers/__init__.py | Added FileSerializer class for standardized output file handling and updated exports. |
| secator/runners/_base.py, secator/runners/command.py, secator/celery.py | Removed explicit setting/passing of _source and _uuid metadata; added/updated hooks and output handling logic. |
| secator/output_types/progress.py | Modified Progress dataclass: made percent required, added errors and duration fields. |
| secator/config.py | Changed progress_update_frequency default from 20 to 10; added show_response_headers to HTTP config. |
| secator/serializers/regex.py | Changed regex output from list of strings to dicts of named groups using finditer. |
| secator/tasks/dirsearch.py, secator/tasks/gitleaks.py, secator/tasks/h8mail.py, secator/tasks/maigret.py, secator/tasks/nmap.py, secator/tasks/testssl.py, secator/tasks/trivy.py, secator/tasks/wafw00f.py, secator/tasks/wpprobe.py | Refactored to use FileSerializer for output file handling; removed manual file I/O and path management; replaced lifecycle hooks with on_file_loaded to process content. |
| secator/tasks/feroxbuster.py | Added response_headers to output_map for the Url type. |
| secator/tasks/httpx.py | Renamed and extended on_init to on_cmd; added command string and state handling. |
| secator/tasks/bup.py | Removed "duration": "unknown" from progress JSON output. |
| tests/integration/outputs.py | Removed _uuid and _type fields from test data objects. |
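
The regex serializer change in the table (from a list of strings to dicts of named groups via finditer) can be illustrated with the standard library alone. This is a sketch of the general technique, not the code in secator/serializers/regex.py; the pattern, the sample line, and the function names are hypothetical.

```python
import re

# Hypothetical pattern with two named groups.
PATTERN = re.compile(r"(?P<host>[\w.-]+):(?P<port>\d+)")


def as_strings(line: str) -> list:
    """Older style: each match is returned as a plain string."""
    return [m.group(0) for m in PATTERN.finditer(line)]


def as_dicts(line: str) -> list:
    """Newer style: each match is returned as a dict of its named groups."""
    return [m.groupdict() for m in PATTERN.finditer(line)]


line = "open example.com:443 and example.org:80"
print(as_strings(line))  # ['example.com:443', 'example.org:80']
print(as_dicts(line))    # [{'host': 'example.com', 'port': '443'}, {'host': 'example.org', 'port': '80'}]
```

Returning dicts keyed by group name means downstream code can read fields directly instead of re-parsing each matched string.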

Sequence Diagram(s)

sequenceDiagram
    participant Runner
    participant FileSerializer
    participant Task

    Runner->>FileSerializer: on_cmd_start(runner)
    FileSerializer->>Runner: (modifies command with output flag/path if needed)
    Runner->>Task: run command
    Runner->>FileSerializer: on_cmd_done(runner)
    FileSerializer->>Runner: yield loaded file content
    Runner->>Task: on_file_loaded(content)
    Task->>Runner: yield parsed outputs (Vulnerabilities, Tags, etc.)
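
The call order in the diagram above can be mimicked with a small, self-contained simulation. Only the hook names on_cmd_start, on_cmd_done, and on_file_loaded come from the diagram; the classes, the `-o` flag, and the fake JSON report are assumptions, and no real command is executed.

```python
import json
import tempfile
from pathlib import Path
from typing import Iterator


class SketchSerializer:
    """Hypothetical stand-in for the FileSerializer participant."""

    def on_cmd_start(self, runner: "SketchRunner") -> None:
        # Modify the command with an output flag and path (the flag is an assumption).
        runner.output_path = str(Path(tempfile.mkdtemp()) / "out.json")
        runner.cmd = f"{runner.cmd} -o {runner.output_path}"

    def on_cmd_done(self, runner: "SketchRunner") -> Iterator[str]:
        # Yield the loaded file content back to the runner.
        yield Path(runner.output_path).read_text()


class SketchTask:
    """Hypothetical stand-in for the Task participant."""

    def execute(self, runner: "SketchRunner") -> None:
        # Stand-in for running the real tool: write a fake JSON report.
        Path(runner.output_path).write_text(json.dumps([{"name": "example-finding"}]))

    def on_file_loaded(self, content: str) -> Iterator[dict]:
        # Task-specific parsing: turn raw content into output items.
        for entry in json.loads(content):
            yield {"kind": "tag", "name": entry["name"]}


class SketchRunner:
    def __init__(self, cmd: str, serializer: SketchSerializer, task: SketchTask):
        self.cmd = cmd
        self.output_path = ""
        self.serializer = serializer
        self.task = task

    def run(self) -> Iterator[dict]:
        self.serializer.on_cmd_start(self)       # serializer adjusts the command
        self.task.execute(self)                  # command runs (simulated here)
        for content in self.serializer.on_cmd_done(self):
            yield from self.task.on_file_loaded(content)  # task parses content


runner = SketchRunner("tool scan", SketchSerializer(), SketchTask())
items = list(runner.run())
```

In this arrangement the task never touches the filesystem path itself; it only receives content, which matches the shift from file-based to in-memory processing described in the summary.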

Poem

A bunny hops with files anew,
No more paths to chase or view.
Serializers lend a paw,
To parse the bytes without a flaw.
Progress ticks a bit more quick,
And headers show if that’s your pick!
🐇✨


Comment @coderabbitai help to get the list of available commands and usage tips.

coderabbitai[bot], May 25 '25 17:05