[Integration][Harbor] initial integration

Open demigod-11 opened this issue 1 month ago • 2 comments

User description

Description

What - New Harbor integration for Port Ocean that syncs Harbor container registry resources (projects, users, repositories, and artifacts) into Port's catalog, enabling platform and security teams to visualize container images, projects, users, and relationships across their software supply chain.

Why - Harbor is a popular open-source container registry solution, and teams need visibility into their container images, vulnerabilities, and registry metadata within their Port catalog. This integration provides real-time synchronization of Harbor resources with webhook support for instant updates.

How - Implemented a complete Harbor integration using Ocean's async HTTP client with:

  • Resource Exporters: Async exporters for projects, users, repositories, and artifacts with pagination support
  • Authentication: Support for both robot account authentication (with token expiration handling) and basic authentication
  • Webhook Support: Real-time event processing for PUSH_ARTIFACT, DELETE_ARTIFACT, PULL_ARTIFACT, SCANNING_COMPLETED, and SCANNING_FAILED events
  • Webhook Client: Automated webhook management with upsert method for creating/updating webhook policies per Harbor project
  • Filtering: Comprehensive selector support for filtering resources by project name, visibility, tags, labels, severity thresholds, and more
  • Error Handling: Graceful handling of token expiration, API errors, and rate limiting
  • Memory Optimization: Streaming pattern for high-volume artifacts to prevent out-of-memory errors

Type of change

Please leave one option from the following and delete the rest:

  • [X] New Integration (non-breaking change which adds a new integration)

All tests should be run against the Port production environment (using a testing org).

Core testing checklist

  • [X] Integration able to create all default resources from scratch
  • [X] Resync finishes successfully
  • [X] Resync able to create entities
  • [X] Resync able to update entities
  • [X] Resync able to detect and delete entities
  • [ ] Scheduled resync able to abort existing resync and start a new one
  • [ ] Tested with at least 2 integrations from scratch
  • [ ] Tested with Kafka and Polling event listeners
  • [X] Tested deletion of entities that don't pass the selector

Integration testing checklist

  • [X] Integration able to create all default resources from scratch
  • [X] Completed a full resync from a freshly installed integration and it completed successfully
  • [X] Resync able to create entities
  • [X] Resync able to update entities
  • [X] Resync able to detect and delete entities
  • [X] Resync finishes successfully
  • [ ] If new resource kind is added or updated in the integration, add example raw data, mapping and expected result to the examples folder in the integration directory.
  • [ ] If resource kind is updated, run the integration with the example data and check if the expected result is achieved
  • [X] If new resource kind is added or updated, validate that live-events for that resource are working as expected
  • [ ] Docs PR link here

Preflight checklist

  • [ ] Handled rate limiting
  • [X] Handled pagination
  • [X] Implemented the code in async
  • [ ] Support Multi account

Screenshots

[Seven screenshots, captured 2025-10-16 between 10:53:49 and 10:55:52]

API Documentation

  • Main Reference: https://goharbor.io/
  • API v2.0 Endpoints: http://localhost:8081/api/v2.0/ (for local Harbor instance)

PR Type

Enhancement


Description

  • Complete Harbor container registry integration for Port Ocean with support for projects, users, repositories, and artifacts

  • Async HTTP client with pagination, authentication (robot account and basic auth), and comprehensive error handling

  • Real-time webhook support for artifact and repository events (PUSH, DELETE, SCANNING events), authenticated by a constant-time comparison (hmac.compare_digest) of the Authorization header against a shared webhook secret

  • Resource exporters with filtering capabilities for projects, users, repositories, and artifacts with project/repository enrichment

  • Webhook policy management client for automated webhook creation and updates per Harbor project

  • Comprehensive test suite covering client requests, pagination, authentication, and webhook event processing

  • Complete documentation, configuration schemas, and Port blueprints for Harbor resource visualization

  • Environment templates, contribution guidelines, and debug entry point for local development


Diagram Walkthrough

flowchart LR
  HarborAPI["Harbor API v2.0"]
  Auth["Authentication<br/>Robot/Basic"]
  Client["Harbor HTTP Client<br/>Pagination & Error Handling"]
  Exporters["Resource Exporters<br/>Projects/Users/Repos/Artifacts"]
  Webhooks["Webhook Management<br/>Event Processing"]
  PortOcean["Port Ocean<br/>Framework"]
  PortCatalog["Port Catalog<br/>Blueprints & Entities"]
  
  HarborAPI -- "API Requests" --> Client
  Auth -- "Credentials" --> Client
  Client -- "Fetch Resources" --> Exporters
  Client -- "Manage Webhooks" --> Webhooks
  Exporters -- "Resync Data" --> PortOcean
  Webhooks -- "Real-time Events" --> PortOcean
  PortOcean -- "Create/Update Entities" --> PortCatalog

File Walkthrough

Relevant files
Tests
7 files
test_harbor_client.py
Harbor client HTTP request and pagination tests                   

integrations/harbor/tests/harbor/clients/test_harbor_client.py

  • Comprehensive test suite for HarborClient covering client creation,
    initialization, and request handling
  • Tests for successful API requests, HTTP error handling, and ignored
    error scenarios
  • Pagination tests validating single/multiple page handling and 404
    error behavior
  • Tests for request parameters, JSON data payloads, and query string
    construction
+300/-0 
test_artifact_webhook_processor.py
Artifact webhook processor event handling tests                   

integrations/harbor/tests/harbor/webhook/webhook_processors/test_artifact_webhook_processor.py

  • Tests for artifact webhook processor validating payload structure and
    event types
  • Tests for handling PUSH_ARTIFACT and DELETE_ARTIFACT webhook events
  • Verification of artifact data fetching and deletion result handling
+124/-0 
test_repository_webhook_processor.py
Repository webhook processor event handling tests               

integrations/harbor/tests/harbor/webhook/webhook_processors/test_repository_webhook_processor.py

  • Tests for repository webhook processor validating payload structure
    and event types
  • Tests for handling PUSH_ARTIFACT and DELETE_ARTIFACT webhook events
    for repositories
  • Verification of repository data fetching and deletion result handling
+124/-0 
test_harbor_abstract_webhook_processor.py
Harbor abstract webhook processor authentication tests     

integrations/harbor/tests/harbor/webhook/test_harbor_abstract_webhook_processor.py

  • Tests for abstract webhook processor authentication using the
    Authorization header and a constant-time hmac.compare_digest check
  • Tests for valid/invalid secret validation and missing header scenarios
  • Verification of should_process_event method always returning True
+104/-0 
conftest.py
Harbor integration test fixtures and configuration             

integrations/harbor/tests/conftest.py

  • Pytest fixtures for Harbor client testing with mock Ocean context
    initialization
  • Test configuration with Harbor host, credentials, and robot account
    settings
  • Fixtures for basic and robot authenticators, HTTP responses, and event
    contexts
  • Mock Ocean app configuration for integration testing
+96/-0   
test_basic_authenticator.py
Harbor basic and robot authenticator tests                             

integrations/harbor/tests/harbor/clients/auth/test_basic_authenticator.py

  • Tests for basic authentication token generation as base64-encoded
    username:password
  • Tests for header generation with Basic auth scheme and Accept header
  • Token and header caching verification
  • HTTP client property validation
+98/-0   
test_registry.py
Harbor webhook registry registration tests                             

integrations/harbor/tests/harbor/webhook/test_registry.py

  • Tests for Harbor webhook registry registration
  • Verification that both artifact and repository processors are
    registered
+17/-0   
Enhancement
20 files
utils.py
Harbor utility functions for query building and enrichment

integrations/harbor/harbor/helpers/utils.py

  • Utility functions for Harbor query string building with support for
    exact match, fuzzy match, range, and list patterns
  • Helper functions to build query parameters for projects, users,
    repositories, and artifacts with filtering options
  • Enrichment functions to add project and repository context to response
    objects
  • ObjectKind enum and IgnoredError named tuple for resource type and
    error handling
+212/-0 
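
The four query patterns listed for utils.py can be illustrated with a minimal sketch. It assumes the `q` syntax described in Harbor's API reference (`k=v`, `k=~v`, `k=[min~max]`, `k={v1 v2}`); the helper names are hypothetical and are not the functions added in this PR.

```python
from typing import Iterable, Union

Number = Union[int, float]


def exact(key: str, value: str) -> str:
    """Exact match clause: key=value."""
    return f"{key}={value}"


def fuzzy(key: str, value: str) -> str:
    """Fuzzy match clause: key=~value."""
    return f"{key}=~{value}"


def value_range(key: str, low: Number, high: Number) -> str:
    """Range clause: key=[low~high]."""
    return f"{key}=[{low}~{high}]"


def any_of(key: str, values: Iterable[str]) -> str:
    """List clause with union semantics: key={v1 v2 v3}."""
    return f"{key}={{{' '.join(values)}}}"


def build_q(*clauses: str) -> str:
    """Join clauses with commas to form the value of the `q` parameter."""
    return ",".join(clauses)
```

The resulting string would be passed as the `q` query parameter and URL-encoded by the HTTP client.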
main.py
Harbor integration main resync and webhook handlers           

integrations/harbor/main.py

  • Resync handlers for all Harbor resource kinds (projects, users,
    repositories, artifacts) using exporters
  • Artifact resync implementation that fetches repositories first, then
    streams artifacts from each repository
  • Webhook initialization on startup that creates/updates Harbor webhook
    policies for all projects
  • Integration with Ocean framework for resource synchronization and
    event handling
+168/-0 
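
The repository-first streaming pattern described for main.py could look roughly like the sketch below. The in-memory `_FAKE_REGISTRY` and both `iter_*` helpers are hypothetical stand-ins for the exporters, used only so the example runs without a Harbor instance.

```python
import asyncio
from typing import Any, AsyncIterator, Dict, List

Batch = List[Dict[str, Any]]

# Hypothetical stand-in for Harbor: repository name -> artifact digests.
_FAKE_REGISTRY: Dict[str, List[str]] = {
    "library/nginx": ["sha256:a", "sha256:b", "sha256:c"],
    "library/redis": ["sha256:d"],
}


async def iter_repositories(page_size: int = 1) -> AsyncIterator[List[str]]:
    """Yield repository names one page at a time."""
    names = sorted(_FAKE_REGISTRY)
    for start in range(0, len(names), page_size):
        yield names[start:start + page_size]


async def iter_artifacts(repository: str, page_size: int = 2) -> AsyncIterator[Batch]:
    """Yield the artifacts of one repository one page at a time."""
    digests = _FAKE_REGISTRY[repository]
    for start in range(0, len(digests), page_size):
        page = digests[start:start + page_size]
        yield [{"digest": d, "repository": repository} for d in page]


async def stream_all_artifacts() -> AsyncIterator[Batch]:
    """Yield each artifact page as soon as it arrives, never the full set."""
    async for repositories in iter_repositories():
        for repository in repositories:
            async for batch in iter_artifacts(repository):
                yield batch


async def collect_batch_sizes() -> List[int]:
    return [len(batch) async for batch in stream_all_artifacts()]
```

Because every layer is an async generator, at most one page of artifacts is held in memory at a time, which is the property the "Memory Optimization" bullet refers to.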
harbor_client.py
Harbor async HTTP client with pagination and error handling

integrations/harbor/harbor/clients/http/harbor_client.py

  • Harbor API v2.0 client with async HTTP request handling and
    authentication support
  • Pagination support with Link header parsing for multi-page API
    responses
  • Error handling with ignored error patterns and token expiration
    detection
  • CSRF cookie clearing for write operations (POST, PUT, PATCH) to
    prevent validation errors
+153/-0 
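
Link-header pagination of the kind described for harbor_client.py can be sketched as follows. This is an illustration of the general `rel="next"` convention, not the parser in this PR; the function name is hypothetical.

```python
import re
from typing import Optional

# Matches one `<url>; rel="name"` element of a Link header.
_LINK_PART = re.compile(r'<([^>]+)>\s*;\s*rel="?([^";,]+)"?')


def next_page_url(link_header: Optional[str]) -> Optional[str]:
    """Return the URL tagged rel="next", or None when there is no next page."""
    if not link_header:
        return None
    for url, rel in _LINK_PART.findall(link_header):
        if rel.strip().lower() == "next":
            return url
    return None
```

A paginating client would keep requesting `next_page_url(response.headers.get("Link"))` until it returns None.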
webhook_client.py
Harbor webhook policy management client                                   

integrations/harbor/harbor/webhook/webhook_client.py

  • Client for managing Harbor webhook policies with upsert functionality
  • Webhook payload building with event types and authentication headers
  • Methods to create, update, and retrieve existing webhook policies per
    project
  • Support for webhook secret configuration and HTTPS certificate
    verification options
+135/-0 
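
The upsert decision in webhook_client.py can be summarized with a small sketch. It assumes a Harbor webhook policy stores its secret under `targets[0].auth_header`, and it mirrors the XOR check quoted in the compliance review; `decide_webhook_action` and the `Action` enum are hypothetical names.

```python
from enum import Enum
from typing import Any, Dict, Optional


class Action(Enum):
    CREATE = "create"
    UPDATE = "update"
    NOOP = "noop"


def decide_webhook_action(
    existing_policy: Optional[Dict[str, Any]],
    configured_secret: Optional[str],
) -> Action:
    """Choose between creating, updating, or leaving a project's webhook policy."""
    if existing_policy is None:
        return Action.CREATE
    # Assumption: the secret lives in the first target's auth_header field.
    targets = existing_policy.get("targets") or [{}]
    existing_secret = targets[0].get("auth_header")
    # Update only when exactly one side has a secret (presence mismatch).
    if bool(configured_secret) ^ bool(existing_secret):
        return Action.UPDATE
    return Action.NOOP
```

Note that an XOR on presence does not detect a secret whose value changed while remaining set; a rotation would need an explicit comparison or an unconditional update.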
artifact_webhook_processor.py
Artifact webhook event processor implementation                   

integrations/harbor/harbor/webhook/webhook_processors/artifact_webhook_processor.py

  • Webhook processor for Harbor artifact events (PUSH, PULL, DELETE,
    SCANNING events)
  • Handles artifact deletion by returning artifact metadata with
    project/repository context
  • Fetches latest artifact data for upsert events using the artifact
    exporter
  • Validates event payload structure and event type
+103/-0 
repository_webhook_processor.py
Repository webhook event processor implementation               

integrations/harbor/harbor/webhook/webhook_processors/repository_webhook_processor.py

  • Webhook processor for Harbor repository events (PUSH, PULL, DELETE
    artifact events)
  • Handles repository deletion by returning repository metadata with
    project context
  • Fetches latest repository data for upsert events using the repository
    exporter
  • Validates event payload structure and event type
+96/-0   
repository_exporter.py
Harbor repository exporter with project enrichment             

integrations/harbor/harbor/core/exporters/repository_exporter.py

  • Exporter for Harbor repositories with pagination and filtering support
  • Fetches single repository by project and repository name
  • Enriches repositories with project name mapping from project IDs
  • Caches paginated results for performance optimization
+64/-0   
artifact_exporter.py
Harbor artifact exporter with context enrichment                 

integrations/harbor/harbor/core/exporters/artifact_exporter.py

  • Exporter for Harbor artifacts with pagination and filtering support
  • Fetches single artifact by project, repository, and reference
    (tag/digest)
  • Enriches artifacts with project and repository context information
  • Handles repository name parsing for correct API endpoint construction
+61/-0   
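
The repository name parsing mentioned for artifact_exporter.py might work along these lines. The sketch assumes Harbor's convention that a repository name containing slashes is URL-encoded twice in the path (`a/b` becomes `a%252Fb`); verify this against the Harbor version in use. The function names are hypothetical.

```python
from typing import Tuple
from urllib.parse import quote


def split_repository(full_name: str) -> Tuple[str, str]:
    """Split 'project/repo/sub' into ('project', 'repo/sub')."""
    project, _, repository = full_name.partition("/")
    if not project or not repository:
        raise ValueError(f"Expected '<project>/<repository>', got {full_name!r}")
    return project, repository


def artifacts_path(full_name: str) -> str:
    """Build the artifacts endpoint path for a full repository name."""
    project, repository = split_repository(full_name)
    # Assumption: slashes inside the repository name are encoded twice.
    encoded_repository = quote(quote(repository, safe=""), safe="")
    return f"/projects/{quote(project, safe='')}/repositories/{encoded_repository}/artifacts"
```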
options.py
Harbor resource filtering and option type definitions       

integrations/harbor/harbor/core/options.py

  • TypedDict definitions for single and list resource options (projects,
    users, repositories, artifacts)
  • Filtering options including query strings, sorting, and
    artifact-specific parameters
  • Support for artifact metadata options (tags, labels, scan overview,
    SBOM, signatures)
+67/-0   
initialize_client.py
Harbor client factory and initialization                                 

integrations/harbor/initialize_client.py

  • Factory pattern implementation for Harbor client singleton creation
  • Client initialization from Ocean integration configuration
  • Support for both robot account and basic authentication methods
  • Error handling for missing configuration
+52/-0   
harbor_abstract_webhook_processor.py
Harbor abstract webhook processor with authentication       

integrations/harbor/harbor/webhook/harbor_abstract_webhook_processor.py

  • Abstract base class for Harbor webhook processors with authentication
    support
  • Shared-secret authentication: constant-time comparison
    (hmac.compare_digest) of the Authorization header and webhook secret
  • Graceful handling of missing secrets with warning logs
  • should_process_event method always returning True
+47/-0   
abstract_authenticator.py
Harbor abstract authenticator with retry configuration     

integrations/harbor/harbor/clients/auth/abstract_authenticator.py

  • Abstract base class for Harbor authentication methods with token and
    header generation
  • HarborToken and HarborHeaders Pydantic models for type safety
  • HTTP client property with retry configuration and timeout settings
  • Support for Retry-After and X-RateLimit-Reset headers
+52/-0   
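
How a retry delay could be derived from the two headers named above is sketched below. It assumes `Retry-After` carries a number of seconds and `X-RateLimit-Reset` carries a Unix timestamp, which varies between servers; the function is hypothetical and is not the retry configuration used by Ocean's HTTP client.

```python
import time
from typing import Mapping, Optional


def retry_delay_seconds(
    headers: Mapping[str, str],
    now: Optional[float] = None,
    default: float = 1.0,
) -> float:
    """Pick a wait time from rate-limit headers, falling back to `default`."""
    current = time.time() if now is None else now
    lowered = {key.lower(): value for key, value in headers.items()}

    retry_after = lowered.get("retry-after")
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # The HTTP-date form of Retry-After is not handled here.

    reset_at = lowered.get("x-ratelimit-reset")
    if reset_at is not None:
        try:
            # Assumption: the value is an absolute Unix timestamp in seconds.
            return max(0.0, float(reset_at) - current)
        except ValueError:
            pass

    return default
```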
auth_factory.py
Harbor authenticator factory with credential validation   

integrations/harbor/harbor/clients/auth/auth_factory.py

  • Factory for creating Harbor authenticators based on configuration
    priority
  • Prefers robot account authentication over basic authentication
  • Validation of required configuration and credentials
  • Detailed error messages for missing credentials
+40/-0   
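
The credential priority described for auth_factory.py (robot account first, then basic auth) can be sketched as follows. The configuration keys `robot_name`, `robot_token`, `username`, and `password` are assumptions for illustration, as are the class and function names.

```python
import base64
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Credentials:
    kind: str  # "robot" or "basic"
    username: str
    secret: str

    def authorization_header(self) -> str:
        """Both kinds use HTTP Basic: base64 of 'username:secret'."""
        raw = f"{self.username}:{self.secret}".encode("utf-8")
        return "Basic " + base64.b64encode(raw).decode("ascii")


def pick_credentials(config: Mapping[str, Optional[str]]) -> Credentials:
    """Prefer a complete robot account, fall back to username/password."""
    robot_name, robot_token = config.get("robot_name"), config.get("robot_token")
    if robot_name and robot_token:
        return Credentials("robot", robot_name, robot_token)

    username, password = config.get("username"), config.get("password")
    if username and password:
        return Credentials("basic", username, password)

    raise ValueError("Provide robot_name/robot_token or username/password")
```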
project_exporter.py
Harbor project exporter with pagination                                   

integrations/harbor/harbor/core/exporters/project_exporter.py

  • Exporter for Harbor projects with pagination and filtering support
  • Fetches single project by name
  • Caches paginated results for performance optimization
+37/-0   
user_exporter.py
Harbor user exporter with filtering                                           

integrations/harbor/harbor/core/exporters/user_exporter.py

  • Exporter for Harbor users with pagination and filtering support
  • Fetches single user by ID
  • Supports query and sort filtering options
+35/-0   
basic_authenticator.py
Harbor basic authentication implementation                             

integrations/harbor/harbor/clients/auth/basic_authenticator.py

  • Basic authentication implementation using base64-encoded
    username:password
  • Token caching for performance optimization
  • Header generation with Basic auth scheme
+37/-0   
abstract_exporter.py
Harbor abstract exporter base class                                           

integrations/harbor/harbor/core/exporters/abstract_exporter.py

  • Abstract base class for Harbor resource exporters with generic typing
  • Defines interface for single resource and paginated resource fetching
+20/-0   
utils.py
Harbor client configuration utility                                           

integrations/harbor/harbor/clients/utils.py

  • Utility function to build integration configuration dictionary from
    authenticator and Ocean config
+14/-0   
registry.py
Harbor webhook processor registration                                       

integrations/harbor/harbor/webhook/registry.py

  • Registry function to register all Harbor webhook processors with Ocean
    framework
  • Registers artifact and repository webhook processors at specified path
+15/-0   
robot_authenticator.py
Harbor robot account authenticator                                             

integrations/harbor/harbor/clients/auth/robot_authenticator.py

  • Robot account authentication extending basic authenticator
  • Uses robot name and token instead of username and password
+8/-0     
Configuration changes
11 files
integration.py
Harbor integration configuration and resource selectors   

integrations/harbor/integration.py

  • Configuration classes for Harbor resource kinds (projects, users,
    repositories, artifacts) with selectors
  • Selector classes defining filtering options for each resource type
    (query strings, sorting, artifact-specific options)
  • Main HarborPortAppConfig and HarborIntegration classes for Ocean
    framework integration
  • JQ entity processor configuration for data transformation
+136/-0 
events.py
Harbor webhook event type constants and definitions           

integrations/harbor/harbor/webhook/events.py

  • Event type constants for artifact, repository, and project webhook
    events
  • Separation of upsert and delete events for each resource kind
  • Comprehensive list of all Harbor webhook events and events for webhook
    creation
+61/-0   
launch.json
VSCode debug configuration updates and formatting               

.vscode/launch.json

  • Reformatted JSON indentation for consistency
  • Removed jira integration debug configuration
  • Added azure-resource-graph and http-server debug configurations
+339/-339
spec.yaml
Harbor integration specification and configuration schema

integrations/harbor/.port/spec.yaml

  • Harbor integration specification with exporter features for projects,
    repositories, users, and artifacts
  • Configuration schema for Harbor host URL, authentication credentials
    (username/password or robot account)
  • Webhook secret configuration for incoming webhook authentication
+36/-0   
blueprints.json
Harbor resource blueprints for Port catalog integration   

integrations/harbor/.port/resources/blueprints.json

  • Defines four Port blueprints for Harbor resources: harborProjects,
    harborUsers, harborRepositories, and harborArtifacts
  • Each blueprint includes schema properties with appropriate types
    (string, number, boolean, date-time, array)
  • Establishes relations between blueprints: repositories link to
    projects, artifacts link to repositories
  • Supports filtering and visualization of Harbor container registry
    metadata in Port's catalog
+179/-0 
pyproject.toml
Harbor integration project configuration and dependencies

integrations/harbor/pyproject.toml

  • Defines project metadata for Harbor integration with version
    0.1.0-beta and Python ^3.12 requirement
  • Specifies port_ocean dependency with CLI extras for core integration
    functionality
  • Includes development dependencies for testing (pytest, pytest-asyncio,
    pytest-httpx), code quality (black, mypy, pylint, ruff), and changelog
    management (towncrier)
  • Configures tool settings for mypy strict type checking, ruff linting,
    black formatting, and pytest async testing
+113/-0 
port-app-config.yml
Port application configuration for Harbor resource mapping

integrations/harbor/.port/resources/port-app-config.yml

  • Defines resource mappings for four Harbor entity kinds: projects,
    users, repositories, and artifacts
  • Maps Harbor API responses to Port blueprint properties using jq
    expressions for data transformation
  • Establishes entity relations: repositories link to projects, artifacts
    link to repositories
  • Configures identifier generation, title mapping, and property
    extraction for each resource type
+75/-0   
.env.example
Harbor integration environment configuration template       

integrations/harbor/.env.example

  • Provides environment variable template for Harbor integration
    configuration
  • Includes Port API credentials (CLIENT_ID, CLIENT_SECRET), Harbor
    connection details (HARBOR_HOST, USERNAME, PASSWORD)
  • Specifies webhook configuration (WEBHOOK_SECRET) and event listener
    type (POLLING)
  • Documents required configuration keys for integration initialization
    and resource synchronization
+11/-0   
Makefile
Harbor integration Makefile symlink                                           

integrations/harbor/Makefile

  • Creates symbolic link to shared Makefile from port-ocean
    infrastructure directory
  • Enables consistent build and development tooling across integrations
+1/-0     
sonar-project.properties
SonarQube configuration for Harbor integration                     

integrations/harbor/sonar-project.properties

  • Configures SonarQube project settings for code quality analysis
  • Sets project key to port-labs_ocean_harbor and organization to
    port-labs
+2/-0     
poetry.toml
Poetry virtual environment configuration                                 

integrations/harbor/poetry.toml

  • Configures Poetry virtual environment settings for Harbor integration
  • Enables automatic virtual environment creation within the project
    directory
+3/-0     
Miscellaneous
10 files
__init__.py
Harbor exporters package initialization                                   

integrations/harbor/harbor/core/exporters/__init__.py

  • Package initialization exporting all Harbor exporter classes
+13/-0   
__init__.py
Harbor authentication package initialization                         

integrations/harbor/harbor/clients/auth/__init__.py

  • Package initialization exporting authenticator classes and models
+15/-0   
debug.py
Harbor integration debug entry point                                         

integrations/harbor/debug.py

  • Debug entry point for running Harbor integration locally
+4/-0     
__init__.py
Harbor webhook processors package initialization                 

integrations/harbor/harbor/webhook/webhook_processors/__init__.py

  • Package initialization for webhook processors module
+1/-0     
__init__.py
Harbor webhook package initialization                                       

integrations/harbor/harbor/webhook/__init__.py

  • Package initialization for webhook module
+1/-0     
__init__.py
Webhook processor tests package initialization                     

integrations/harbor/tests/harbor/webhook/webhook_processors/__init__.py

  • Package initialization for webhook processor tests
+1/-0     
__init__.py
Harbor authentication tests package initialization             

integrations/harbor/tests/harbor/clients/auth/__init__.py

  • Package initialization for authentication tests
+1/-0     
__init__.py
Harbor client tests package initialization                             

integrations/harbor/tests/harbor/clients/__init__.py

  • Package initialization for client tests
+1/-0     
__init__.py
Harbor tests package initialization                                           

integrations/harbor/tests/harbor/__init__.py

  • Package initialization for Harbor tests
+1/-0     
__init__.py
Tests module package initialization                                           

integrations/harbor/tests/__init__.py

  • Package initialization for tests module
+1/-0     
Error handling
1 file
exceptions.py
Harbor integration custom exceptions                                         

integrations/harbor/harbor/helpers/exceptions.py

  • Custom exception classes for Harbor integration error handling
  • AuthenticationException, MissingConfiguration, MissingCredentials,
    InvalidTokenException
+17/-0   
Documentation
3 files
README.md
Harbor integration documentation and setup guide                 

integrations/harbor/README.md

  • Comprehensive documentation for Harbor integration including features,
    supported resources, and quick start guide
  • Covers authentication setup for both robot accounts and local users
    with token expiration handling
  • Provides configuration examples, webhook setup instructions for
    development and production environments
  • Documents resource filtering capabilities and supported webhook events
    (PUSH_ARTIFACT, DELETE_ARTIFACT, SCANNING_COMPLETED, SCANNING_FAILED)
+136/-0 
CHANGELOG.md
Initial Harbor integration changelog and release notes     

integrations/harbor/CHANGELOG.md

  • Documents initial release 0.1.0 dated 2024-10-16 with complete Harbor
    integration implementation
  • Lists added features including robot account authentication, webhook
    support, filtering, and async HTTP client
  • Details technical implementation aspects: singleton pattern,
    async/await patterns, type safety, and comprehensive test coverage
  • Highlights production-ready error handling, logging, and webhook
    authentication with signature validation
+43/-0   
CONTRIBUTING.md
Harbor integration contribution guidelines                             

integrations/harbor/CONTRIBUTING.md

  • Provides basic contribution guidelines for Harbor integration
    development
  • Includes placeholder section for local setup instructions with note
    about Harbor-specific gotchas
  • References rate limiting, credential setup, and other local
    development considerations
+7/-0     
Additional files
5 files
__init__.py [link]   
__init__.py [link]   
__init__.py [link]   
__init__.py [link]   
__init__.py [link]   

demigod-11 • Nov 04 '25 09:11

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
Weak webhook auth

Description: Webhook authentication relies on comparing a static secret from the 'Authorization' header
without signature or timestamp, making it susceptible to replay if the endpoint is
reachable and HTTPS termination is misconfigured.
harbor_abstract_webhook_processor.py [20-44]

Referred Code
secret = ocean.integration_config.get("webhook_secret")

if not secret:
    logger.warning(
        "Skipping webhook signature verification because no secret is configured."
    )
    return True

received_token = headers.get("authorization") or headers.get("Authorization")
if not received_token:
    logger.error(
        "Missing 'Authorization' header. Harbor webhook authentication failed."
    )
    return False

logger.debug(
    "Validating Harbor webhook token...",
    extra={
        "received_token_prefix": received_token[:10] + "...",
        "expected_token_prefix": secret[:10] + "...",
    },


 ... (clipped 4 lines)
Transport assurance

Description: Clearing the session cookie before write requests may mask CSRF protections but does not
enforce TLS or host verification; ensure the async client enforces HTTPS and certificate
validation in deployment to avoid MITM exposure.
harbor_client.py [76-89]

Referred Code
# Harbor's API issues a session ID cookie (`sid`) during GET requests,
# which is meant for UI sessions (the Harbor web interface) and enforces CSRF checks
# on subsequent modifying requests (POST, PUT, PATCH).
#
# When we authenticate using Basic Auth programmatically (like in our Ocean client),
# these cookies are unnecessary and can cause CSRF validation errors:
#     {"code": "FORBIDDEN", "message": "CSRF token not found in request"}
#
# To avoid this, we clear cookies before any write operation so the request behaves
# like a stateless API call.
if method in ["POST", "PUT", "PATCH"]:
    self.client.cookies.clear()

try:
Ticket Compliance
🎫 No ticket provided
  • [ ] Create ticket/issue
Codebase Duplication Compliance
Codebase context is not defined

Follow the guide to enable codebase context checks.

Custom Compliance
🟢
Generic: Meaningful Naming and Self-Documenting Code

Objective: Ensure all identifiers clearly express their purpose and intent, making code
self-documenting

Status: Passed

Generic: Comprehensive Audit Trails

Objective: To create a detailed and reliable record of critical system actions for security analysis
and compliance.

Status:
Insufficient Logging: New code performs critical actions (API requests, webhook upserts) with mostly info/debug
logs and lacks structured audit logs including user/context for actions like webhook
policy changes.

Referred Code
async def make_request(
    self,
    resource: str,
    params: Optional[Dict[str, Any]] = None,
    method: str = "GET",
    json_data: Optional[Dict[str, Any]] = None,
    ignored_errors: Optional[List[Any]] = None,
) -> httpx.Response:
    """Make a request to the Harbor API with authentication and error handling."""
    url = urljoin(self.api_url + "/", resource.lstrip("/"))

    headers = await self._authenticator.get_headers()
    headers_dict = headers.as_dict()

    logger.debug(f"Harbor API {method} {url} with params: {params}")

    # Harbor's API issues a session ID cookie (`sid`) during GET requests,
    # which is meant for UI sessions (the Harbor web interface) and enforces CSRF checks
    # on subsequent modifying requests (POST, PUT, PATCH).
    #
    # When we authenticate using Basic Auth programmatically (like in our Ocean client),


 ... (clipped 33 lines)
Generic: Robust Error Handling and Edge Case Management

Objective: Ensure comprehensive error handling that provides meaningful context and graceful
degradation

Status:
Broad Exception: The upsert_webhook method catches generic Exception and only logs a message without
re-raising or returning status, which may hide failures and hinder retry/handling.

Referred Code

    if bool(self.webhook_secret) ^ bool(existing_webhook_secret):
        await self._update_webhook_policy(
            project_name, existing_webhook_id, webhook_url, webhook_events
        )
        return

    logger.info("Webhook already exists with appropriate configuration")

except HTTPStatusError as http_err:
    logger.error(
        f"HTTP error occurred while creating webhook for project {project_name}: {http_err}"
    )

except Exception as e:
    logger.error(f"Failed to upsert webhook for project {project_name}: {e}")
Generic: Secure Error Handling

Objective: To prevent the leakage of sensitive system information through error messages while
providing sufficient detail for internal debugging.

Status:
Sensitive Details: Error logs include full endpoint URLs and raw response texts which could leak internal
details in logs if exposed to users.

Referred Code
    logger.error(
        f"Harbor API error for endpoint '{url}': Status {e.response.status_code}, "
        f"Method: {method}, Response: {e.response.text}"
    )
    raise

except httpx.HTTPError as e:
    logger.error(f"HTTP error for endpoint '{url}': {str(e)}")
    raise
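
One way to address the concern about full URLs and raw response bodies in error logs is to sanitize both before logging. The sketch below is an assumption about how that could be done; `redact_url` and `summarize_body` are hypothetical helpers that do not exist in this PR.

```python
from urllib.parse import urlsplit, urlunsplit


def redact_url(url: str) -> str:
    """Drop userinfo, query string, and fragment, keeping scheme, host, and path."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if parts.port:
        host = f"{host}:{parts.port}"
    return urlunsplit((parts.scheme, host, parts.path, "", ""))


def summarize_body(text: str, limit: int = 200) -> str:
    """Collapse whitespace and cap the length of a response body for logging."""
    collapsed = " ".join(text.split())
    if len(collapsed) <= limit:
        return collapsed
    return collapsed[:limit] + "...[truncated]"
```

The error log could then use `redact_url(url)` and `summarize_body(e.response.text)` in place of the raw values.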
Generic: Secure Logging Practices

Objective: To ensure logs are useful for debugging and auditing without exposing sensitive
information like PII, PHI, or cardholder data.

Status:
Secret Exposure Risk: The authenticate method logs prefixes of the received and expected webhook tokens which
may inadvertently expose parts of secrets in logs.

Referred Code

logger.debug(
    "Validating Harbor webhook token...",
    extra={
        "received_token_prefix": received_token[:10] + "...",
        "expected_token_prefix": secret[:10] + "...",
    },
)

return hmac.compare_digest(received_token, secret)
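
A variant that keeps the constant-time comparison but logs only the outcome, never any part of either token, could look like this. It is a sketch under the assumption that the header lookup should be case-insensitive; `is_authorized` is a hypothetical name, not the method in this PR.

```python
import hmac
import logging
from typing import Mapping

logger = logging.getLogger("harbor.webhook")


def is_authorized(headers: Mapping[str, str], secret: str) -> bool:
    """Compare the Authorization header to the secret without logging either."""
    received = next(
        (value for key, value in headers.items() if key.lower() == "authorization"),
        None,
    )
    if received is None:
        logger.error("Missing 'Authorization' header. Harbor webhook authentication failed.")
        return False

    # Comparing bytes avoids the TypeError that compare_digest raises
    # for non-ASCII str arguments.
    authorized = hmac.compare_digest(received.encode("utf-8"), secret.encode("utf-8"))
    logger.debug("Harbor webhook token validated", extra={"authorized": authorized})
    return authorized
```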
Generic: Security-First Input Validation and Data Handling

Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent
vulnerabilities

Status:
Payload Validation: Webhook processors validate only the event type and then access nested fields without
schema validation which could lead to KeyErrors or unsafe assumptions on external input.

Referred Code
    """Validate the payload structure and content."""
    event_type = payload.get("type")
    if not event_type:
        return False

    valid_events = ARTIFACT_UPSERT_EVENTS + ARTIFACT_DELETE_EVENTS
    return event_type in valid_events

async def get_matching_kinds(self, event: WebhookEvent) -> list[str]:
    return [ObjectKind.ARTIFACTS]

async def handle_event(
    self, payload: EventPayload, resource_config: ResourceConfig
) -> WebhookEventRawResults:
    event_type = payload["type"]
    event_data = payload.get("event_data", {})
    resources = event_data.get("resources", [])
    repository = event_data.get("repository", {})

    if not resources or not repository:
        logger.warning(f"No resources or repository data in {event_type} event")


 ... (clipped 25 lines)
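The payload check could be extended to cover the nested fields that `handle_event` reads. The following is a sketch under stated assumptions: the event lists are hypothetical stand-ins for `ARTIFACT_UPSERT_EVENTS` and `ARTIFACT_DELETE_EVENTS`, `is_valid_artifact_payload` is an illustrative name, and the required shape (a non-empty `resources` list of objects plus a non-empty `repository` object) is inferred from the referred code rather than from Harbor's documentation.

```python
from typing import Any

# Hypothetical stand-ins for ARTIFACT_UPSERT_EVENTS / ARTIFACT_DELETE_EVENTS.
UPSERT_EVENTS = ["PUSH_ARTIFACT", "SCANNING_COMPLETED"]
DELETE_EVENTS = ["DELETE_ARTIFACT"]


def is_valid_artifact_payload(payload: Any) -> bool:
    """Check the event type and the nested fields before any handler touches them."""
    if not isinstance(payload, dict):
        return False
    if payload.get("type") not in UPSERT_EVENTS + DELETE_EVENTS:
        return False

    event_data = payload.get("event_data")
    if not isinstance(event_data, dict):
        return False

    # resources must be a non-empty list of objects.
    resources = event_data.get("resources")
    if not isinstance(resources, list) or not resources:
        return False
    if not all(isinstance(resource, dict) for resource in resources):
        return False

    # repository must be a non-empty object.
    repository = event_data.get("repository")
    return isinstance(repository, dict) and bool(repository)
```

Rejecting malformed payloads at validation time means the handler can index into `event_data` without defensive checks on every access.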
Compliance status legend
🟢 - Fully Compliant
🟡 - Partially Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label

qodo-code-review[bot] • Nov 04 '25 09:11

PR Code Suggestions ✨

Explore these optional code suggestions:

Category | Suggestion | Impact
High-level
Refactor artifact resync to be project-based

Refactor the artifact resync process to be more scalable. Instead of fetching
all repositories at once, iterate through projects, then fetch repositories and
artifacts for each project individually.

Examples:

integrations/harbor/main.py [88-128]
@ocean.on_resync(ObjectKind.ARTIFACTS)
async def resync_artifacts(kind: str) -> ASYNC_GENERATOR_RESYNC_TYPE:
    """Resync all Harbor artifacts by fetching all repositories first."""
    logger.info(f"Starting resync for kind: {kind}")

    client = init_client()
    repository_exporter = HarborRepositoryExporter(client)
    artifact_exporter = HarborArtifactExporter(client)
    config = cast(HarborArtifactsConfig, event.resource_config)


 ... (clipped 31 lines)
integrations/harbor/harbor/core/exporters/repository_exporter.py [30-64]
    @cache_iterator_result()
    async def get_paginated_resources(
        self, options: ListRepositoryOptions
    ) -> ASYNC_GENERATOR_RESYNC_TYPE:
        """Get all Harbor repositories with pagination and filtering."""
        logger.info("Starting Harbor repositories export")

        params = build_repository_params(options)

        if not hasattr(self, "_projects_map"):

 ... (clipped 25 lines)

Solution Walkthrough:

Before:

async def resync_artifacts(kind: str):
    # 1. Fetch ALL repositories across all projects
    repository_exporter = HarborRepositoryExporter(client)
    # This internally fetches all projects first to map project IDs to names
    async for repositories in repository_exporter.get_paginated_resources(...):
        tasks = []
        # 2. For each repository, create a task to fetch its artifacts
        for repository in repositories:
            tasks.append(
                artifact_exporter.get_paginated_resources(
                    project_name=repository.get("project_name"),
                    repository_name=repository["name"],
                    ...
                )
            )
        # 3. Stream artifacts from all tasks concurrently
        async for artifacts in stream_async_iterators_tasks(*tasks):
            yield artifacts

After:

async def resync_artifacts(kind: str):
    project_exporter = HarborProjectExporter(client)
    repository_exporter = HarborRepositoryExporter(client)
    artifact_exporter = HarborArtifactExporter(client)

    # 1. Iterate through projects
    async for projects in project_exporter.get_paginated_resources(...):
        for project in projects:
            # 2. For each project, fetch its repositories
            async for repositories in repository_exporter.get_project_repositories(project["name"]):
                tasks = []
                # 3. For each repository, create a task to fetch its artifacts
                for repository in repositories:
                    tasks.append(
                        artifact_exporter.get_paginated_resources(...)
                    )
                # 4. Stream artifacts for the current set of repositories
                async for artifacts in stream_async_iterators_tasks(*tasks):
                    yield artifacts

Suggestion importance[1-10]: 8


Why: The suggestion correctly identifies a significant performance bottleneck in the resync_artifacts logic, which could cause scalability issues, and proposes a valid, more efficient project-based approach.

Medium
Possible issue
Use a relative path in Makefile
Suggestion Impact: The commit updated the Makefile to include a relative path to the shared _infra Makefile, addressing the hardcoded absolute path issue.

code diff:

@@ -1 +1 @@
-
+include ../_infra/Makefile

Replace the absolute path in the Makefile with a relative path to ensure it
works across different development environments.

integrations/harbor/Makefile [1]

-/Users/wamanzi/Documents/port/port-ocean/port_ocean/cli/cookiecutter/../../../integrations/_infra/Makefile
+../../_infra/Makefile

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 8


Why: The suggestion correctly identifies a hardcoded absolute path in the Makefile that would break the build for other developers and proposes a correct relative path to fix it.

Medium
Prevent potential index out-of-bounds error
Suggestion Impact: The commit added a check `webhook.get("targets")` before accessing `webhook["targets"][0]["address"]`, preventing out-of-bounds access when targets is missing or empty.

code diff:

-                    if webhook["targets"][0]["address"] == webhook_url
+                    if webhook.get("targets")
+                    and webhook["targets"][0]["address"] == webhook_url

Prevent a potential IndexError in _get_existing_webhooks by checking if the
targets list exists and is not empty before accessing its first element. This
makes the webhook lookup more resilient.

integrations/harbor/harbor/webhook/webhook_client.py [35-54]

 async def _get_existing_webhooks(
     self, project_name: str, webhook_url: str
 ) -> Dict[str, Any] | None:
     """Get all webhook policies for a project."""
     async for webhook_policies in self.client.send_paginated_request(
         f"/projects/{project_name}/webhook/policies",
     ):
         existing_webhook = next(
             (
                 webhook
                 for webhook in webhook_policies
-                if webhook["targets"][0]["address"] == webhook_url
+                if webhook.get("targets") and webhook["targets"][0]["address"] == webhook_url
             ),
             None,
         )
 
         if existing_webhook:
             return existing_webhook
 
     return None

[Suggestion processed]

Suggestion importance[1-10]: 7


Why: The suggestion correctly identifies a potential IndexError if a webhook policy has an empty targets list. Adding a check improves the code's robustness against unexpected or malformed data from the Harbor API.
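The guarded lookup can be illustrated in isolation. This is a sketch rather than the integration's code: `find_webhook_by_address` is a hypothetical helper that takes an already-fetched list of policies, and it additionally tolerates a first target that lacks an `address` key, which goes slightly beyond the suggested change.

```python
from typing import Any, Optional


def find_webhook_by_address(
    policies: list[dict[str, Any]], webhook_url: str
) -> Optional[dict[str, Any]]:
    """Return the first policy whose first target points at webhook_url, else None."""
    for policy in policies:
        # A missing, null, or empty targets list is treated as no match.
        targets = policy.get("targets") or []
        if not targets or not isinstance(targets[0], dict):
            continue
        if targets[0].get("address") == webhook_url:
            return policy
    return None
```

Like the original generator expression, this inspects only the first target of each policy.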

Medium
General
Use direct comparison for authentication token
Suggestion Impact: The commit removed hmac usage and replaced the comparison with a direct string equality check for the received token versus the secret, matching the suggestion.

code diff:

-import hmac
 from port_ocean.context.ocean import ocean
 from loguru import logger
 from port_ocean.core.handlers.webhook.webhook_event import (
@@ -32,15 +31,7 @@
             )
             return False
 
-        logger.debug(
-            "Validating Harbor webhook token...",
-            extra={
-                "received_token_prefix": received_token[:10] + "...",
-                "expected_token_prefix": secret[:10] + "...",
-            },
-        )
-
-        return hmac.compare_digest(received_token, secret)
+        return received_token == secret

Replace hmac.compare_digest with a direct string comparison (==) for webhook
authentication. This is more appropriate for comparing the static token in the
Authorization header and avoids misusing a cryptographic function.

integrations/harbor/harbor/webhook/harbor_abstract_webhook_processor.py [18-43]

 async def authenticate(self, payload: EventPayload, headers: EventHeaders) -> bool:
     """Authenticate the Harbor webhook request using Authorization header."""
     secret = ocean.integration_config.get("webhook_secret")
 
     if not secret:
         logger.warning(
             "Skipping webhook signature verification because no secret is configured."
         )
         return True
 
     received_token = headers.get("authorization") or headers.get("Authorization")
     if not received_token:
         logger.error(
             "Missing 'Authorization' header. Harbor webhook authentication failed."
         )
         return False
 
-    logger.debug(
-        "Validating Harbor webhook token...",
-        extra={
-            "received_token_prefix": received_token[:10] + "...",
-            "expected_token_prefix": secret[:10] + "...",
-        },
-    )
+    return received_token == secret
 
-    return hmac.compare_digest(received_token, secret)
-

[Suggestion processed]

Suggestion importance[1-10]: 4


Why: The suggestion correctly points out that hmac.compare_digest is not the ideal function for comparing plain text tokens. While the existing code is not functionally wrong, replacing it with a direct string comparison (==) is more idiomatic and avoids misusing a cryptographic function.

Low
Correct inconsistent webhook event documentation
Suggestion Impact: The committed diff modified the "Webhook Events" section by removing PULL_ARTIFACT but did not add SCANNING_FAILED. The webhook setup instructions earlier in the README already include SCANNING_FAILED, so the commit implemented the suggestion only partially, fixing the inconsistency regarding PULL_ARTIFACT.

code diff:

-## Webhook Events
-
-The integration supports these Harbor webhook events:
-
-- **PUSH_ARTIFACT**: Artifact pushed to registry
-- **PULL_ARTIFACT**: Artifact pulled from registry
-- **DELETE_ARTIFACT**: Artifact deleted from registry
-- **SCANNING_COMPLETED**: Vulnerability scan completed
-

Update the "Webhook Events" section in the README.md to be consistent with the
webhook setup instructions provided earlier in the file.

integrations/harbor/README.md [123-130]

 ## Webhook Events
 
 The integration supports these Harbor webhook events:
 
 - **PUSH_ARTIFACT**: Artifact pushed to registry
-- **PULL_ARTIFACT**: Artifact pulled from registry
 - **DELETE_ARTIFACT**: Artifact deleted from registry
 - **SCANNING_COMPLETED**: Vulnerability scan completed
+- **SCANNING_FAILED**: Vulnerability scan failed

[Suggestion processed]

Suggestion importance[1-10]: 4


Why: The suggestion correctly identifies an inconsistency in the README.md file between the recommended webhook setup and the documented list of supported events, improving documentation clarity.

Low
Fix typo in example environment variable
Suggestion Impact: The commit corrected the placeholder from <wehhook_secrets> to <webhook_secret> exactly as suggested.

code diff:

-OCEAN__INTEGRATION__CONFIG__WEBHOOK_SECRET=<wehhook_secrets>
+OCEAN__INTEGRATION__CONFIG__WEBHOOK_SECRET=<webhook_secret>

Correct the typo in the placeholder value for
OCEAN__INTEGRATION__CONFIG__WEBHOOK_SECRET in the .env.example file.

integrations/harbor/.env.example [10]

-OCEAN__INTEGRATION__CONFIG__WEBHOOK_SECRET=<wehhook_secrets>
+OCEAN__INTEGRATION__CONFIG__WEBHOOK_SECRET=<webhook_secret>

[Suggestion processed]

Suggestion importance[1-10]: 2


Why: The suggestion correctly identifies a minor typo in a placeholder value within the .env.example file, which is a low-impact change that improves clarity.

Low

qodo-code-review[bot] • Nov 04 '25 09:11