Prepublish Check Framework & Admin Panel Design Approval (Milestone 1)

Open chrisguindon opened this issue 1 month ago • 3 comments

Parent issue: #1331

Objective

Establish the architecture, configuration, and workflow design for the Open VSX security verification framework and admin panel.

Deliverables

  • High-level architecture for the extensible pre-publish security framework
  • Documentation on how new verification checks are registered and configured
  • Admin panel mockups and proposed workflow for managing quarantined extensions
  • Implementation plan outlining integration strategy, dependencies, and testing approach

chrisguindon, Nov 07 '25 16:11

Architecture & Workflow

Architecture diagrams and admin-panel mockups are now complete for the new Open VSX pre-publish verification framework. The system introduces two layers of verification in the publishing flow. First, a set of synchronous fast-fail checks runs during the publish request. These checks (name-similarity detection, secret scanning, and malicious file-hash blocking) run immediately and do not persist file content. Any failure at this stage stops publication outright.
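
To make the fast-fail stage concrete, here is a minimal sketch of a check registry that stops at the first failure. It is an illustration only: `PublishRequest`, `FastFailRunner`, and the check ids are hypothetical names and not the Open VSX API, and the real framework may register checks quite differently.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class PublishRequest:
    # Hypothetical, simplified view of a publish request.
    namespace: str
    name: str
    files: Dict[str, bytes]  # path inside the VSIX -> content (kept in memory only)


# A check returns None on success, or a human-readable failure reason.
Check = Callable[[PublishRequest], Optional[str]]


class FastFailRunner:
    """Runs registered checks in order and stops at the first failure."""

    def __init__(self) -> None:
        self._checks: List[Tuple[str, Check]] = []

    def register(self, check_id: str, check: Check) -> None:
        self._checks.append((check_id, check))

    def run(self, request: PublishRequest) -> Optional[Tuple[str, str]]:
        """Return (check_id, reason) for the first failing check, else None."""
        for check_id, check in self._checks:
            reason = check(request)
            if reason is not None:
                return check_id, reason
        return None


def reject_empty_package(request: PublishRequest) -> Optional[str]:
    # Placeholder check used only to demonstrate registration.
    return None if request.files else "package contains no files"
```

A caller would register each check once at startup and reject the publish request whenever `run` returns a non-`None` value.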

Once the VSIX is uploaded, an asynchronous deep-content scanning phase begins. ClamAV and YARA run as part of an extensible scanning pipeline, and a version is not activated until this step completes successfully. This pipeline writes results into a new database model that tracks scans, detected threats, validation failures, and subsequent admin decisions.

Extensions that fail asynchronous scanning move into a quarantine state, where reviewers can inspect findings and make allow/block decisions through the new admin interface.

All framework components will be developed in public repositories under the Eclipse process. Security-sensitive elements such as detection rules, pattern sets, and thresholds will remain private to avoid creating avenues for circumvention.

Implementation Plan

The first phase focuses on fast-fail checks because they block publication and require no long-running infrastructure. Once those are stable, the team will move on to the asynchronous scanning pipeline, which introduces a new schema, background workers, and admin review workflows. The final phase implements download flood control, completes the admin UI, and delivers the finished documentation.

Phase 1: Foundation & Fast-Fail Checks

Objectives

  • Establish verification infrastructure
  • Implement synchronous validation checks

Key Deliverables

  1. Verification Service Layer

    • Core verification framework
    • Integration with publish pipeline
    • Error handling and logging
  2. Name Similarity Detection

    • Algorithm for detecting impersonating names
    • Integration with search infrastructure
    • Configurable similarity thresholds
  3. Secret Scanning

    • Pattern matching engine
    • Entropy analysis for false positive reduction
    • File filtering and performance optimization
  4. File Hash Blocklist

    • Blocklist storage and management
    • Hash calculation and matching

Success Criteria

  • All fast-fail checks operational
  • Publish latency is not significantly impacted

Risks & Mitigation

  • Risk: Performance impact on publish flow
    • Mitigation: Benchmark each check, optimize paths
  • Risk: False positives blocking legitimate extensions
    • Mitigation: Extensive testing, configurable thresholds

Phase 2: Asynchronous Scanning Pipeline

Objectives

  • Build extensible scanning infrastructure
  • Integrate malware detection tools
  • Implement quarantine workflow, ensuring extensions remain inactive until cleared

Key Deliverables

  1. Database Schema & Models

    • Scan tracking entities
    • Finding and threat storage
    • Quarantine state management
    • Audit trail for admin decisions
  2. Scanning Pipeline Architecture

    • Pluggable scanner interface
    • Job scheduling and orchestration
    • Result aggregation and persistence
    • Error handling and retry logic
  3. Scanner Integrations

    • ClamAV integration for malware detection
    • YARA integration for pattern matching
    • Extensibility for future scanners
  4. Quarantine System

    • Automatic quarantine on threat detection
    • State management (pending → quarantined → reviewed)
    • Integration with activation workflow
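
The state management item above (pending → quarantined → reviewed) could be modeled as a small state machine with an audit trail. This is a sketch under assumptions: the `PASSED` state for a clean scan, the class name, and the "allow"/"block" decision strings are hypothetical and not taken from the actual schema.

```python
from enum import Enum
from typing import List, Optional, Tuple


class State(Enum):
    PENDING = "pending"
    PASSED = "passed"            # assumed name for a clean scan result
    QUARANTINED = "quarantined"
    REVIEWED = "reviewed"


# Transitions that are permitted; anything else is rejected.
ALLOWED = {
    State.PENDING: {State.PASSED, State.QUARANTINED},
    State.QUARANTINED: {State.REVIEWED},
    State.PASSED: set(),
    State.REVIEWED: set(),
}


class VersionScan:
    def __init__(self) -> None:
        self.state = State.PENDING
        self.decision: Optional[str] = None
        self.audit: List[Tuple[str, str, str]] = []  # (from, to, actor)

    def _move(self, target: State, actor: str) -> None:
        if target not in ALLOWED[self.state]:
            raise ValueError(f"illegal transition {self.state.value} -> {target.value}")
        self.audit.append((self.state.value, target.value, actor))
        self.state = target

    def scan_finished(self, threats_found: bool) -> None:
        self._move(State.QUARANTINED if threats_found else State.PASSED, "scanner")

    def review(self, admin: str, allow: bool) -> None:
        self._move(State.REVIEWED, admin)
        self.decision = "allow" if allow else "block"

    @property
    def may_activate(self) -> bool:
        # A version activates only after a clean scan or an explicit allow.
        return self.state is State.PASSED or (
            self.state is State.REVIEWED and self.decision == "allow"
        )
```

Keeping the transition table in one place makes it straightforward to check that a quarantined version cannot be activated without a recorded admin decision.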

Success Criteria

  • Scanning pipeline processes all uploads
  • Extensions remain inactive until scan completion
  • Quarantine workflow prevents unauthorized activation
  • Scanner integrations stable and performant

Risks & Mitigation

  • Risk: Availability of scanner dependencies (ClamAV, YARA)
    • Mitigation: Fallback mechanisms, health checks, admin workflows
  • Risk: Scanning latency impacting user experience
    • Mitigation: Async processing, clear status communication
  • Risk: High false positive rate overwhelming admin team
    • Mitigation: Severity classification, automated filtering, threshold/rule tuning

Phase 3: Admin Interface & Production Hardening

Objectives

  • Finalize admin review interface
  • Implement download flood control
  • Complete documentation
  • Production readiness

Key Deliverables

  1. Admin Review Interface

    • Dashboard for scan management
    • Detailed threat and finding views
    • Allow/block decision workflow
    • Audit logging and history
  2. Download Flood Control

    • Rate limiting implementation
    • Abuse prevention mechanisms
    • Monitoring and alerting
  3. Documentation

    • Architecture documentation
    • Admin user guide
    • Developer integration guide
    • API documentation updates
  4. Production Hardening

    • Performance optimization
    • Monitoring and alerting
    • Disaster recovery procedures
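
For the rate limiting listed under Download Flood Control, one common approach is a token bucket per client. The plan does not name an algorithm, so treat this as an assumption; the class name, parameters, and injectable clock are all hypothetical.

```python
import time
from typing import Callable


class TokenBucket:
    """Allows bursts up to `capacity`, then refills at a steady rate."""

    def __init__(
        self,
        capacity: int,
        refill_per_second: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self._clock = clock
        self._tokens = float(capacity)
        self._last = clock()

    def allow(self) -> bool:
        """Consume one token if available; return False when the bucket is empty."""
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Refill proportionally to elapsed time, never above capacity.
        self._tokens = min(self.capacity, self._tokens + elapsed * self.refill_per_second)
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False
```

In practice there would be one bucket per key (for example per IP address or token), and the thresholds would need the careful tuning mentioned in the risks for this phase.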

Success Criteria

  • Admins can efficiently review and make decisions
  • Download abuse prevented
  • Documentation complete and reviewed
  • System meets production SLA requirements

Risks & Mitigation

  • Risk: Admin interface usability issues
    • Mitigation: User testing, iterative design, feedback loops
  • Risk: Download control impacting legitimate users
    • Mitigation: Careful threshold tuning, monitoring, quick adjustment capability

External Dependencies

  • ClamAV daemon availability in production environment
  • YARA binary availability in production environment

Library Dependencies


Diagrams

Name Similarity

These diagrams show how the platform detects near-duplicate or impersonating extension names across both Elasticsearch and database backends, and how synchronous validation blocks publication when a collision is found.

Image Image
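
As a rough illustration of name-similarity detection with a configurable threshold, the sketch below uses normalized Levenshtein distance. The actual algorithm is not specified in this issue, so the metric, the normalization step, and the default threshold of 0.85 are assumptions.

```python
from typing import Iterable, List


def levenshtein(a: str, b: str) -> int:
    """Classic edit distance using two rolling rows."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def normalize(name: str) -> str:
    # Ignore case and separators so "Prettier-VSCode" equals "prettier_vscode".
    return "".join(ch for ch in name.lower() if ch.isalnum())


def similarity(a: str, b: str) -> float:
    """Return a score in [0, 1]; 1.0 means identical after normalization."""
    a, b = normalize(a), normalize(b)
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def find_collisions(candidate: str, existing: Iterable[str], threshold: float = 0.85) -> List[str]:
    """List existing names that are suspiciously close to the candidate."""
    return [
        name for name in existing
        if name != candidate and similarity(candidate, name) >= threshold
    ]
```

A production check would likely first narrow the candidate set through the search backend and only then score the shortlist, since comparing against every published name would be too slow for a synchronous check.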

Secret Scanning

This set outlines detection of hard-coded secrets within VSIX contents using entropy analysis and regex-based matching, along with the flow for blocking publication when sensitive material is found.

Image Image
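
The combination of regex matching and entropy analysis can be sketched as follows. The two patterns, the 3.5 bits-per-character threshold, and the skipped file suffixes are illustrative assumptions; the real pattern sets are intentionally kept private.

```python
import math
import re
from collections import Counter
from typing import List, Tuple

# Well-known shape of an AWS access key id; used here as an example rule.
AWS_ACCESS_KEY_ID = re.compile(r"\bAKIA[0-9A-Z]{16}\b")
# Generic `apiKey = "..."`-style assignment; the value is captured for entropy analysis.
GENERIC_ASSIGNMENT = re.compile(
    r"""(?i)(?:api[_-]?key|secret|token)\w*\s*[:=]\s*["']([^"'\s]{16,})["']"""
)
ENTROPY_THRESHOLD = 3.5  # hypothetical value, bits per character
SKIPPED_SUFFIXES = (".png", ".jpg", ".woff", ".woff2")  # hypothetical file filter


def shannon_entropy(text: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in Counter(text).values())


def scan_text(path: str, text: str) -> List[Tuple[str, int, str]]:
    """Return (path, line number, rule id) for each suspected secret."""
    findings: List[Tuple[str, int, str]] = []
    if path.lower().endswith(SKIPPED_SUFFIXES):
        return findings
    for line_no, line in enumerate(text.splitlines(), start=1):
        if AWS_ACCESS_KEY_ID.search(line):
            findings.append((path, line_no, "aws-access-key-id"))
        match = GENERIC_ASSIGNMENT.search(line)
        # Low-entropy values such as "xxxxxxxx..." are treated as placeholders.
        if match and shannon_entropy(match.group(1)) >= ENTROPY_THRESHOLD:
            findings.append((path, line_no, "generic-secret"))
    return findings
```

The entropy gate is what reduces false positives here: a generic assignment is reported only when its value looks random enough to be a real credential.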

Blocklist

These diagrams describe the file-hash blocklist service used to prevent known malicious files from re-entering the ecosystem. The check runs synchronously at publish time and blocks publication when hashes match.

Image Image
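
A minimal sketch of the hash-matching step, assuming SHA-256 digests and an in-memory set as the blocklist (neither is confirmed by this issue; `blocked_entries` is a hypothetical helper). The archive is read from memory, in line with the fast-fail checks not persisting file content.

```python
import hashlib
import io
import zipfile
from typing import List, Set


def blocked_entries(vsix_bytes: bytes, blocklist: Set[str]) -> List[str]:
    """Return the paths of archive entries whose SHA-256 digest is blocklisted."""
    hits: List[str] = []
    # A VSIX is a ZIP archive, so the standard zipfile module can read it.
    with zipfile.ZipFile(io.BytesIO(vsix_bytes)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            digest = hashlib.sha256(archive.read(info)).hexdigest()
            if digest in blocklist:
                hits.append(info.filename)
    return hits
```

A non-empty result would block publication; the returned paths could be included in the rejection message or in the audit log.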

Malware Scanning

This diagram shows how the asynchronous pipeline orchestrates scans, records threats, and moves flagged extensions into quarantine for review.

Image
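
The pluggable scanner idea can be sketched with a small interface and an aggregating runner. All names here (`Scanner`, `Threat`, `ScanReport`, `run_pipeline`) are hypothetical, and the sketch does not call ClamAV or YARA; real integrations would sit behind the same interface.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Protocol


@dataclass(frozen=True)
class Threat:
    scanner: str
    name: str
    file_path: str


class Scanner(Protocol):
    """Anything with a name and a scan method can be plugged in."""
    name: str

    def scan(self, vsix: bytes) -> List[Threat]: ...


@dataclass
class ScanReport:
    threats: List[Threat] = field(default_factory=list)
    failed_scanners: List[str] = field(default_factory=list)

    @property
    def may_activate(self) -> bool:
        # Activate only if every scanner finished and none reported a threat.
        return not self.threats and not self.failed_scanners


def run_pipeline(vsix: bytes, scanners: Iterable[Scanner]) -> ScanReport:
    """Run every scanner, collecting threats and recording scanner failures."""
    report = ScanReport()
    for scanner in scanners:
        try:
            report.threats.extend(scanner.scan(vsix))
        except Exception:
            # A crashed scanner must not be mistaken for a clean result.
            report.failed_scanners.append(scanner.name)
    return report
```

Treating a scanner failure as "not cleared" keeps the version inactive until a retry succeeds or an admin decides, which matches the requirement that versions stay inactive until scanning completes successfully.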

Admin UI Mockups

The mockups illustrate the full review workflow: a dashboard of in-progress and quarantined scans, detailed threat and validation-failure views, and explicit allow/block decision paths for reviewers.

Image Image Image Image Image

janbro, Nov 19 '25 05:11

Sharing some additional artifacts detailing the plans for a YAML-configurable implementation of the Scanner class. The class will allow scanners to be defined in application.yml, where they can reference environment variables for sensitive information. Alternatively, entire scanner configurations can be loaded through spring.config.import from Kubernetes secrets.

The high-level structure of the YAML would be as follows:

ovsx:
  scanning:
    enabled: true
    configured:
      <scanner-name>:
        # Basic settings
        enabled: true
        type: "SCANNER_TYPE"
        async: true|false
        timeout-minutes: 60
        
        # HTTP operations
        start:    # Required - initiate scan
        poll:     # Async only - check status
        result:   # Async only - get results
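
To illustrate how the start, poll, and result operations might fit together for an async scanner, here is a rough sketch of the polling loop with a timeout. The `ScannerClient` interface, the function name, and the poll interval are assumptions; only the three operation names, the completion value, and the 60-minute timeout come from the configuration above.

```python
import time
from typing import Callable, Dict, List, Protocol


class ScannerClient(Protocol):
    # Hypothetical wrapper around the three configured HTTP operations.
    def start(self, vsix: bytes) -> str: ...               # returns a job id
    def poll(self, job_id: str) -> str: ...                 # returns a status string
    def result(self, job_id: str) -> List[Dict]: ...        # returns raw threats


def run_async_scan(
    client: ScannerClient,
    vsix: bytes,
    *,
    complete_when: str = "completed",
    timeout_seconds: float = 60 * 60,      # mirrors timeout-minutes: 60
    poll_interval_seconds: float = 5.0,    # assumed; not part of the config shown
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Dict]:
    """Start a scan, poll until it completes, then fetch the results."""
    job_id = client.start(vsix)
    deadline = clock() + timeout_seconds
    while clock() < deadline:
        if client.poll(job_id) == complete_when:
            return client.result(job_id)
        sleep(poll_interval_seconds)
    raise TimeoutError(f"scan {job_id} did not complete within {timeout_seconds} seconds")
```

A synchronous scanner would only need the start operation, with the threats read directly from its response.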

Each operation (start/poll/result) defines the HTTP request and response, using JSONPath expressions to extract data:

method: POST|GET
url: "https://api.scanner.com/endpoint"
headers:
  X-API-Key: "${ENV_VAR}"
body:
  type: multipart|json
  file-field: "file"
response:
  format: json
  analysis-id-path: "$.data.id"      # Start: Extract job ID
  status-path: "$.status"            # Poll: Extract status
  complete-when: "completed"         # Poll: Completion value
  threats-path: "$.threats"          # Result: Extract threats
  threat-mapping:
    condition: "$.detected == true"   # Filter threats
    name-path: "$.virus_name"         # Threat name
    description-path: "$.virus_desc"  # Threat description
    severity-expression: "..."        # Compute severity
    file-path: "$.file_info.name"     # File name
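
As a sketch of how the response mapping above might be applied, the following resolves simple dotted paths and maps raw threats to a common shape. It supports only the `$.a.b` subset of JSONPath and only conditions of the form `<path> == true|false`; both limits are simplifications for illustration, `severity-expression` is left out because its format is not specified, and the sample response is invented.

```python
from typing import Any, Dict, List


def extract(document: Any, path: str) -> Any:
    """Resolve a dotted path such as '$.data.id'; returns None if a key is missing."""
    if not path.startswith("$"):
        raise ValueError(f"unsupported path: {path}")
    current = document
    for key in path[1:].split("."):
        if not key:
            continue  # skips the empty segment right after '$'
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def condition_holds(item: Dict, condition: str) -> bool:
    """Evaluate '<path> == true' or '<path> == false' against one item."""
    left, _, right = condition.partition("==")
    expected = right.strip() == "true"
    return extract(item, left.strip()) is expected


def map_threats(response: Dict, response_cfg: Dict) -> List[Dict[str, Any]]:
    """Apply threats-path and threat-mapping to a decoded JSON response."""
    mapping = response_cfg["threat-mapping"]
    threats = []
    for item in extract(response, response_cfg["threats-path"]) or []:
        if not condition_holds(item, mapping["condition"]):
            continue
        threats.append({
            "name": extract(item, mapping["name-path"]),
            "description": extract(item, mapping["description-path"]),
            "file": extract(item, mapping["file-path"]),
        })
    return threats
```

A real implementation would more likely rely on a full JSONPath library so that filters, wildcards, and array indexing work as vendors expect.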

The class diagram:

Image

Note: The specific structure of the HTTP operations is subject to change to support any future needs of specific scanning vendors.

janbro, Nov 25 '25 03:11

LGTM +1

chrisguindon, Nov 26 '25 18:11