Prepublish Check Framework & Admin Panel Design Approval (Milestone 1)
Parent issue: #1331
Objective
Establish the architecture, configuration, and workflow design for the Open VSX security verification framework and admin panel.
Deliverables
- High-level architecture for the extensible pre-publish security framework
- Documentation on how new verification checks are registered and configured
- Admin panel mockups and proposed workflow for managing quarantined extensions
- Implementation plan outlining integration strategy, dependencies, and testing approach
Architecture & Workflow
Architecture diagrams and admin-panel mockups are now complete for the new Open VSX pre-publish verification framework. The system introduces two layers of verification in the publishing flow. First, a set of synchronous fast-fail checks runs during the publish request. These checks (name-similarity detection, secret scanning, and malicious file-hash blocking) run immediately and do not persist file content. Any failure in this stage stops publication outright.
Once the VSIX is uploaded, an asynchronous deep-content scanning phase begins. ClamAV and YARA run as part of an extensible scanning pipeline, and a version is not activated until this step completes successfully. This pipeline writes results into a new database model that tracks scans, detected threats, validation failures, and subsequent admin decisions.
Extensions that fail asynchronous scanning move into a quarantine state, where reviewers can inspect findings and make allow/block decisions through the new admin interface.
All framework components will be developed in public repositories under the Eclipse process. Security-sensitive elements such as detection rules, pattern sets, and thresholds will remain private to avoid creating avenues for circumvention.
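The synchronous stage described above can be pictured as an ordered chain of checks that stops at the first failure. The following is a minimal sketch under that reading, not the framework's actual API: `FastFailPipeline`, `PublishCheck`, `register`, and the two example checks are hypothetical names introduced only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of a fast-fail check chain; none of these names come from Open VSX.
final class FastFailPipeline {

    /** A synchronous check: returns a failure reason, or empty when the upload passes. */
    @FunctionalInterface
    interface PublishCheck {
        Optional<String> validate(String extensionName, byte[] vsixBytes);
    }

    // LinkedHashMap keeps registration order, so checks run in the order they were added.
    private final Map<String, PublishCheck> checks = new LinkedHashMap<>();

    /** Registers a named check and returns this pipeline for chaining. */
    FastFailPipeline register(String name, PublishCheck check) {
        checks.put(name, check);
        return this;
    }

    /** Returns "checkName: reason" for the first failing check, or empty if all pass. */
    Optional<String> run(String extensionName, byte[] vsixBytes) {
        for (Map.Entry<String, PublishCheck> entry : checks.entrySet()) {
            Optional<String> failure = entry.getValue().validate(extensionName, vsixBytes);
            if (failure.isPresent()) {
                // Fast fail: later checks are not executed.
                return Optional.of(entry.getKey() + ": " + failure.get());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Both checks below are placeholders, not real verification rules.
        FastFailPipeline pipeline = new FastFailPipeline()
            .register("non-empty", (name, bytes) ->
                bytes.length == 0 ? Optional.of("empty upload") : Optional.empty())
            .register("name-length", (name, bytes) ->
                name.length() > 64 ? Optional.of("name too long") : Optional.empty());

        System.out.println(pipeline.run("my-ext", new byte[] {1}));
        System.out.println(pipeline.run("my-ext", new byte[0]));
    }
}
```

Keeping each check behind one small interface is what would let new checks be registered without touching the publish flow itself.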
Implementation Plan
The first phase focuses on fast-fail checks because they block publication and require no long-running infrastructure. Once those are stable, the team will move on to the asynchronous scanning pipeline, which introduces a new schema, background workers, and admin review workflows. The final phase implements download flood control, completes the admin UI, and delivers the finished documentation.
Phase 1: Foundation & Fast-Fail Checks
Objectives
- Establish verification infrastructure
- Implement synchronous validation checks
Key Deliverables
- Verification Service Layer
  - Core verification framework
  - Integration with publish pipeline
  - Error handling and logging
- Name Similarity Detection
  - Algorithm for detecting impersonating names
  - Integration with search infrastructure
  - Configurable similarity thresholds
- Secret Scanning
  - Pattern matching engine
  - Entropy analysis for false positive reduction
  - File filtering and performance optimization
- File Hash Blocklist
  - Blocklist storage and management
  - Hash calculation and matching
Success Criteria
- All fast-fail checks operational
- Publish latency is not significantly impacted
Risks & Mitigation
- Risk: Performance impact on publish flow
- Mitigation: Benchmark each check, optimize paths
- Risk: False positives blocking legitimate extensions
- Mitigation: Extensive testing, configurable thresholds
Phase 2: Asynchronous Scanning Pipeline
Objectives
- Build extensible scanning infrastructure
- Integrate malware detection tools
- Implement quarantine workflow, ensuring extensions remain inactive until cleared
Key Deliverables
- Database Schema & Models
  - Scan tracking entities
  - Finding and threat storage
  - Quarantine state management
  - Audit trail for admin decisions
- Scanning Pipeline Architecture
  - Pluggable scanner interface
  - Job scheduling and orchestration
  - Result aggregation and persistence
  - Error handling and retry logic
- Scanner Integrations
  - ClamAV integration for malware detection
  - YARA integration for pattern matching
  - Extensibility for future scanners
- Quarantine System
  - Automatic quarantine on threat detection
  - State management (pending → quarantined → reviewed)
  - Integration with activation workflow
Success Criteria
- Scanning pipeline processes all uploads
- Extensions remain inactive until scan completion
- Quarantine workflow prevents unauthorized activation
- Scanner integrations stable and performant
Risks & Mitigation
- Risk: Availability of scanner dependencies (ClamAV, YARA)
- Mitigation: Fallback mechanisms, health checks, admin workflows
- Risk: Scanning latency impacting user experience
- Mitigation: Async processing, clear status communication
- Risk: High false positive rate overwhelming admin team
- Mitigation: Severity classification, automated filtering, threshold/rule tuning
Phase 3: Admin Interface & Production Hardening
Objectives
- Finalize admin review interface
- Implement download flood control
- Complete documentation
- Production readiness
Key Deliverables
- Admin Review Interface
  - Dashboard for scan management
  - Detailed threat and finding views
  - Allow/block decision workflow
  - Audit logging and history
- Download Flood Control
  - Rate limiting implementation
  - Abuse prevention mechanisms
  - Monitoring and alerting
- Documentation
  - Architecture documentation
  - Admin user guide
  - Developer integration guide
  - API documentation updates
- Production Hardening
  - Performance optimization
  - Monitoring and alerting
  - Disaster recovery procedures
Success Criteria
- Admins can efficiently review and make decisions
- Download abuse prevented
- Documentation complete and reviewed
- System meets production SLA requirements
Risks & Mitigation
- Risk: Admin interface usability issues
- Mitigation: User testing, iterative design, feedback loops
- Risk: Download control impacting legitimate users
- Mitigation: Careful threshold tuning, monitoring, quick adjustment capability
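The rate-limiting deliverable for download flood control does not name an algorithm. As one possible approach, a token bucket could look like the sketch below. The class name `TokenBucket`, its parameters, and the choice of algorithm are all assumptions made for illustration. The clock value is passed in explicitly so that the behaviour is deterministic and easy to test.

```java
// Hypothetical token-bucket limiter; the plan does not commit to this algorithm.
final class TokenBucket {
    private final double capacity;        // maximum burst size, in requests
    private final double refillPerSecond; // sustained rate, in requests per second
    private double tokens;
    private long lastNanos;

    TokenBucket(double capacity, double refillPerSecond, long nowNanos) {
        this.capacity = capacity;
        this.refillPerSecond = refillPerSecond;
        this.tokens = capacity; // start full so an initial burst is allowed
        this.lastNanos = nowNanos;
    }

    /** Refills according to elapsed time, then spends one token if one is available. */
    synchronized boolean tryAcquire(long nowNanos) {
        double elapsedSeconds = (nowNanos - lastNanos) / 1_000_000_000.0;
        tokens = Math.min(capacity, tokens + elapsedSeconds * refillPerSecond);
        lastNanos = nowNanos;
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        // Illustrative values only: burst of 2 downloads, refilling 1 per second.
        TokenBucket bucket = new TokenBucket(2, 1, 0L);
        System.out.println(bucket.tryAcquire(0L));             // true
        System.out.println(bucket.tryAcquire(0L));             // true
        System.out.println(bucket.tryAcquire(0L));             // false
        System.out.println(bucket.tryAcquire(1_000_000_000L)); // true
    }
}
```

Because capacity and refill rate are plain parameters, thresholds could be tuned quickly if legitimate users are affected, which is the mitigation listed above.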
External Dependencies
- ClamAV daemon availability in production environment
- YARA binary availability in production environment
Library Dependencies
Diagrams
Name Similarity
These diagrams show how the platform detects near-duplicate or impersonating extension names across both Elasticsearch and database backends, and how synchronous validation blocks publication when a collision is found.
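The similarity algorithm itself is not specified here. As a stand-in, the sketch below uses normalized Levenshtein edit distance to compare a candidate name against an existing one. The class `NameSimilarity`, the normalization rule, and the `0.8` threshold are assumptions for illustration only. A real deployment would query Elasticsearch or the database for candidates rather than compare pairs directly.

```java
import java.util.Locale;

// Hypothetical name-similarity check using Levenshtein distance as a stand-in algorithm.
final class NameSimilarity {

    /** Classic two-row dynamic-programming Levenshtein edit distance. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Lowercases and strips everything except ASCII letters and digits. */
    static String normalize(String name) {
        return name.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9]", "");
    }

    /** Similarity in [0, 1]; 1.0 means identical after normalization. */
    static double similarity(String a, String b) {
        String x = normalize(a);
        String y = normalize(b);
        int longest = Math.max(x.length(), y.length());
        return longest == 0 ? 1.0 : 1.0 - (double) editDistance(x, y) / longest;
    }

    /** True when the candidate is at least as similar as the configured threshold. */
    static boolean collides(String candidate, String existing, double threshold) {
        return similarity(candidate, existing) >= threshold;
    }

    public static void main(String[] args) {
        double threshold = 0.8; // hypothetical value; the real threshold is configurable
        System.out.println(collides("pyth0n", "Python", threshold));        // true
        System.out.println(collides("rust-analyzer", "Python", threshold)); // false
    }
}
```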
Secret Scanning
This set outlines detection of hard-coded secrets within VSIX contents using entropy analysis and regex-based matching, along with the flow for blocking publication when sensitive material is found.
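To illustrate how regex matching and entropy analysis can work together, the sketch below first finds long token-like strings with a regular expression and then keeps only those whose Shannon entropy exceeds a threshold, which filters out low-randomness matches such as repeated characters. The class `SecretScanSketch`, the candidate pattern, and the threshold are assumptions, not the actual rule set (which is kept private).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical secret-scanning sketch: regex candidates filtered by Shannon entropy.
final class SecretScanSketch {

    // Assumed candidate pattern: runs of 20 or more token-like characters.
    private static final Pattern CANDIDATE = Pattern.compile("[A-Za-z0-9+/_\\-]{20,}");

    /** Shannon entropy of the string, in bits per character. */
    static double entropy(String s) {
        if (s.isEmpty()) {
            return 0.0;
        }
        Map<Character, Integer> counts = new HashMap<>();
        for (char c : s.toCharArray()) {
            counts.merge(c, 1, Integer::sum);
        }
        double bits = 0.0;
        for (int count : counts.values()) {
            double p = (double) count / s.length();
            bits -= p * (Math.log(p) / Math.log(2));
        }
        return bits;
    }

    /** Returns regex candidates whose entropy is at least minEntropy. */
    static List<String> findSuspects(String text, double minEntropy) {
        List<String> suspects = new ArrayList<>();
        Matcher matcher = CANDIDATE.matcher(text);
        while (matcher.find()) {
            if (entropy(matcher.group()) >= minEntropy) {
                suspects.add(matcher.group());
            }
        }
        return suspects;
    }

    public static void main(String[] args) {
        double minEntropy = 3.0; // hypothetical threshold
        System.out.println(findSuspects("padding = aaaaaaaaaaaaaaaaaaaaaaaaaaaa", minEntropy));
        System.out.println(findSuspects("key = q7Zp2LmX9vB4tRw8KsY1dHc6", minEntropy));
    }
}
```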
Blocklist
These diagrams describe the file-hash blocklist service used to prevent known malicious files from re-entering the ecosystem. The check runs synchronously at publish time and blocks publication when hashes match.
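A hash blocklist check reduces to hashing each file and looking the digest up in a set. The sketch below assumes SHA-256 and an in-memory set. The hash algorithm, the class `HashBlocklist`, and the storage choice are assumptions, since the plan only states that hashes are calculated and matched.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Set;

// Hypothetical file-hash blocklist; SHA-256 and in-memory storage are assumptions.
final class HashBlocklist {
    private final Set<String> blockedHashes;

    HashBlocklist(Set<String> blockedHexHashes) {
        this.blockedHashes = Set.copyOf(blockedHexHashes);
    }

    /** Lowercase hex SHA-256 digest of the given content. */
    static String sha256Hex(byte[] content) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(content);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-256.
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
    }

    /** True when the content's hash is on the blocklist. */
    boolean isBlocked(byte[] content) {
        return blockedHashes.contains(sha256Hex(content));
    }

    public static void main(String[] args) {
        byte[] knownBad = "abc".getBytes(StandardCharsets.UTF_8); // placeholder content
        HashBlocklist blocklist = new HashBlocklist(Set.of(sha256Hex(knownBad)));
        System.out.println(blocklist.isBlocked(knownBad));
        System.out.println(blocklist.isBlocked("other".getBytes(StandardCharsets.UTF_8)));
    }
}
```

Only digests are stored and compared, which is consistent with the synchronous checks not persisting file content.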
Malware Scanning
Here the asynchronous pipeline shows how scans are orchestrated, how threats are recorded, and how flagged extensions transition into quarantine for review.
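The quarantine flow can be read as a small state machine. The sketch below encodes the pending → quarantined → reviewed transitions from the plan and adds a `CLEARED` state for scans that find nothing. That extra state, the class `QuarantineStateSketch`, and the `mayActivate` rule are assumptions introduced to make the example complete.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Hypothetical state machine for scan results; CLEARED is an assumed extra state.
final class QuarantineStateSketch {

    enum ScanState { PENDING, CLEARED, QUARANTINED, REVIEWED }

    private static final Map<ScanState, Set<ScanState>> ALLOWED = new EnumMap<>(ScanState.class);

    static {
        ALLOWED.put(ScanState.PENDING, EnumSet.of(ScanState.CLEARED, ScanState.QUARANTINED));
        ALLOWED.put(ScanState.QUARANTINED, EnumSet.of(ScanState.REVIEWED));
        ALLOWED.put(ScanState.CLEARED, EnumSet.noneOf(ScanState.class));
        ALLOWED.put(ScanState.REVIEWED, EnumSet.noneOf(ScanState.class));
    }

    /** Returns the new state, or throws if the transition is not permitted. */
    static ScanState transition(ScanState from, ScanState to) {
        if (!ALLOWED.get(from).contains(to)) {
            throw new IllegalStateException("illegal transition " + from + " -> " + to);
        }
        return to;
    }

    /** A version may activate after a clean scan, or after review with an admin "allow". */
    static boolean mayActivate(ScanState state, boolean adminAllowed) {
        return state == ScanState.CLEARED || (state == ScanState.REVIEWED && adminAllowed);
    }

    public static void main(String[] args) {
        ScanState state = ScanState.PENDING;
        state = transition(state, ScanState.QUARANTINED);
        System.out.println(mayActivate(state, true)); // false: still awaiting review
        state = transition(state, ScanState.REVIEWED);
        System.out.println(mayActivate(state, true)); // true: reviewed and allowed
    }
}
```

Rejecting any transition that is not explicitly listed is one way to ensure a quarantined version cannot be activated without passing through review.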
Admin UI Mockups
The mockups illustrate the full review workflow: a dashboard of in-progress and quarantined scans, detailed threat and validation-failure views, and explicit allow/block decision paths for reviewers.
Sharing some additional artifacts detailing the plans for a YAML-configurable implementation of the Scanner class. The class will allow scanners to be defined in application.yml, which can reference environment variables for sensitive information. Alternatively, entire scanner configurations can be loaded from Kubernetes secrets via spring.config.import.
The high-level structure of the yml would be defined as follows:
ovsx:
  scanning:
    enabled: true
    configured:
      <scanner-name>:
        # Basic settings
        enabled: true
        type: "SCANNER_TYPE"
        async: true|false
        timeout-minutes: 60
        # HTTP operations
        start:  # Required - initiate scan
        poll:   # Async only - check status
        result: # Async only - get results
Each operation (start/poll/result) defines the HTTP request and response, using JSONPath expressions to extract data:
method: POST|GET
url: "https://api.scanner.com/endpoint"
headers:
  X-API-Key: "${ENV_VAR}"
body:
  type: multipart|json
  file-field: "file"
response:
  format: json
  analysis-id-path: "$.data.id"           # Start: Extract job ID
  status-path: "$.status"                 # Poll: Extract status
  complete-when: "completed"              # Poll: Completion value
  threats-path: "$.threats"               # Result: Extract threats
  threat-mapping:
    condition: "$.detected == true"       # Filter threats
    name-path: "$.virus_name"             # Threat name
    description-path: "$.virus_desc"      # Threat description
    severity-expression: "..."            # Compute severity
    file-path: "$.file_info.name"         # File name
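For an async scanner, the three operations imply a start, poll, fetch-result loop. The sketch below shows that control flow only. `AsyncScanFlow`, `ScannerClient`, and the `maxPolls` and `pollInterval` parameters are hypothetical, and the HTTP calls and JSONPath extraction are hidden behind the interface rather than implemented.

```java
import java.time.Duration;
import java.util.List;

// Hypothetical control flow for an async scanner; HTTP and JSONPath are abstracted away.
final class AsyncScanFlow {

    /** Assumed abstraction over the configured start/poll/result operations. */
    interface ScannerClient {
        String start(byte[] file);              // value extracted via analysis-id-path
        String poll(String analysisId);         // value extracted via status-path
        List<String> result(String analysisId); // threat names extracted via threats-path
    }

    /** Starts a scan, polls until the status equals completeWhen, then fetches results. */
    static List<String> scan(ScannerClient client, byte[] file, String completeWhen,
                             int maxPolls, Duration pollInterval) {
        String analysisId = client.start(file);
        for (int attempt = 0; attempt < maxPolls; attempt++) {
            if (completeWhen.equals(client.poll(analysisId))) {
                return client.result(analysisId);
            }
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for callers
                throw new IllegalStateException("interrupted while polling " + analysisId, e);
            }
        }
        throw new IllegalStateException(
            "scan " + analysisId + " did not complete after " + maxPolls + " polls");
    }

    public static void main(String[] args) {
        // Fake client that reports completion on the third poll.
        int[] polls = {0};
        ScannerClient fake = new ScannerClient() {
            public String start(byte[] file) { return "job-1"; }
            public String poll(String id) { return ++polls[0] < 3 ? "running" : "completed"; }
            public List<String> result(String id) { return List.of("Example.Threat"); }
        };
        System.out.println(scan(fake, new byte[0], "completed", 5, Duration.ZERO));
    }
}
```

In the configuration above, `timeout-minutes` would presumably bound this loop. The sketch uses a poll count instead to keep the example deterministic.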
The class diagram:
[!NOTE] The specific structure of the HTTP operations is subject to change to support future needs of specific scanning vendors.
LGTM +1