Automated JSON Schema Validation Against Sample Data
Schema Validation Against Sample Data - PR Review Fixes Applied ✅
All feedback from PR review has been addressed.
✅ Changes Made (Based on Review Comments)
1. Fixed Spelling Errors
- VALIDATION_README.md line 84: "goverment" → "government"
- FIELD_MAPPING.md lines 149-151: "goverment" → "government" (3 occurrences)
2. Improved Exception Handling
- `validate_schemas.py` line 149: Changed broad `except Exception` to specific exceptions
  - Now catches: `IOError`, `csv.Error`, `UnicodeDecodeError`
  - Allows critical exceptions like `KeyboardInterrupt` to propagate
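The narrowed handler can be illustrated with a minimal sketch. The function name `read_csv_header` and the UTF-8 encoding are assumptions made for illustration; the actual code at line 149 of `validate_schemas.py` is not shown in this PR description.

```python
import csv
from pathlib import Path
from typing import List, Optional


def read_csv_header(path: Path) -> Optional[List[str]]:
    """Return the header row of a CSV file, or None if it cannot be read.

    Hypothetical helper: the name and the UTF-8 encoding are assumptions.
    """
    try:
        with path.open(newline="", encoding="utf-8") as handle:
            # next(..., None) yields None for an empty file.
            return next(csv.reader(handle), None)
    except (IOError, csv.Error, UnicodeDecodeError) as exc:
        # Only expected I/O and parsing failures are handled here;
        # anything else (for example KeyboardInterrupt) propagates.
        print(f"Skipping {path}: {exc}")
        return None
```

Listing the exception types explicitly keeps an unreadable sample file from aborting the run while still surfacing unexpected failures.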
3. Fixed Mermaid Field Extraction
- `validate_schemas.py` lines 81-84: Added field name validation
  - Now validates field names start with letter: `^[A-Za-z]\w*$`
  - Prevents invalid fields like "8" from being extracted
- Result: Field count reduced from 46 to 45 (politician), 43 to 42 (party)
- Total mismatches: Reduced from 127 to 125
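As a rough illustration of the pattern quoted above, the sketch below filters a list of candidate names. The function `filter_field_names` and the sample candidates are hypothetical; only the regular expression comes from the PR.

```python
import re
from typing import Iterable, List

# Pattern quoted in the PR: a letter followed by word characters.
FIELD_NAME_PATTERN = re.compile(r"^[A-Za-z]\w*$")


def filter_field_names(candidates: Iterable[str]) -> List[str]:
    """Keep only candidates that look like real field names (hypothetical helper)."""
    return [name for name in candidates if FIELD_NAME_PATTERN.match(name)]


print(filter_field_names(["personId", "8", "first_name", "_hidden", ""]))
# → ['personId', 'first_name']
```

Note that the pattern also rejects names with a leading underscore, which may or may not matter for the Mermaid diagrams in the schema files.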
4. Removed Unused Imports
- `validate_schemas.py` line 15: Removed unused `os` import
- `validate_schemas.py` line 21: Removed unused `Set` from typing imports
5. Added Explanatory Comment
- `validate_schemas.py` line 96: Added comment explaining silent exception pass
  - "Skip invalid JSON examples in markdown"
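A commented silent skip of this kind might look like the sketch below. It assumes the schema markdown embeds examples in fenced `json` blocks; the names `JSON_BLOCK` and `extract_json_examples` are hypothetical, and the real extraction logic around line 96 is not shown here.

```python
import json
import re
from typing import Any, List

FENCE = "`" * 3  # three backticks, built indirectly so the sketch embeds cleanly
JSON_BLOCK = re.compile(FENCE + r"json\s*\n(.*?)" + FENCE, re.DOTALL)


def extract_json_examples(markdown: str) -> List[Any]:
    """Parse every fenced json block, ignoring blocks that are not valid JSON."""
    examples = []
    for match in JSON_BLOCK.finditer(markdown):
        try:
            examples.append(json.loads(match.group(1)))
        except json.JSONDecodeError:
            # Skip invalid JSON examples in markdown
            continue
    return examples
```

The comment makes it clear that dropping a malformed example is intentional, since documentation snippets are often abbreviated.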
6. Fixed String Concatenation
- `validate_schemas.py` lines 378-379: Fixed implicit string concatenation
  - Added explicit commas between strings in report generation
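For readers unfamiliar with this pitfall, the sketch below shows how a missing comma silently merges two adjacent string literals into one list element. The report strings are invented for illustration and are not taken from `validate_schemas.py`.

```python
# Missing comma: Python joins adjacent string literals into a single element.
buggy_lines = [
    "# Schema Validation Report"
    "Generated by validate_schemas.py"
]

# Explicit commas keep each string as its own element.
fixed_lines = [
    "# Schema Validation Report",
    "Generated by validate_schemas.py",
]

print(len(buggy_lines))  # → 1
print(len(fixed_lines))  # → 2
print("\n".join(fixed_lines))
```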
7. Workflow Optimization
- `validate-json-schemas.yml` lines 38-43: Removed unnecessary steps
  - Removed `cache: 'pip'` option (no dependencies to cache)
  - Removed "Install dependencies" step
  - Added comment: "No dependencies to install - script uses only Python standard library"
- `validate-json-schemas.yml` line 54: Added explanatory comment
  - Explains why `continue-on-error: true` is needed for artifact upload
✅ Validation Results (After Fixes)
- Schemas Validated: 5 of 5 (100%)
- Sample Files Analyzed: 132 of 142 (93%)
- Total Field Mismatches: 125 (reduced from 127)
- Invalid "8" field: ❌ Eliminated
Field Count Changes:
| Schema | Before | After | Change |
|---|---|---|---|
| Politician | 46 | 45 | -1 (removed "8") |
| Party | 43 | 42 | -1 (removed "8") |
| Committee | 26 | 26 | No change |
| Ministry | 29 | 29 | No change |
| Intelligence | 0 | 0 | No change |
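One plausible way to arrive at mismatch counts like those above is a set comparison between schema field names and CSV column names. The sketch below assumes exact, case-sensitive name matching; `find_mismatches` and the sample names are hypothetical, and the matching rules actually used by `validate_schemas.py` are not described in this PR.

```python
from typing import Dict, Iterable, List


def find_mismatches(schema_fields: Iterable[str],
                    csv_columns: Iterable[str]) -> Dict[str, List[str]]:
    """Report names present on one side only (exact matching is an assumption)."""
    schema = set(schema_fields)
    columns = set(csv_columns)
    return {
        "missing_in_data": sorted(schema - columns),
        "missing_in_schema": sorted(columns - schema),
    }


result = find_mismatches(["id", "firstName", "party"],
                         ["id", "party", "born_year"])
print(result)
# → {'missing_in_data': ['firstName'], 'missing_in_schema': ['born_year']}
```

Under this scheme, a spurious field such as "8" would count as one mismatch per schema, which is consistent with the total dropping by two once it was filtered out.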
🧪 Testing
Validated all changes:
# Syntax check
python3 -m py_compile json-export-specs/validate_schemas.py
✓ Python syntax valid
# Full validation run
cd json-export-specs && python3 validate_schemas.py
✓ 5 schemas validated
✓ 132 files analyzed
✓ Reports generated successfully
✓ No invalid field "8" in results
✓ Total mismatches: 125 (improved)
📝 Files Modified
- ✅ `.github/workflows/validate-json-schemas.yml` (8 lines changed)
- ✅ `json-export-specs/FIELD_MAPPING.md` (6 lines changed)
- ✅ `json-export-specs/VALIDATION_README.md` (2 lines changed)
- ✅ `json-export-specs/validate_schemas.py` (20 lines changed)
- ✅ `json-export-specs/schemas/SCHEMA_VALIDATION_REPORT.md` (regenerated)
- ✅ `json-export-specs/schemas/validation-results.json` (regenerated)
Status: ✅ ALL REVIEW COMMENTS ADDRESSED
Quality: ✅ CODE REVIEW FEEDBACK IMPLEMENTED
Testing: ✅ VALIDATED AND WORKING
Original prompt
This section details the original issue you should resolve
<issue_title>Validate JSON Schemas Against Real Sample Data</issue_title>
<issue_description>
## 🎯 Objective
Validate 5 JSON export schemas against 142 real sample data CSV files to ensure correctness
📋 Background
JSON export specifications define schemas for politician, party, committee, ministry, and intelligence data per json-export-specs/. These schemas need validation against actual sample data to ensure accurate data model representation.
📊 Current State (Measured Metrics)
- JSON Schemas: 5 defined (politician-schema.md, party-schema.md, committee-schema.md, ministry-schema.md, intelligence-schema.md)
- Sample Data Files: 142 CSV files in service.data.impl/sample-data/
- Schema Validation: Not performed - schemas based on documentation only
- Data Model Accuracy: Unknown - requires verification
✅ Acceptance Criteria
- [ ] Automated validation script comparing JSON schemas to sample data structure
- [ ] All 5 schemas validated against corresponding CSV files
- [ ] Mismatches between schemas and data documented
- [ ] Schemas updated to reflect actual data structure
- [ ] Continuous validation integrated into CI/CD pipeline
🤖 Recommended Agent
Agent: @hack23-performance-engineer
Rationale: Schema validation and data pipeline optimization expertise
📚 Related Documentation
- JSON Export Specs README
- Sample Data Directory
- politician-schema.md</issue_description>
Comments on the Issue (you are @copilot in this section)
Custom agent used: intelligence-operative Expert in political science, intelligence analysis, OSINT, behavioral analysis, and Swedish politics with focus on exposing high risk national entities
- Fixes Hack23/cia#7982
📊 JSON Schema Validation Results
Validation Summary:
- ⏰ Generated: 2025-12-08T11:15:06.392882+00:00
- 📋 Schemas Validated: 5
- 📁 Sample Files Analyzed: 132
- ⚠️ Total Mismatches: 127
Schema Status
| Schema | Fields | Views Matched | Mismatches | Status |
|---|---|---|---|---|
| Politician | 46 | 7 | 35 | ⚠️ REVIEW |
| Party | 43 | 11 | 40 | ⚠️ REVIEW |
| Committee | 26 | 10 | 24 | ⚠️ REVIEW |
| Ministry | 29 | 2 | 28 | ⚠️ REVIEW |
| Intelligence | 0 | 4 | 0 | ✅ PASS |
📄 Full Report: See uploaded artifacts for detailed validation report
⚠️ Action Required: Please review the field mismatches and update schemas to match actual data structure.
@copilot apply changes based on the comments in this thread
Quality Gate passed
Issues
0 New issues
0 Accepted issues
Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code
📊 JSON Schema Validation Results
Validation Summary:
- ⏰ Generated: 2025-12-08T13:58:17.474324+00:00
- 📋 Schemas Validated: 5
- 📁 Sample Files Analyzed: 132
- ⚠️ Total Mismatches: 125
Schema Status
| Schema | Fields | Views Matched | Mismatches | Status |
|---|---|---|---|---|
| Politician | 45 | 7 | 34 | ⚠️ REVIEW |
| Party | 42 | 11 | 39 | ⚠️ REVIEW |
| Committee | 26 | 10 | 24 | ⚠️ REVIEW |
| Ministry | 29 | 2 | 28 | ⚠️ REVIEW |
| Intelligence | 0 | 4 | 0 | ✅ PASS |
📄 Full Report: See uploaded artifacts for detailed validation report
⚠️ Action Required: Please review the field mismatches and update schemas to match actual data structure.
📊 JSON Schema Validation Results
Validation Summary:
- ⏰ Generated: 2025-12-08T13:58:24.336560+00:00
- 📋 Schemas Validated: 5
- 📁 Sample Files Analyzed: 132
- ⚠️ Total Mismatches: 125
Schema Status
| Schema | Fields | Views Matched | Mismatches | Status |
|---|---|---|---|---|
| Politician | 45 | 7 | 34 | ⚠️ REVIEW |
| Party | 42 | 11 | 39 | ⚠️ REVIEW |
| Committee | 26 | 10 | 24 | ⚠️ REVIEW |
| Ministry | 29 | 2 | 28 | ⚠️ REVIEW |
| Intelligence | 0 | 4 | 0 | ✅ PASS |
📄 Full Report: See uploaded artifacts for detailed validation report
⚠️ Action Required: Please review the field mismatches and update schemas to match actual data structure.
📊 JSON Schema Validation Results
Validation Summary:
- ⏰ Generated: 2025-12-10T00:24:40.489887+00:00
- 📋 Schemas Validated: 5
- 📁 Sample Files Analyzed: 132
- ⚠️ Total Mismatches: 125
Schema Status
| Schema | Fields | Views Matched | Mismatches | Status |
|---|---|---|---|---|
| Politician | 45 | 7 | 34 | ⚠️ REVIEW |
| Party | 42 | 11 | 39 | ⚠️ REVIEW |
| Committee | 26 | 10 | 24 | ⚠️ REVIEW |
| Ministry | 29 | 2 | 28 | ⚠️ REVIEW |
| Intelligence | 0 | 4 | 0 | ✅ PASS |
📄 Full Report: See uploaded artifacts for detailed validation report
⚠️ Action Required: Please review the field mismatches and update schemas to match actual data structure.
Dependency Review
✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.
Snapshot Warnings
⚠️: No snapshots were found for the head SHA 133cfe976b0c6d08793bae7f4f41dc4d459c76f0. Ensure that dependencies are being submitted on PR branches and consider enabling retry-on-snapshot-warnings. See the documentation for more information and troubleshooting advice.
OpenSSF Scorecard
| Package | Version | Score |
|---|---|---|
| actions/actions/checkout | 4.*.* | :green_circle: 6.5 |
| actions/actions/github-script | 7.*.* | :green_circle: 6.3 |
| actions/actions/setup-python | 5.*.* | :green_circle: 5.2 |
| actions/actions/upload-artifact | 4.*.* | :green_circle: 6.5 |
Scanned Files
- .github/workflows/validate-json-schemas.yml
📊 JSON Schema Validation Results
Validation Summary:
- ⏰ Generated: 2025-12-13T15:09:07.128511+00:00
- 📋 Schemas Validated: 5
- 📁 Sample Files Analyzed: 132
- ⚠️ Total Mismatches: 125
Schema Status
| Schema | Fields | Views Matched | Mismatches | Status |
|---|---|---|---|---|
| Politician | 45 | 7 | 34 | ⚠️ REVIEW |
| Party | 42 | 11 | 39 | ⚠️ REVIEW |
| Committee | 26 | 10 | 24 | ⚠️ REVIEW |
| Ministry | 29 | 2 | 28 | ⚠️ REVIEW |
| Intelligence | 0 | 4 | 0 | ✅ PASS |
📄 Full Report: See uploaded artifacts for detailed validation report
⚠️ Action Required: Please review the field mismatches and update schemas to match actual data structure.
📊 JSON Schema Validation Results
Validation Summary:
- ⏰ Generated: 2025-12-19T01:29:42.077816+00:00
- 📋 Schemas Validated: 5
- 📁 Sample Files Analyzed: 132
- ⚠️ Total Mismatches: 125
Schema Status
| Schema | Fields | Views Matched | Mismatches | Status |
|---|---|---|---|---|
| Politician | 45 | 7 | 34 | ⚠️ REVIEW |
| Party | 42 | 11 | 39 | ⚠️ REVIEW |
| Committee | 26 | 10 | 24 | ⚠️ REVIEW |
| Ministry | 29 | 2 | 28 | ⚠️ REVIEW |
| Intelligence | 0 | 4 | 0 | ✅ PASS |
📄 Full Report: See uploaded artifacts for detailed validation report
⚠️ Action Required: Please review the field mismatches and update schemas to match actual data structure.
📊 JSON Schema Validation Results
Validation Summary:
- ⏰ Generated: 2025-12-19T22:49:01.274352+00:00
- 📋 Schemas Validated: 5
- 📁 Sample Files Analyzed: 132
- ⚠️ Total Mismatches: 125
Schema Status
| Schema | Fields | Views Matched | Mismatches | Status |
|---|---|---|---|---|
| Politician | 45 | 7 | 34 | ⚠️ REVIEW |
| Party | 42 | 11 | 39 | ⚠️ REVIEW |
| Committee | 26 | 10 | 24 | ⚠️ REVIEW |
| Ministry | 29 | 2 | 28 | ⚠️ REVIEW |
| Intelligence | 0 | 4 | 0 | ✅ PASS |
📄 Full Report: See uploaded artifacts for detailed validation report
⚠️ Action Required: Please review the field mismatches and update schemas to match actual data structure.