Pull Request

Summary

Fixes UUID validation errors in the task management endpoints that caused silent failures during knowledge ingestion. Invalid task IDs such as "12", "322", and "61" were reaching PostgreSQL and triggering "invalid input syntax for type uuid" errors. This PR validates task IDs at the API and service boundaries and returns a clear HTTP 400 error instead of failing silently.

Part 1 of 2: this PR addresses the UUID validation issue. Part 2 (a separate PR) will handle timeout error propagation.

Changes Made

Created a reusable validation utility (python/src/server/utils/validation.py) with is_valid_uuid() and validate_uuid_or_raise(), built on Python's uuid.UUID()
Added service layer validation in task_service.py to validate task IDs before database operations
Added API boundary validation to 4 task endpoints (GET, PUT, and DELETE on /api/tasks/{task_id}, plus the MCP status endpoint) to reject invalid UUIDs with HTTP 400
Added comprehensive test suite with 56 tests covering unit, integration, and service layer scenarios
Improved error messages - clear, user-friendly messages instead of PostgreSQL errors
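A minimal sketch of what the two helpers in validation.py could look like, assuming validate_uuid_or_raise() raises ValueError and reuses the message shown under Manual Testing. The PR's real signatures and return values may differ; returning the canonical string is a design choice of this sketch, not something the PR states.

```python
import uuid
from typing import Any


def is_valid_uuid(value: Any) -> bool:
    """Return True if ``value`` is a string that uuid.UUID() can parse."""
    if not isinstance(value, str):
        # None, ints and other non-strings are rejected rather than coerced.
        return False
    try:
        uuid.UUID(value)
    except ValueError:
        # Raised for "12", "322", "61", "", wrong lengths and non-hex characters.
        return False
    return True


def validate_uuid_or_raise(value: Any, field_name: str = "task ID") -> str:
    """Return the canonical UUID string, or raise ValueError with a readable message."""
    if not is_valid_uuid(value):
        label = field_name[:1].upper() + field_name[1:]
        raise ValueError(
            f"Invalid {field_name} format: {value!r}. {label} must be a valid UUID."
        )
    # uuid.UUID() also accepts braces, missing hyphens and a "urn:uuid:" prefix;
    # returning str(UUID) gives callers the lower-case, hyphenated form.
    return str(uuid.UUID(value))
```

Normalizing to the canonical form means later layers never see the looser spellings that uuid.UUID() tolerates.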

Type of Change

[x] Bug fix (non-breaking change which fixes an issue)
[ ] New feature (non-breaking change which adds functionality)
[ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
[ ] Documentation update
[x] Performance improvement (removed expensive traceback logging)
[x] Code refactoring

Affected Services

[ ] Frontend (React UI)
[x] Server (FastAPI backend)
[ ] MCP Server (Model Context Protocol)
[ ] Agents (PydanticAI service)
[ ] Database (migrations/schema)
[ ] Docker/Infrastructure
[ ] Documentation site

Testing

[x] All existing tests pass
[x] Added new tests for new functionality (56 new tests)
[x] Manually tested affected user flows
[x] Docker builds succeed for all services

Test Evidence

All 56 UUID validation tests passing:

# Run all UUID validation tests
docker compose run --rm archon-server pytest \
  tests/server/utils/test_validation.py \
  tests/server/api_routes/test_task_uuid_validation.py \
  tests/server/services/test_task_service_uuid_validation.py \
  -v

# Result: 56 passed in ~22 seconds

Test Coverage:

21 unit tests - Validation utility (valid UUIDs, invalid integers "12"/"322"/"61", edge cases)
17 integration tests - API endpoints (all 4 endpoints, error handling, consistency)
18 service layer tests - Service validation (database protection, performance, no regression)
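The unit-test layer can be pictured as table-driven checks like these. The case lists are illustrative rather than the PR's actual 21 tests, and is_valid_uuid is re-declared inline (an assumption about its behavior) so the file runs on its own; pytest collects the test_ functions without needing an import.

```python
import uuid


def is_valid_uuid(value):
    # Inline stand-in for the PR's utility so this file is self-contained.
    try:
        uuid.UUID(value)
        return True
    except (TypeError, ValueError, AttributeError):
        # TypeError for None, AttributeError for ints, ValueError for bad strings.
        return False


VALID = [
    "550e8400-e29b-41d4-a716-446655440000",  # version 4 (random)
    "6ba7b810-9dad-11d1-80b4-00c04fd430c8",  # version 1 (time-based)
    str(uuid.uuid4()),                       # freshly generated
]

INVALID = [
    "12", "322", "61",                       # IDs from the ingestion logs
    "", "   ", "not-a-uuid", None,
    "550e8400-e29b-41d4-a716-44665544000",   # one hex digit short
]


def test_accepts_valid_uuids():
    for value in VALID:
        assert is_valid_uuid(value), value


def test_rejects_logged_ids_and_edge_cases():
    for value in INVALID:
        assert not is_valid_uuid(value), value
```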

Manual Testing:

# Invalid UUID returns HTTP 400 with clear error
curl -X GET http://localhost:8181/api/tasks/12
# Response: {"error": "Invalid task ID format: '12'. Task ID must be a valid UUID.", "task_id": "12"}

# Valid UUID works normally
curl -X GET http://localhost:8181/api/tasks/550e8400-e29b-41d4-a716-446655440000

Checklist

[x] My code follows the service architecture patterns
[x] If using an AI coding assistant, I used the CLAUDE.md rules
[x] I have added tests that prove my fix/feature works
[x] All new and existing tests pass locally
[x] My changes generate no new warnings
[x] I have updated relevant documentation (inline docstrings and comments)
[x] I have verified no regressions in existing features

Breaking Changes

None - This PR only adds validation. All valid UUID operations work exactly as before.

Additional Notes

🐛 Original Bug

When attempting to ingest documentation (e.g., QNAP docs URL), ingestion would fail silently:

ERROR | Error updating task: {'message': 'invalid input syntax for type uuid: "12"', 'code': '22P02'...}
ERROR | Error updating task: {'message': 'invalid input syntax for type uuid: "322"', 'code': '22P02'...}
ERROR | Error updating task: {'message': 'invalid input syntax for type uuid: "61"', 'code': '22P02'...}

Errors repeated continuously, ingestion failed, and the operation disappeared from the UI without showing users any error message.

🔍 Root Cause

No UUID validation at API boundaries: endpoints accepted any string value
No validation in the service layer before database operations
PostgreSQL validation happened too late for proper error handling
Exceptions were caught but not propagated to the progress tracker or UI

✅ Solution Details

Before (❌):

Invalid UUID "12" passed to API
Service layer doesn't validate
Reaches PostgreSQL
Database rejects: "invalid input syntax for type uuid: '12'"
Error logged but not surfaced to UI
Operation disappears silently

After (✅):

Invalid UUID "12" passed to API
API validates and rejects immediately
HTTP 400 returned with clear message
Service/database never called
User sees error (when combined with Part 2)
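The "After" flow can be sketched without a web framework: the handler checks the ID first and only touches the service when the ID is well-formed. get_task_endpoint and FakeTaskService are hypothetical stand-ins, not the FastAPI route in projects_api.py; the error body copies the one shown under Manual Testing.

```python
import uuid
from typing import Any


class FakeTaskService:
    """Hypothetical stand-in that records whether it was ever called."""

    def __init__(self) -> None:
        self.calls: list[str] = []

    def get_task(self, task_id: str) -> dict[str, Any]:
        self.calls.append(task_id)
        return {"id": task_id, "title": "example"}


def get_task_endpoint(task_id: str, service: FakeTaskService) -> tuple[int, dict[str, Any]]:
    """Return (status_code, body), rejecting malformed IDs before the service runs."""
    try:
        uuid.UUID(task_id)
    except ValueError:
        # Fail fast at the API boundary: no service call, no database round-trip.
        return 400, {
            "error": f"Invalid task ID format: {task_id!r}. Task ID must be a valid UUID.",
            "task_id": task_id,
        }
    return 200, service.get_task(task_id)
```

Calling get_task_endpoint("12", service) returns status 400 and leaves service.calls empty, which is the "service/database never called" step above.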

📝 Files Changed

New Files (4):

python/src/server/utils/validation.py - Reusable UUID validation utility
python/tests/server/utils/test_validation.py - 21 unit tests
python/tests/server/api_routes/test_task_uuid_validation.py - 17 integration tests
python/tests/server/services/test_task_service_uuid_validation.py - 18 service tests

Modified Files (2):

python/src/server/services/projects/task_service.py - Added UUID validation in update_task()
python/src/server/api_routes/projects_api.py - Added UUID validation to 4 task endpoints
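Inside the service, the same check can run before any query is built. This sketch assumes the service returns a (success, result) tuple and receives the database call as a function; TaskServiceSketch and db_update are hypothetical names, not the code in task_service.py. The log line includes the task_id and the updated field names, in the spirit of the enhanced logging described for update_task().

```python
import logging
import uuid
from typing import Any, Callable

logger = logging.getLogger(__name__)

DbUpdate = Callable[[str, dict[str, Any]], dict[str, Any]]


class TaskServiceSketch:
    """Hypothetical stand-in for TaskService; only update_task() is sketched."""

    def __init__(self, db_update: DbUpdate) -> None:
        # db_update represents the real database call.
        self._db_update = db_update

    def update_task(
        self, task_id: Any, update_fields: dict[str, Any]
    ) -> tuple[bool, dict[str, Any]]:
        # Second line of defense: internal callers such as background
        # ingestion jobs never pass through the API-boundary check.
        try:
            uuid.UUID(str(task_id))
        except ValueError:
            logger.warning(
                "update_task rejected invalid task_id=%r (fields=%s)",
                task_id,
                sorted(update_fields),
            )
            return False, {
                "error": f"Invalid task ID format: {str(task_id)!r}. Task ID must be a valid UUID."
            }
        return True, {"task": self._db_update(str(task_id), update_fields)}
```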

Total: 978 insertions, 6 deletions

🎯 Endpoints Updated

The following 4 endpoints now validate task_id and return HTTP 400 for invalid UUIDs:

GET /api/tasks/{task_id} - Get task by ID
PUT /api/tasks/{task_id} - Update task
DELETE /api/tasks/{task_id} - Archive task
PUT /api/mcp/tasks/{task_id}/status - Update task status via MCP

📊 Quality Metrics

✅ No Linter Errors - All files pass Ruff checks
✅ 100% Test Pass Rate - 56/56 tests passing
✅ Performance - Validation takes < 1 ms per call; the service-layer tests assert that 100 validations complete in < 1 second
✅ No Regression - All valid UUID operations work correctly
✅ Reusable - Validation utility can be used throughout codebase
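To reproduce the performance figure locally, a timeit measurement along these lines is enough. is_valid_uuid is re-declared inline as an assumption about the utility, and absolute numbers depend on the machine.

```python
import timeit
import uuid


def is_valid_uuid(value: str) -> bool:
    # Inline stand-in for the PR's utility (assumed to wrap uuid.UUID()).
    try:
        uuid.UUID(value)
        return True
    except ValueError:
        return False


def per_call_seconds(value: str, number: int = 100_000) -> float:
    """Average wall-clock time of one is_valid_uuid(value) call."""
    total = timeit.timeit(lambda: is_valid_uuid(value), number=number)
    return total / number


if __name__ == "__main__":
    for sample in ("550e8400-e29b-41d4-a716-446655440000", "12"):
        print(f"{sample!r}: {per_call_seconds(sample) * 1e6:.2f} µs per call")
```

A check that 100 calls finish within a second only bounds the average at 10 ms, so a per-call measurement like this is the more direct evidence for the < 1 ms figure.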

🔗 Related Work

This is Part 1 of 2 for fixing knowledge ingestion failures:

Part 1 (this PR): UUID validation errors at API/service boundaries
Part 2 (separate PR): Timeout error propagation in crawling_service.py

Both PRs reference the same GitHub issue but address independent problems with separate solutions.

🚀 Impact

User Experience:

Clear error messages instead of silent failures
Faster feedback (errors caught at API boundary)
Better debugging information

Code Quality:

Validation at appropriate layers (API → Service → Database)
Reusable, well-tested validation utility
Comprehensive test coverage for edge cases
Cleaner code (removed expensive traceback logging)

Reliability:

Prevents PostgreSQL UUID validation errors
Fails fast with clear errors
No silent failures

📖 Dependencies

No new dependencies - Uses Python's built-in uuid module
No database migrations - Only code changes
No configuration changes - Works with existing setup

Summary by CodeRabbit

Release Notes

New Features
- Enhanced sitemap parsing to automatically convert relative URLs to absolute URLs for improved crawl coverage.
- Improved API error responses with clearer, more specific error messages and appropriate HTTP status codes.
Bug Fixes
- Strengthened error handling to better differentiate between invalid inputs and not-found scenarios.
Tests
- Added comprehensive test coverage for input validation and sitemap URL handling.

Oct 28 '25 22:10 thiagomaf

Walkthrough

This PR adds comprehensive UUID validation across API and service layers with a new validation utility module, enhances error handling and logging in multiple endpoints, and implements URL normalization for relative URLs in sitemap parsing. Changes span validation logic, error mapping, and extensive test coverage across integration and unit tests.

Changes

Cohort / File(s)	Summary
UUID Validation Utility `python/src/server/utils/validation.py`	New module with `is_valid_uuid()` and `validate_uuid_or_raise()` functions for UUID format validation with exception handling.
API Boundary UUID Validation `python/src/server/api_routes/projects_api.py`	Added UUID input validation at GET/PUT/DELETE task endpoints and MCP status update endpoint. Distinguishes between not-found (404), invalid UUID (400), and other errors; preserves error messages and adds stack trace logging.
Service-Layer UUID Validation `python/src/server/services/projects/task_service.py`	Added UUID format validation in `update_task()` with enhanced error logging including task_id, error details, and updated fields.
Sitemap URL Normalization `python/src/server/services/crawling/strategies/sitemap.py`	Implements URL normalization in `parse_sitemap()`: composes relative URLs into absolute via `urljoin`, validates scheme/netloc, logs mappings, and includes per-URL error isolation. Updated docstrings to reflect absolute URL output.
API UUID Validation Tests `python/tests/server/api_routes/test_task_uuid_validation.py`	Comprehensive integration test suite covering GET, PUT, DELETE, and MCP endpoints with valid/invalid UUID scenarios, error message validation, and cross-endpoint consistency checks.
Service UUID Validation Tests `python/tests/server/services/test_task_service_uuid_validation.py`	Unit tests for TaskService UUID validation, verifying rejection of invalid formats, prevention of database calls for invalid inputs, and correct behavior with valid UUIDs.
Validation Utility Tests `python/tests/server/utils/test_validation.py`	Unit tests for `is_valid_uuid()` and `validate_uuid_or_raise()` covering valid/invalid UUIDs, edge cases (None, whitespace, different UUID versions), and error message structure.
Sitemap URL Tests `python/tests/test_sitemap_relative_urls.py`	Comprehensive test suite for sitemap URL normalization including absolute/relative URL handling, subdirectory resolution, whitespace trimming, HTTP errors, network errors, and real-world example validation.

Sequence Diagram

sequenceDiagram
    participant Client
    participant API as API Endpoint
    participant Validation as UUID Validator
    participant Service as TaskService
    participant DB as Database

    Client->>API: GET /tasks/{task_id}
    API->>Validation: is_valid_uuid(task_id)
    alt Invalid UUID
        Validation-->>API: False
        API-->>Client: HTTP 400 (Invalid UUID)
    else Valid UUID
        Validation-->>API: True
        API->>Service: get_task(task_id)
        Service->>Validation: is_valid_uuid(task_id)
        Validation-->>Service: True
        Service->>DB: Query task
        alt Task found
            DB-->>Service: Task data
            Service-->>API: Task result
            API-->>Client: HTTP 200 (Task)
        else Task not found
            DB-->>Service: Not found
            Service-->>API: Error (not found)
            API-->>Client: HTTP 404
        end
    end
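The diagram's outcomes (400, 200, 404), plus the catch-all for other errors mentioned in the Changes table, can be written as a small mapping. This sketch assumes the service reports failures as (False, {"error": message}) and that not-found messages contain "not found"; matching on message text is an assumption, and a dedicated exception type or error code would be sturdier.

```python
from typing import Any


def status_for_service_error(error_message: str) -> int:
    """Map a service-layer error message to an HTTP status code (text matching is assumed)."""
    message = error_message.lower()
    if "invalid" in message and "uuid" in message:
        return 400  # malformed ID: the client must fix the request
    if "not found" in message:
        return 404  # well-formed ID, but no such task
    return 500  # anything else is unexpected on the server side


def to_http(success: bool, result: dict[str, Any]) -> tuple[int, dict[str, Any]]:
    """Turn a (success, result) pair from a hypothetical service into (status, body)."""
    if success:
        return 200, result
    return status_for_service_error(result.get("error", "")), result
```

Checking the invalid-UUID case before the not-found case keeps a malformed ID from ever being reported as a 404.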

sequenceDiagram
    participant Crawler
    participant SitemapFetcher as Sitemap Fetcher
    participant Parser as URL Parser
    participant Normalizer as URL Normalizer
    participant Validator as URL Validator

    Crawler->>SitemapFetcher: fetch_sitemap(sitemap_url)
    SitemapFetcher->>Parser: parse XML
    Parser->>Parser: extract `<loc>` elements
    loop For each URL
        Parser->>Normalizer: normalize(url_text)
        Normalizer->>Normalizer: trim whitespace
        alt Is absolute URL
            Normalizer->>Validator: validate(absolute_url)
        else Is relative URL
            Normalizer->>Normalizer: urljoin(base, relative)
            Normalizer->>Validator: validate(composed_url)
        end
        alt Valid (http/https + netloc)
            Validator-->>Parser: add to results
        else Invalid
            Validator-->>Parser: skip + log warning
        end
    end
    Parser-->>SitemapFetcher: list of absolute URLs
    SitemapFetcher-->>Crawler: normalized URLs
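A standard-library sketch of the normalization loop in the diagram: extract the loc text, trim it, resolve it with urljoin, and keep only http/https URLs that have a host. The function name normalize_sitemap_locs and its signature are hypothetical; the PR's parse_sitemap() also fetches the XML and logs each mapping.

```python
import logging
import xml.etree.ElementTree as ET
from urllib.parse import urljoin, urlparse

logger = logging.getLogger(__name__)


def normalize_sitemap_locs(sitemap_url: str, xml_text: str) -> list[str]:
    """Return absolute http(s) URLs for every <loc> in a sitemap document."""
    root = ET.fromstring(xml_text)
    urls: list[str] = []
    # "{*}loc" matches <loc> in any (or no) namespace, including the standard
    # http://www.sitemaps.org/schemas/sitemap/0.9 one (Python 3.8+).
    for loc in root.iterfind(".//{*}loc"):
        raw = (loc.text or "").strip()
        if not raw:
            continue
        try:
            absolute = urljoin(sitemap_url, raw)  # absolute URLs pass through unchanged
            parsed = urlparse(absolute)
        except ValueError as exc:
            # Per-URL isolation: one malformed entry must not abort the whole sitemap.
            logger.warning("Skipping malformed sitemap entry %r: %s", raw, exc)
            continue
        if parsed.scheme in ("http", "https") and parsed.netloc:
            urls.append(absolute)
        else:
            logger.warning("Skipping non-HTTP sitemap entry %r", raw)
    return urls
```

Using the sitemap's own URL as the base is an assumption of this sketch; with the site root as the base instead, an entry like "faq.html" would resolve to a different path.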

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20–25 minutes

UUID validation consistency: Verify alignment between API boundary checks and service-layer validation to prevent duplicate or conflicting error handling.
Error message mapping logic: Ensure proper differentiation between not-found (404) and invalid UUID (400) across multiple endpoints.
Sitemap URL normalization: Review urljoin() behavior with edge cases (subdirectories, parent-relative paths) and validation logic for scheme/netloc.
Test coverage: Confirm test fixtures, mocks, and assertions properly isolate functionality and cover both happy paths and error scenarios.
Logging additions: Verify exc_info=True usage does not introduce performance overhead and stack traces are only captured when needed.
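The urljoin() edge cases called out above can be pinned down with a few calls. The URLs are made up, and the expected values are standard-library behavior rather than output from the PR's tests; two things worth checking in review are that root-relative paths drop the base directory and that a base without a trailing slash treats its last segment as a file.

```python
from urllib.parse import urljoin

SITEMAP = "https://docs.example.com/en/sitemap.xml"

# (base URL, <loc> value, what urljoin returns)
EDGE_CASES = [
    (SITEMAP, "guide/intro", "https://docs.example.com/en/guide/intro"),   # resolved inside /en/
    (SITEMAP, "/guide/intro", "https://docs.example.com/guide/intro"),     # root-relative: /en/ dropped
    (SITEMAP, "../faq", "https://docs.example.com/faq"),                   # parent-relative
    (SITEMAP, "//cdn.example.com/a", "https://cdn.example.com/a"),         # scheme-relative: host changes
    ("https://docs.example.com/en", "intro", "https://docs.example.com/intro"),      # no trailing slash
    ("https://docs.example.com/en/", "intro", "https://docs.example.com/en/intro"),  # trailing slash
]

if __name__ == "__main__":
    for base, loc, expected in EDGE_CASES:
        actual = urljoin(base, loc)
        status = "ok" if actual == expected else "MISMATCH"
        print(f"{status}: urljoin({base!r}, {loc!r}) -> {actual!r}")
```

The scheme-relative case is the one most likely to matter for crawl scope, since it silently moves the URL to another host.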

Suggested reviewers

coleam00
leex279
tazmon95

Poem

🐰 With UUIDs we validate, At boundaries, we don't hesitate, Relative URLs now take flight, Normalized to absolute right, Tests ensure all corners are tight! 🎯

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)

Check name	Status	Explanation
Title Check	✅ Passed	The title "Fix/UUID validation task endpoints" clearly and specifically describes the main change in this pull request. It references both the core solution (UUID validation) and the target scope (task endpoints), which directly aligns with the changeset's primary purpose of adding UUID validation at API and service boundaries. The title is concise, avoids vague terminology, and provides meaningful information that would help a developer scanning PR history understand the change at a glance.
Description Check	✅ Passed	The pull request description comprehensively follows the repository template, including all major sections with thorough content: a detailed summary explaining the bug and solution, well-organized changes made with clear bullet points, proper type of change selections, affected services marked, comprehensive testing information with specific test commands and evidence, completed checklists with all items addressed, breaking changes section (explicitly stating "None"), and extensive additional notes covering root cause analysis, solution details, file changes, endpoints updated, quality metrics, and dependencies. The description exceeds the template requirements by providing exceptional context and detail that enables thorough review.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

Oct 28 '25 22:10 coderabbitai[bot]

This seems like a really solid contribution, thank you! Adding it to my list of PRs to review.

Nov 24 '25 08:11 Wirasm

Fix/UUID validation task endpoints
