unstract icon indicating copy to clipboard operation
unstract copied to clipboard

USEC-3 [FIX] Add URL security validation for adapter configurations

Open gaya3-zipstack opened this issue 3 months ago • 3 comments

What

Add safety checks for URLs while configuring

  • adapters and invoking test connections
  • notifications for API/ETL
  • postprocessor webhook URLs in Prompt Studio
  • URLs when using the variable replacement

Why

  • To prevent SSRF attacks

How

  • AdapterProcessor: Added validate_adapter_urls() method for URL security validation
  • AdapterInstance Views: Integrated URL validation during adapter creation
  • **

Can this PR break any existing features. If yes, please list possible items. If no, please explain why. (PS: Admins do not merge the PR without this section filled)

  • Maybe. Explicit checks for backward compatibility of existing adapters is a must.

Database Migrations

  • None required

Env Config

  • Optional: WHITELISTED_ENDPOINTS

Relevant Docs

Related Issues or PRs

  • https://github.com/Zipstack/unstract-sdk/pull/200
  • https://zipstack.atlassian.net/browse/USEC-3

Dependencies Versions

SDK version roll is required

Notes on Testing

  • Tested test connection by whitelisting endpoints within docker network (172.16.0.0/12)
  • Tested test connection without whitelisting docker network
  • Tested adapter creation using POST API
  • Tested adapter edit using PUT API
  • Tested configuring notifications for API
  • Tested configuring posprocessor URLs in Prompt Studio
  • Tested using URLs in variable replacement in Prompt Studio

Screenshots

Test connection working with Whiteliest image

Test connection failing image

Create adapter failing image

Update adapter failing image

API deployment/tool working image

Postprocessing URL validation working. Check the highlighted areas image

API webhook notification verification for whtelisted URLs image

Prompt Studio variable replacement fails when an unsafe URL is given image

Checklist

I have read and understood the Contribution Guidelines.

gaya3-zipstack avatar Sep 10 '25 10:09 gaya3-zipstack

Summary by CodeRabbit

  • New Features

    • Adapter URLs are validated before saving or testing.
    • Webhook and post-processing URLs are validated for safety.
    • Automatic default adapter assignment on creation.
    • Platform-provided key flow updates adapter metadata.
    • URL checks added for dynamic variable fetching.
  • Bug Fixes

    • Clearer, user-facing errors for invalid adapter URLs.
    • Safer variable replacement with proper error propagation and early exits on invalid/missing URLs.
  • Refactor

    • Centralized metadata decryption/validation and improved logging.
  • Chores

    • Sample envs include a WHITELISTED_ENDPOINTS setting for local adapters.

Walkthrough

Adds URL-only adapter validation and integrates it into adapter test/create/update flows; ViewSet centralizes decryption, metadata validation, platform-key handling, and default-assignment; URLValidator added to webhook and prompt-service dynamic/postprocessing flows; WHITELISTED_ENDPOINTS env var added.

Changes

Cohort / File(s) Summary
Adapter URL validation utility
backend/adapter_processor_v2/adapter_processor.py
Adds static validate_adapter_urls(adapter_id: str, adapter_metadata: dict) -> Adapter to obtain the adapter class via Adapterkit, instantiate it (triggering URL/config validation without a full connection), defensively copy metadata, log and re-raise errors; includes docstring and minor import/style tweaks.
View-level decryption / validation / platform-key flows
backend/adapter_processor_v2/views.py
Adds imports and private helpers (_decrypt_and_validate_metadata, _validate_adapter_urls, _check_platform_key_usage, _update_metadata_for_platform_key, _set_default_adapter_if_needed, _validate_update_metadata); decrypts adapter_metadata_b, enforces dict type, validates adapter URLs via AdapterProcessor before persisting, supports PLATFORM_PROVIDED_UNSTRACT_KEY (including paid-subscription branch), sets new adapters as user defaults, and surfaces failures as DRF ValidationError.
Notification webhook URL validation
backend/notification_v2/provider/webhook/webhook.py
Calls URLValidator.validate_url during Webhook.validate, logs validation, and raises ValueError with the validator message on invalid URLs (validation now occurs before payload presence check).
Prompt-service postprocessing URL validation
prompt-service/src/unstract/prompt_service/services/answer_prompt.py
Replaces low-level SSRF checks with URLValidator.validate_url for postprocessing webhook URLs; skips postprocessing on invalid URLs and updates metadata with processed highlights when available.
Prompt-service variable replacement defensiveness
prompt-service/src/unstract/prompt_service/controllers/answer_prompt.py
Wraps variable detection/replacement in try/except for BadRequest, logs and publishes an ERROR-level log on failure, and re-raises to halt processing.
Dynamic variable URL validation
prompt-service/src/unstract/prompt_service/helpers/variable_replacement.py and prompt-service/src/unstract/prompt_service/services/variable_replacement.py
Adds URL validation for dynamic variable fetching (URLValidator.validate_url), returns early or raises BadRequest for missing/invalid URLs, and ensures BadRequest propagates from replace_variables_in_prompt.
Environment configuration - backend
backend/sample.env
Adds WHITELISTED_ENDPOINTS (example 10.68.0.10) with comments for whitelisting local/managed adapter endpoints.
Environment configuration - prompt-service
prompt-service/sample.env
Adds WHITELISTED_ENDPOINTS (example 10.68.0.10) with comments for whitelisting local/managed adapter endpoints.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor Client
  participant View as AdapterInstanceViewSet
  participant Crypto as Fernet
  participant Proc as AdapterProcessor
  participant Kit as Adapterkit
  participant Adapter as AdapterClass
  participant DRF as DRF

  Client->>View: POST/PUT /adapters (adapter_id, adapter_metadata_b)
  View->>Crypto: decrypt(adapter_metadata_b)
  Crypto-->>View: decrypted_metadata (dict)
  View->>Proc: validate_adapter_urls(adapter_id, decrypted_metadata)
  Proc->>Kit: get_adapter_cls(adapter_id)
  Kit-->>Proc: AdapterClass
  Proc->>Adapter: instantiate(decrypted_metadata)
  Note right of Adapter: construction triggers URL/config validation (no full connection)
  Adapter-->>Proc: success / raises error
  alt validation error
    Proc--x View: exception
    View->>DRF: raise ValidationError("Error testing '<adapter_name>'. <error>.")
    View--x Client: 400 ValidationError
  else valid
    View->>View: continue create/update (platform-key handling, default assignment)
    View-->>Client: 200/201 OK
  end
sequenceDiagram
  autonumber
  participant Service as PromptService
  participant Validator as URLValidator
  participant Remote as WebhookEndpoint

  Service->>Validator: validate_url(webhook_url)
  alt invalid
    Validator-->>Service: (is_valid=false, error)
    Service->>Service: log and skip postprocessing
  else valid
    Validator-->>Service: (is_valid=true)
    Service->>Remote: POST processed payload
    Remote-->>Service: success/failure
    Service->>Service: update metadata with processed highlights (if any)
  end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Title Check ✅ Passed The title is concise and clearly summarizes the primary change—adding URL security validation for adapter configurations—while including the issue key and [FIX] label; it directly matches the main intent and changes described in the PR. It is specific and informative enough for a reviewer scanning history to understand the primary change. Therefore the title meets the repository guidance.
Description Check ✅ Passed The PR description follows the repository template and includes What, Why, How, the "Can this PR break..." section, Database Migrations, Env Config, Related Issues/PRs, Dependencies, Notes on Testing, Screenshots, and the Checklist, with concrete testing evidence and links; this makes the description largely complete and useful for review. The How section appears truncated and should be completed, and Relevant Docs is empty or missing links, which are omissions but non-critical given the provided testing notes and related PR references. Overall the description is sufficiently complete to pass the template check.
Docstring Coverage ✅ Passed No functions found in the changes. Docstring coverage check skipped.
✨ Finishing touches
  • [ ] 📝 Generate Docstrings
🧪 Generate unit tests
  • [ ] Create PR with unit tests
  • [ ] Post copyable unit tests in a comment
  • [ ] Commit unit tests in branch fix/url_safety_net

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to Reviews > Disable Cache setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between cd129cf7f7ef343261de4bea183fc95a07c398a8 and 2f5213f5e5e15e24449c24d3d2db3cb9aef8637b.

📒 Files selected for processing (1)
  • prompt-service/sample.env (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • prompt-service/sample.env

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

coderabbitai[bot] avatar Sep 10 '25 10:09 coderabbitai[bot]

filepath function $$\textcolor{#23d18b}{\tt{passed}}$$ SUBTOTAL
$$\textcolor{#23d18b}{\tt{runner/src/unstract/runner/clients/test\_docker.py}}$$ $$\textcolor{#23d18b}{\tt{test\_logs}}$$ $$\textcolor{#23d18b}{\tt{1}}$$ $$\textcolor{#23d18b}{\tt{1}}$$
$$\textcolor{#23d18b}{\tt{runner/src/unstract/runner/clients/test\_docker.py}}$$ $$\textcolor{#23d18b}{\tt{test\_cleanup}}$$ $$\textcolor{#23d18b}{\tt{1}}$$ $$\textcolor{#23d18b}{\tt{1}}$$
$$\textcolor{#23d18b}{\tt{runner/src/unstract/runner/clients/test\_docker.py}}$$ $$\textcolor{#23d18b}{\tt{test\_cleanup\_skip}}$$ $$\textcolor{#23d18b}{\tt{1}}$$ $$\textcolor{#23d18b}{\tt{1}}$$
$$\textcolor{#23d18b}{\tt{runner/src/unstract/runner/clients/test\_docker.py}}$$ $$\textcolor{#23d18b}{\tt{test\_client\_init}}$$ $$\textcolor{#23d18b}{\tt{1}}$$ $$\textcolor{#23d18b}{\tt{1}}$$
$$\textcolor{#23d18b}{\tt{runner/src/unstract/runner/clients/test\_docker.py}}$$ $$\textcolor{#23d18b}{\tt{test\_get\_image\_exists}}$$ $$\textcolor{#23d18b}{\tt{1}}$$ $$\textcolor{#23d18b}{\tt{1}}$$
$$\textcolor{#23d18b}{\tt{runner/src/unstract/runner/clients/test\_docker.py}}$$ $$\textcolor{#23d18b}{\tt{test\_get\_image}}$$ $$\textcolor{#23d18b}{\tt{1}}$$ $$\textcolor{#23d18b}{\tt{1}}$$
$$\textcolor{#23d18b}{\tt{runner/src/unstract/runner/clients/test\_docker.py}}$$ $$\textcolor{#23d18b}{\tt{test\_get\_container\_run\_config}}$$ $$\textcolor{#23d18b}{\tt{1}}$$ $$\textcolor{#23d18b}{\tt{1}}$$
$$\textcolor{#23d18b}{\tt{runner/src/unstract/runner/clients/test\_docker.py}}$$ $$\textcolor{#23d18b}{\tt{test\_get\_container\_run\_config\_without\_mount}}$$ $$\textcolor{#23d18b}{\tt{1}}$$ $$\textcolor{#23d18b}{\tt{1}}$$
$$\textcolor{#23d18b}{\tt{runner/src/unstract/runner/clients/test\_docker.py}}$$ $$\textcolor{#23d18b}{\tt{test\_run\_container}}$$ $$\textcolor{#23d18b}{\tt{1}}$$ $$\textcolor{#23d18b}{\tt{1}}$$
$$\textcolor{#23d18b}{\tt{runner/src/unstract/runner/clients/test\_docker.py}}$$ $$\textcolor{#23d18b}{\tt{test\_get\_image\_for\_sidecar}}$$ $$\textcolor{#23d18b}{\tt{1}}$$ $$\textcolor{#23d18b}{\tt{1}}$$
$$\textcolor{#23d18b}{\tt{runner/src/unstract/runner/clients/test\_docker.py}}$$ $$\textcolor{#23d18b}{\tt{test\_sidecar\_container}}$$ $$\textcolor{#23d18b}{\tt{1}}$$ $$\textcolor{#23d18b}{\tt{1}}$$
$$\textcolor{#23d18b}{\tt{TOTAL}}$$ $$\textcolor{#23d18b}{\tt{11}}$$ $$\textcolor{#23d18b}{\tt{11}}$$

github-actions[bot] avatar Sep 18 '25 06:09 github-actions[bot]