
[Feature] URL Function Implementation

Open EdwardArchive opened this issue 3 weeks ago • 28 comments

Issue: #66227
Related PR: #66206
Status: SSRF protection in development (named parameters done)
Authors: @EdwardArchive, with review from @alvin-celerdata

What I'm doing:

Executive Summary

This PR adds an http_request() scalar function to StarRocks that enables executing HTTP/HTTPS requests directly from SQL queries. The function allows users to send webhook notifications, integrate with external REST APIs for data enrichment, and enable real-time event-driven workflows from within SQL.

Key Design Decisions:

| Decision | Outcome |
|---|---|
| Function Name | http_request (clearer than url) |
| Parameter Style | Named arguments (requires enhancement for non-table functions) |
| Security Model | Allowlist-based with private network blocking by default |
| Future Enhancement | CONNECTION object for credential management and reusability |

Background & Motivation

Modern data platforms increasingly require integration with external services for alerting, enrichment, and workflow automation. Currently, StarRocks users must implement these integrations outside the database layer, adding complexity and latency to their data pipelines.

By providing native HTTP request capability within SQL, StarRocks can:

  • Reduce architectural complexity by eliminating external integration layers
  • Enable real-time responses to data events without leaving the SQL context
  • Simplify webhook integrations for alerting and notification systems
  • Support data enrichment workflows that require external API calls

Use Cases

1. Alerting & Notifications

Send Slack/Discord notifications when anomaly detection queries identify issues:

SELECT http_request(
    url => 'https://hooks.slack.com/services/T00/B00/XXX',
    method => 'POST',
    headers => '{"Content-Type": "application/json"}',
    body => CONCAT('{"text": "Alert: ', anomaly_description, '"}')
)
FROM anomaly_detection_results
WHERE severity = 'CRITICAL';

2. Data Enrichment

Call external APIs to augment query results with additional context:

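SELECT
    customer_id,
    order_total,
    -- http_request() returns {"status": ..., "body": "..."}; the API payload
    -- is the JSON-encoded string in "body", so extract and parse it first.
    JSON_QUERY(
        PARSE_JSON(
            GET_JSON_STRING(
                http_request(
                    url => CONCAT('https://api.enrichment.com/customer/', customer_id)
                ),
                '$.body'
            )
        ),
        '$.credit_score'
    ) AS credit_score
FROM orders;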

3. Webhook Integration

Trigger external workflows based on data changes:

-- Trigger inventory replenishment workflow
SELECT http_request(
    url => 'https://inventory.internal/api/reorder',
    method => 'POST',
    body => JSON_OBJECT('sku', sku, 'quantity', reorder_qty)
)
FROM inventory
WHERE current_stock < reorder_threshold;

Function Specification

Function Name

Decision: http_request

The name http_request was chosen over url for the following reasons:

  1. Clarity: Explicitly indicates the function performs HTTP requests
  2. Consistency: Aligns with naming conventions in other systems (e.g., http_get, http_post)
  3. Future Compatibility: Avoids naming conflicts if url is needed for URL parsing/manipulation functions

Function Signature

http_request(
    url VARCHAR,
    method VARCHAR DEFAULT 'GET',
    body VARCHAR DEFAULT '',
    headers VARCHAR DEFAULT '{}',
    timeout_ms INT DEFAULT 30000,
    ssl_verify BOOLEAN DEFAULT true,
    username VARCHAR DEFAULT '',
    password VARCHAR DEFAULT ''
) -> VARCHAR

Note: This function uses Named Arguments support for scalar functions, implemented as part of this feature.

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| url | VARCHAR | required | Target URL for the HTTP request |
| method | VARCHAR | 'GET' | HTTP method: GET, POST, PUT, DELETE, PATCH |
| body | VARCHAR | '' | Request body (for POST/PUT/PATCH) |
| headers | VARCHAR | '{}' | JSON object containing HTTP headers |
| timeout_ms | INT | 30000 | Request timeout in milliseconds |
| ssl_verify | BOOLEAN | true | Enable/disable SSL certificate verification |
| username | VARCHAR | '' | Username for HTTP Basic Authentication |
| password | VARCHAR | '' | Password for HTTP Basic Authentication |

Return Value

Returns VARCHAR containing a JSON object with response information:

{
    "status": 200,
    "body": "{\"result\": \"success\"}"
}
  • status: HTTP response status code
  • body: Response body as string
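
One consequence of this shape is that the response body arrives as a JSON-encoded string inside the wrapper, so a client has to decode twice. A minimal Python sketch of that, assuming the wrapper format shown above (the helper name parse_http_response is hypothetical and not part of the PR):

```python
import json

def parse_http_response(raw: str):
    """Split the http_request() wrapper into (status, decoded body)."""
    wrapper = json.loads(raw)
    body_text = wrapper["body"]
    try:
        # In the example above the body is itself JSON text.
        body = json.loads(body_text)
    except json.JSONDecodeError:
        # Non-JSON bodies (plain text, HTML, empty) stay as strings.
        body = body_text
    return wrapper["status"], body

raw = '{"status": 200, "body": "{\\"result\\": \\"success\\"}"}'
status, body = parse_http_response(raw)
print(status, body["result"])
```

The same two-step decode applies on the SQL side: a JSON path such as $.result has to be evaluated against the parsed body, not against the wrapper.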

Security Design

SSRF Risk Analysis

The http_request function introduces potential Server-Side Request Forgery (SSRF) risks:

  1. Internal Network Access: Attackers could probe internal services
  2. Port Scanning: Using the function to scan internal network ports
  3. Data Exfiltration: Sending sensitive data to attacker-controlled endpoints

Industry Comparison

| Database | Security Approach | Complexity |
|---|---|---|
| ClickHouse | remote_url_allow_hosts configuration | Low |
| Snowflake | Network Rules + External Access Integration | High |
| Databricks | Network Policies with Allowed Domains (FQDN) + CONNECTION objects | High |
| DuckDB | enable_external_access variable | Low |
| PostgreSQL (pgsql-http) | Function-level permission control | Medium |

Proposed Security Controls

The security implementation follows a phased approach balancing immediate protection with future extensibility:

┌─────────────────────────────────────────────────────────────────┐
│                     Security Architecture                        │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  Phase 1 (Current)              Phase 2 (Future)                │
│  ┌─────────────────┐           ┌─────────────────┐              │
│  │ Configuration-  │           │   CONNECTION    │              │
│  │ Based Controls  │    ──►    │     Objects     │              │
│  └────────┬────────┘           └────────┬────────┘              │
│           │                             │                        │
│           ▼                             ▼                        │
│  ┌─────────────────┐           ┌─────────────────┐              │
│  │ • Allowlist     │           │ • Encrypted     │              │
│  │ • Regex Match   │           │   Credentials   │              │
│  │ • Private Net   │           │ • Reusable      │              │
│  │   Blocking      │           │   Endpoints     │              │
│  │ • SSL Required  │           │ • Single SQL    │              │
│  └─────────────────┘           │   Management    │              │
│                                └─────────────────┘              │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

Phase 1: Configuration-Based Security

Security Parameters

| Configuration | Type | Default | Description |
|---|---|---|---|
| http_request_host_allowlist | STRING | "" | Comma-separated list of allowed hosts |
| http_request_host_allowlist_regexp | STRING | "" | Comma-separated list of allowed host regex patterns |
| http_request_block_private_networks | BOOL | true | Block private IP ranges and localhost |
| http_request_ssl_verification_required | BOOL | true | Enforce HTTPS with certificate validation |
Private Network Blocking

When http_request_block_private_networks = true, the following ranges are blocked:

10.0.0.0/8        (Class A private)
172.16.0.0/12     (Class B private)
192.168.0.0/16    (Class C private)
127.0.0.0/8       (Loopback)
169.254.0.0/16    (Link-local, includes cloud metadata)
::1/128           (IPv6 loopback)
fc00::/7          (IPv6 private)
fe80::/10         (IPv6 link-local)
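
The range check itself is small. Here is a rough Python sketch of it using the standard ipaddress module; the names BLOCKED_NETWORKS and is_blocked_ip are illustrative only, and the actual backend check is written in C++ and may differ:

```python
import ipaddress

# The eight ranges listed above.
BLOCKED_NETWORKS = [ipaddress.ip_network(cidr) for cidr in (
    "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16", "127.0.0.0/8",
    "169.254.0.0/16", "::1/128", "fc00::/7", "fe80::/10",
)]

def is_blocked_ip(ip_text: str) -> bool:
    """Return True if the address falls inside any blocked range."""
    ip = ipaddress.ip_address(ip_text)
    # Only compare against networks of the same IP version.
    return any(ip.version == net.version and ip in net
               for net in BLOCKED_NETWORKS)

print(is_blocked_ip("169.254.169.254"))  # cloud metadata address
print(is_blocked_ip("8.8.8.8"))
```

Note that 172.16.0.0/12 ends at 172.31.255.255, so an address like 172.32.0.1 is public and passes.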

Default Behavior (Secure by Default)

Critical: When both http_request_host_allowlist and http_request_host_allowlist_regexp are empty strings (default), no HTTP requests are allowed.

This ensures:

  1. Users must explicitly configure allowed endpoints
  2. Accidental exposure is prevented
  3. Security-first deployment model
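
To make the deny-by-default rule concrete, here is a small Python sketch of how the two settings could be evaluated. It assumes comma-separated values, case-insensitive host comparison, and full-string regex matching; the function names are hypothetical:

```python
import re

def parse_list(value: str):
    """Split a comma-separated config value, dropping empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]

def is_host_allowed(host: str, allowlist: str = "", allowlist_regexp: str = "") -> bool:
    hosts = parse_list(allowlist)
    patterns = parse_list(allowlist_regexp)
    if not hosts and not patterns:
        return False  # both settings empty (the default): nothing is allowed
    host = host.lower()
    if host in (h.lower() for h in hosts):
        return True
    # fullmatch, so "x.internal.company.com.evil.com" cannot slip through.
    return any(re.fullmatch(p, host) for p in patterns)

print(is_host_allowed("api.example.com"))
print(is_host_allowed("hooks.slack.com", allowlist="api.slack.com,hooks.slack.com"))
```

One design caveat with comma-separated patterns: a regex that itself contains a comma, such as a {1,3} quantifier, would be split in the wrong place.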

Configuration Management

Configurations are managed as FE Dynamic Config via ADMIN SET FRONTEND CONFIG:

-- Enable specific hosts
ADMIN SET FRONTEND CONFIG (
    "http_request_host_allowlist" = "api.slack.com,hooks.slack.com,api.example.com"
);

-- Enable hosts matching pattern
ADMIN SET FRONTEND CONFIG (
    "http_request_host_allowlist_regexp" = ".*\\.internal\\.company\\.com"
);

-- Disable private network blocking (not recommended)
ADMIN SET FRONTEND CONFIG (
    "http_request_block_private_networks" = "false"
);

-- Require SSL verification (default: true)
ADMIN SET FRONTEND CONFIG (
    "http_request_ssl_verification_required" = "true"
);

Configuration Reference

Request Limits (Future Consideration)

| Configuration | Type | Default | Description |
|---|---|---|---|
| http_request_max_response_size | INT | 1048576 | Maximum response size in bytes (1MB) |
| http_request_default_timeout_ms | INT | 30000 | Default timeout in milliseconds |

Examples

Basic GET Request

-- Simple GET (requires host in allowlist)
SELECT http_request(url => 'https://api.example.com/status');

POST with JSON Body

SELECT http_request(
    url => 'https://hooks.slack.com/services/XXX',
    method => 'POST',
    headers => '{"Content-Type": "application/json"}',
    body => '{"text": "Database alert triggered"}'
);

Dynamic Webhook from Query Results

SELECT 
    metric_name,
    metric_value,
    http_request(
        url => 'https://monitoring.internal/api/alert',
        method => 'POST',
        headers => '{"Content-Type": "application/json", "X-API-Key": "xxx"}',
        body => JSON_OBJECT(
            'metric', metric_name,
            'value', metric_value,
            'threshold', threshold,
            'timestamp', NOW()
        )
    ) AS alert_response
FROM metrics
WHERE metric_value > threshold;

Using with CONNECTION (Phase 2)

-- Create connection
CREATE CONNECTION pagerduty (
    type = 'HTTP',
    url = 'https://events.pagerduty.com/v2/enqueue',
    method = 'POST',
    headers = '{"Content-Type": "application/json"}',
    password = 'integration_key_xxx'
);

-- Use connection
SELECT http_request(
    connection => 'pagerduty',
    body => JSON_OBJECT(
        'routing_key', GET_CONNECTION_SECRET('pagerduty', 'password'),
        'event_action', 'trigger',
        'payload', JSON_OBJECT(
            'summary', error_message,
            'severity', 'critical',
            'source', 'starrocks'
        )
    )
)
FROM error_logs
WHERE severity = 'CRITICAL' AND created_at > NOW() - INTERVAL 5 MINUTE;

Named Arguments for Scalar Functions

Overview

StarRocks now supports Named Arguments for scalar functions, enabling more readable and flexible function calls. This feature was implemented as a prerequisite for http_request() which has 8 parameters with defaults.

Syntax

-- Named Arguments syntax
function_name(param1 => value1, param2 => value2, ...)

-- Example
SELECT http_request(url => 'https://api.example.com', method => 'POST');

Features

| Feature | Description | Example |
|---|---|---|
| Named Parameters | Specify arguments by name | http_request(url => '...', method => 'GET') |
| Default Values | Omit optional parameters | http_request(url => '...') uses default method='GET' |
| Positional Arguments | Traditional positional call still works | http_request('https://...') |
| Mixed Mode | Not supported (Named-only or Positional-only) | - |

User-Friendly Error Messages

| Scenario | Error Message |
|---|---|
| Missing required parameter | http_request() required parameter 'url' is missing |
| Unknown parameter (with hint) | http_request() unknown parameter 'URL'. Did you mean 'url'? |
| Duplicate parameter | http_request() duplicate parameter 'url' |
| NULL for required parameter | http_request() required parameter 'url' cannot be NULL |
| Empty arguments | http_request() requires at least 1 argument(s). Missing required parameter(s): 'url' |
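
As an illustration of the validate, reorder, and fill-defaults behavior behind these messages, here is a hedged Python sketch. The real logic lives in the Java analyzer (FunctionAnalyzer / FunctionParams); resolve_named_args, NO_DEFAULT, and the case-insensitive hint rule are assumptions made for this example:

```python
NO_DEFAULT = object()  # marks a required parameter

HTTP_REQUEST_PARAMS = [
    ("url", NO_DEFAULT), ("method", "GET"), ("body", ""), ("headers", "{}"),
    ("timeout_ms", 30000), ("ssl_verify", True), ("username", ""), ("password", ""),
]

def resolve_named_args(fn, params, named_args):
    """named_args is a list of (name, value) pairs in call order.

    Returns the values reordered to signature order with defaults filled in.
    """
    known = [name for name, _ in params]
    seen = {}
    for name, value in named_args:
        if name not in known:
            # Assumed hint rule: suggest a parameter that differs only by case.
            hint = next((k for k in known if k.lower() == name.lower()), None)
            suffix = f" Did you mean '{hint}'?" if hint else ""
            raise ValueError(f"{fn}() unknown parameter '{name}'.{suffix}")
        if name in seen:
            raise ValueError(f"{fn}() duplicate parameter '{name}'")
        seen[name] = value
    resolved = []
    for name, default in params:
        if default is NO_DEFAULT:
            if name not in seen:
                raise ValueError(f"{fn}() required parameter '{name}' is missing")
            if seen[name] is None:
                raise ValueError(f"{fn}() required parameter '{name}' cannot be NULL")
        resolved.append(seen.get(name, default))
    return resolved

print(resolve_named_args("http_request", HTTP_REQUEST_PARAMS,
                         [("method", "POST"), ("url", "https://api.example.com")]))
```

After this step the backend always sees all eight arguments in signature order, regardless of how the call was written.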

Implementation Architecture

┌─────────────────────────────────────────────────────────────────┐
│                   Named Arguments Flow                          │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  1. Grammar (StarRocks.g4)                                      │
│     └─ Parse `param => value` syntax                            │
│                                                                  │
│  2. AST Builder (AstBuilder.java)                               │
│     └─ Create FunctionParams with named arguments               │
│                                                                  │
│  3. Function Registry (functions.py + gen_functions.py)         │
│     └─ Define function metadata with named_args:                │
│        {'name': 'url'},                                         │
│        {'name': 'method', 'default': 'GET'},                    │
│                                                                  │
│  4. Code Generation (VectorizedBuiltinFunctions.java)           │
│     └─ setArgNames(), setDefaultNamedArgs()                     │
│                                                                  │
│  5. Analyzer (ExpressionAnalyzer.java, FunctionAnalyzer.java)   │
│     └─ Validate, reorder, and fill defaults                     │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

Defining a Named Arguments Function

In gensrc/script/functions.py:

[30470, 'http_request', True, False, 'VARCHAR',
 ['VARCHAR', 'VARCHAR', 'VARCHAR', 'VARCHAR', 'INT', 'BOOLEAN', 'VARCHAR', 'VARCHAR'],
 'HttpRequestFunctions::http_request',
 'HttpRequestFunctions::http_request_prepare', 'HttpRequestFunctions::http_request_close',
 {
     'named_args': [
         {'name': 'url'},                          # Required (no default)
         {'name': 'method', 'default': 'GET'},     # Optional with default
         {'name': 'body', 'default': ''},
         {'name': 'headers', 'default': '{}'},
         {'name': 'timeout_ms', 'default': 30000},
         {'name': 'ssl_verify', 'default': True},
         {'name': 'username', 'default': ''},
         {'name': 'password', 'default': ''}
     ]
 }],
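
For readers unfamiliar with this registry format, here is a hypothetical Python sketch of how a generator could read the named_args metadata. It is not the actual gen_functions.py logic; split_named_args and its checks (one entry per argument type, unique names, required parameters before optional ones) are assumptions for illustration:

```python
HTTP_REQUEST_ENTRY = [
    30470, 'http_request', True, False, 'VARCHAR',
    ['VARCHAR', 'VARCHAR', 'VARCHAR', 'VARCHAR', 'INT', 'BOOLEAN', 'VARCHAR', 'VARCHAR'],
    'HttpRequestFunctions::http_request',
    'HttpRequestFunctions::http_request_prepare',
    'HttpRequestFunctions::http_request_close',
    {'named_args': [
        {'name': 'url'},
        {'name': 'method', 'default': 'GET'},
        {'name': 'body', 'default': ''},
        {'name': 'headers', 'default': '{}'},
        {'name': 'timeout_ms', 'default': 30000},
        {'name': 'ssl_verify', 'default': True},
        {'name': 'username', 'default': ''},
        {'name': 'password', 'default': ''},
    ]},
]

def split_named_args(entry):
    """Return (ordered parameter names, {name: default}) for one registry entry."""
    arg_types = entry[5]
    named_args = entry[-1]['named_args']
    if len(named_args) != len(arg_types):
        raise ValueError("named_args must describe every argument type")
    names = [arg['name'] for arg in named_args]
    if len(set(names)) != len(names):
        raise ValueError("parameter names must be unique")
    defaults = {}
    for arg in named_args:
        if 'default' in arg:
            defaults[arg['name']] = arg['default']
        elif defaults:
            # Assumed rule: required parameters come first, so positional
            # calls can be completed by appending trailing defaults.
            raise ValueError(f"required parameter '{arg['name']}' follows an optional one")
    return names, defaults

names, defaults = split_named_args(HTTP_REQUEST_ENTRY)
print(names)
print(defaults)
```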

Key Files Modified

| File | Purpose |
|---|---|
| fe/fe-grammar/.../StarRocks.g4 | Grammar rule for => syntax |
| fe/fe-core/.../AstBuilder.java | Parse named arguments to AST |
| fe/fe-core/.../FunctionParams.java | Store and reorder named arguments |
| fe/fe-core/.../Function.java | Store arg names and defaults |
| fe/fe-core/.../FunctionAnalyzer.java | Validate named arguments |
| fe/fe-core/.../ExpressionAnalyzer.java | Resolve function with defaults |
| gensrc/script/functions.py | Function definitions with named_args |
| gensrc/script/gen_functions.py | Generate Java registration code |

Phase 2: CONNECTION Object (Summary)

Status: Future Enhancement

Phase 2 introduces CONNECTION objects for centralized credential management and reusable endpoint definitions.

Quick Reference

-- Create a reusable connection
CREATE CONNECTION slack_webhook (
    type = 'HTTP',
    url = 'https://hooks.slack.com/services/XXX',
    method = 'POST',
    headers = '{"Content-Type": "application/json"}'
);

-- Use the connection
SELECT http_request(connection => 'slack_webhook', body => '{"text": "Hello"}');

Key Benefits

  • Centralized Credentials: Passwords stored securely with encryption
  • Reusability: Define once, use across multiple queries
  • Audit Trail: Track connection usage and modifications

Backward Compatibility

Compatibility Strategy

The implementation follows Option B: support both approaches, the direct URL argument and the CONNECTION argument.

Migration Path

Phase 1                    Phase 2                    Phase 3
┌─────────────┐           ┌─────────────┐           ┌─────────────┐
│ URL-based   │           │ Both URL &  │           │ CONNECTION- │
│ with Config │    ──►    │ CONNECTION  │    ──►    │ preferred   │
│ Controls    │           │ supported   │           │             │
└─────────────┘           └─────────────┘           └─────────────┘
     │                          │                         │
     │                          │                         │
     ▼                          ▼                         ▼
  Allowlist               Add CONNECTION            Deprecation
  validation              object support            warnings for
  only                                              direct URL

References

External Documentation

| System | Documentation |
|---|---|
| ClickHouse | url() Table Function |
| PostgreSQL | pgsql-http Extension |
| Snowflake | External Functions |
| Databricks | CREATE CONNECTION |

Appendix: Discussion Summary

Key Discussion Points with @alvin-celerdata

  1. Function Naming: Agreed on http_request over url for clarity

  2. Named Arguments: Identified need to enhance non-table functions to support named arguments

  3. SSRF Protection:

    • Initial proposal: URL allowlist + private network blocking
    • Alvin's suggestion: Consider CONNECTION objects for stricter control
    • Resolution: Implement both approaches (Option B)

  4. Configuration Management:

    • Initial: Static fe.conf
    • Improved: FE Dynamic Config via ADMIN SET FRONTEND CONFIG
    • Future: Single SQL statement via CONNECTION objects

  5. Backward Compatibility: Agreed on supporting both URL argument and CONNECTION argument with configuration toggle


Last Updated: Based on GitHub Issue #66227 discussion


[!NOTE] Introduces http_request() with named parameters, SSRF safeguards (allowlists, security levels, DNS pinning), and adds named-argument support across parser/analyzer, configs, networking utils, HttpClient, and tests.

  • Functionality:
    • New scalar function http_request() (nondeterministic) with named parameters, JSON responses, size limit, timeout, SSL options, headers/body/auth.
    • Security/SSRF: security levels, host/IP allowlists (regex/IP), private/link-local IP blocking, admin-enforced SSL verification, and DNS pinning.
  • Networking/HTTP:
    • HttpClient: disable redirects, add set_resolve_host() (CURLOPT_RESOLVE) and set_fail_on_error(), cleanup resolve list.
    • network_util: add is_private_ip, is_link_local_ip, resolve_hostname_all_ips, extract_host_from_url, extract_port_from_url.
  • Config/Runtime/Thrift:
    • FE Config and SessionVariable expose http_request_* settings; Thrift TQueryOptions fields; BE RuntimeState getters.
  • Parser/Analyzer/Registry:
    • Add named-arguments support for scalar functions: grammar (param => value), AST handling, validation (missing/duplicate/unknown/NULL), reordering and default filling.
    • Function registry/codegen updated to register named args/defaults for http_request.
  • BE Integration:
    • Register function (FID 30470), mark as returning random in function_call_expr, include in build.
  • Tests/Docs:
    • Extensive unit and SQL tests for http_request, security, network utils, HttpClient, and named-arg handling.
    • Add documentation for HTTP URL function usage.

Written by Cursor Bugbot for commit b1a2faf873385f3a5aa1497c9eb376fcdc85afb9. This will update automatically on new commits.

What type of PR is this:

  • [ ] BugFix
  • [x] Feature
  • [ ] Enhancement
  • [ ] Refactor
  • [ ] UT
  • [ ] Doc
  • [ ] Tool

Does this PR entail a change in behavior?

  • [ ] Yes, this PR will result in a change in behavior.
  • [x] No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • [x] Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • [ ] Parameter changes: default values, similar parameters but with different default values
  • [ ] Policy changes: use new policy to replace old one, functionality automatically enabled
  • [ ] Feature removed
  • [ ] Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • [x] I have added test cases for my bug fix or my new feature
  • [x] This pr needs user documentation (for new or modified features or behaviors)
    • [x] I have added documentation for my new feature or new function
  • [ ] This is a backport pr

Bugfix cherry-pick branch check:

  • [ ] I have checked the version labels which the pr will be auto-backported to the target branch
    • [x] 4.0
    • [ ] 3.5
    • [ ] 3.4
    • [ ] 3.3

EdwardArchive avatar Dec 02 '25 08:12 EdwardArchive

What about the security considerations? Should we implement limitations on the domains?

@copilot what do you think about it?

murphyatwork avatar Dec 02 '25 10:12 murphyatwork

> What about the security considerations? Should we implement limitations on the domains?
>
> @copilot what do you think about it?

Hi @murphyatwork

Thank you very much for your great feedback.

I considered that approach too, but concluded that, from a StarRocks perspective, relying on infrastructure-level defenses (Security Groups, Network Firewall, etc.) would reduce management complexity.

For now, I plan to add a variable that lets ADMIN accounts enforce SSL certificate validation, so that other users cannot call URLs without valid SSL certificates.

For reference, here is how other databases handle this.

PostgreSQL and DuckDB:

  • PostgreSQL provides only function-level controls.
  • DuckDB allows control through configuration variables plus variables that deny external access.

Enterprise-grade databases (Databricks, Snowflake, ClickHouse):

  • These maintain their own network rule lists internally.
  • This appears to be because SaaS providers control both the application and the infrastructure, which lets them take advantage of that unified control.

Thanks!

EdwardArchive avatar Dec 02 '25 11:12 EdwardArchive

I just fixed almost all of the cases raised by Copilot.

gensrc/thrift/InternalService.thrift:

  • Added url_ssl_verification_required option (field 201) to TQueryOptions

fe/fe-core/src/main/java/com/starrocks/qe/SessionVariable.java:

  • Added Config.url_ssl_verification_required import
  • Added tResult.setUrl_ssl_verification_required() in toThrift() method

be/src/runtime/runtime_state.h:

  • Added url_ssl_verification_required() accessor method

be/src/exprs/url_functions.cpp:

  • Added #include "runtime/runtime_state.h"
  • Fixed simdjson error checks: !obj[...].get() → obj[...].get() == simdjson::SUCCESS
  • Added HttpClient reuse across rows for better performance
  • Added constant config caching with std::optional<UrlConfig>
  • Moved ColumnViewer creation outside loop
  • Added DELETE method body support
  • Changed url_prepare() to read SSL config from RuntimeState
  • Changed invalid config to return JSON error instead of NULL

be/src/exprs/url_functions.cpp:

  • Added is_valid_utf8() - Validates UTF-8 byte sequences, returns false for invalid encoding
  • Added is_valid_json() - Uses simdjson for proper JSON validation instead of simple string check
  • Modified build_json_response() - Returns JSON error response when body contains invalid UTF-8

be/test/exprs/url_functions_test.cpp:

  • Fixed prepareCloseTest - Only checks ssl_verify_required field (removed deprecated fields)

docs/en/sql-reference/sql-functions/scalar-functions/url.md:

  • Updated documentation with JSON config parameters and examples

EdwardArchive avatar Dec 02 '25 15:12 EdwardArchive

@cursor review

alvin-celerdata avatar Dec 02 '25 17:12 alvin-celerdata

Thanks

  • Fix response size check to use a streaming callback (when streaming, it returns "Response size exceeds limit (1048576 bytes). Received: 5131466 bytes")
  • Fix timeout_ms integer overflow with bounds checking (values larger than 300,000 are clamped to 300,000)
  • Update the url.md file
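
The two fixes above can be pictured with a short Python sketch. This is only an illustration under stated assumptions: the lower bound of 1 ms, the helper names, and the chunk iterator are hypothetical, and the actual implementation is a C++ curl write callback:

```python
MAX_TIMEOUT_MS = 300_000        # upper bound mentioned above
MAX_RESPONSE_BYTES = 1_048_576  # 1 MB limit mentioned above

def clamp_timeout_ms(value: int, lo: int = 1, hi: int = MAX_TIMEOUT_MS) -> int:
    """Clamp a user-supplied timeout into [lo, hi] instead of letting it overflow."""
    return max(lo, min(value, hi))

def read_limited(chunks, limit: int = MAX_RESPONSE_BYTES) -> bytes:
    """Accumulate streamed chunks, aborting as soon as the limit is exceeded."""
    buf = bytearray()
    for chunk in chunks:
        buf.extend(chunk)
        if len(buf) > limit:
            raise ValueError(
                f"Response size exceeds limit ({limit} bytes). "
                f"Received: {len(buf)} bytes")
    return bytes(buf)

print(clamp_timeout_ms(2**40))
print(read_limited([b"ab", b"cd"], limit=4))
```

Checking inside the streaming loop means an oversized response is rejected without buffering all of it first.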

EdwardArchive avatar Dec 02 '25 17:12 EdwardArchive

@EdwardArchive Thanks for the contribution. After roughly investigating other systems' implementations, I have some suggestions.

  2. Please create a GitHub Issue for this new function; that is where we need to align on the function's interface.
  3. Because this function will access an external HTTP link, it may introduce potential risks to the system. We may need to introduce a concept like CONNECTION in Databricks.
  4. Please explain the background for this function in the issue ticket.

alvin-celerdata avatar Dec 02 '25 18:12 alvin-celerdata

> @EdwardArchive Thanks for the contribution, after roughly investigating other systems' implementation, I have some suggestions. 2. please create a GitHub Issue for this new function, and in that place we need to get aligned on the interface of this function. 3. Because this function will access an external HTTP link, it may introduce potential risks to the system. Maybe we need to introduce some concepts like connection in Databricks. 4. please explain the background for this function in the issue ticket.

Thanks! I just created the issue here: https://github.com/StarRocks/starrocks/issues/66227

EdwardArchive avatar Dec 02 '25 19:12 EdwardArchive

@cursor review

alvin-celerdata avatar Dec 03 '25 04:12 alvin-celerdata

@cursor review

alvin-celerdata avatar Dec 03 '25 17:12 alvin-celerdata

This will change, so I deleted the old request.

EdwardArchive avatar Dec 03 '25 18:12 EdwardArchive

Skipping Bugbot: Unable to authenticate your request. Please make sure Bugbot is properly installed and configured for this repository.

cursor[bot] avatar Dec 03 '25 18:12 cursor[bot]

@cursor review

alvin-celerdata avatar Dec 04 '25 03:12 alvin-celerdata

Named Arguments Support for Scalar Functions

Summary

This PR implements Named Arguments support for scalar functions in StarRocks, using http_request() as the first function to leverage this feature.

Features

  • Named Arguments syntax: function(param => value, ...)
  • Default values: Optional parameters can be omitted
  • Positional calls: Traditional positional syntax still works
  • User-friendly error messages: Clear hints for common mistakes

Usage Examples

-- Named Arguments (any order, omit optional params)
SELECT http_request(url => 'https://api.example.com');
SELECT http_request(url => 'https://api.example.com', method => 'POST', body => '{}');
SELECT http_request(method => 'POST', url => 'https://api.example.com');  -- order doesn't matter

-- Positional Arguments (still supported)
SELECT http_request('https://api.example.com');

Error Messages

| Scenario | Error Message |
|---|---|
| Missing required param | http_request() required parameter 'url' is missing |
| Unknown param (with hint) | http_request() unknown parameter 'URL'. Did you mean 'url'? |
| Duplicate param | http_request() duplicate parameter 'url' |
| NULL for required param | http_request() required parameter 'url' cannot be NULL |
| No arguments | http_request() requires at least 1 argument(s). Missing required parameter(s): 'url' |

Implementation

How to Define a Named Arguments Function

In gensrc/script/functions.py:

[30470, 'http_request', True, False, 'VARCHAR',
 ['VARCHAR', 'VARCHAR', 'VARCHAR', 'VARCHAR', 'INT', 'BOOLEAN', 'VARCHAR', 'VARCHAR'],
 'HttpRequestFunctions::http_request',
 'HttpRequestFunctions::http_request_prepare', 'HttpRequestFunctions::http_request_close',
 {
     'named_args': [
         {'name': 'url'},                          # Required (no default)
         {'name': 'method', 'default': 'GET'},     # Optional with default
         {'name': 'body', 'default': ''},
         {'name': 'headers', 'default': '{}'},
         {'name': 'timeout_ms', 'default': 30000},
         {'name': 'ssl_verify', 'default': True},
         {'name': 'username', 'default': ''},
         {'name': 'password', 'default': ''}
     ]
 }],

Then run code generation:

cd gensrc/script && python3 gen_functions.py --java /path/to/fe/target/generated-sources/build

Files Changed

| File | Changes |
|---|---|
| gensrc/script/functions.py | Added named_args metadata for http_request |
| gensrc/script/gen_functions.py | Added template for Named Arguments code generation |
| fe/.../FunctionParams.java | Added reorderNamedArgAndAppendDefaults(), appendDefaultsForPositionalArgs() |
| fe/.../FunctionAnalyzer.java | Added Named Arguments validation with user-friendly errors |
| fe/.../ExpressionAnalyzer.java | Added branching logic for Named vs Positional arguments |

Architecture Flow

SQL: http_request(url => '...', method => 'POST')
         │
         ▼
┌─────────────────────────────────────┐
│ 1. Parser (StarRocks.g4)            │
│    Parse `param => value` syntax    │
└─────────────────────────────────────┘
         │
         ▼
┌─────────────────────────────────────┐
│ 2. AstBuilder.java                  │
│    Create NamedArgument AST nodes   │
└─────────────────────────────────────┘
         │
         ▼
┌─────────────────────────────────────┐
│ 3. FunctionParams.java              │
│    Separate exprs[] + exprsNames[]  │
└─────────────────────────────────────┘
         │
         ▼
┌─────────────────────────────────────┐
│ 4. ExpressionAnalyzer.java          │
│    Branch: Named vs Positional      │
└─────────────────────────────────────┘
         │
         ▼
┌─────────────────────────────────────┐
│ 5. FunctionAnalyzer.java            │
│    Validate & lookup function       │
└─────────────────────────────────────┘
         │
         ▼
┌─────────────────────────────────────┐
│ 6. FunctionParams.java              │
│    Reorder args & append defaults   │
└─────────────────────────────────────┘
         │
         ▼
┌─────────────────────────────────────┐
│ 7. Backend (C++)                    │
│    Receive all 8 columns            │
└─────────────────────────────────────┘

Backward Compatibility

  • No impact on existing functions: Only functions with named_args defined use this path
  • Positional calls still work: http_request('url') works as expected
  • Varargs functions excluded: Functions like concat() are not affected

Constraints

  • No mixed mode: All arguments must be either named or positional (not both)
  • Case-sensitive: Parameter names are case-sensitive (url ≠ URL)
  • Varargs not supported: Functions with variable arguments cannot use Named Arguments

EdwardArchive avatar Dec 04 '25 17:12 EdwardArchive

🧪 CI Insights

Here's what we observed from your CI run for 8cc37615.

🟢 All jobs passed!

But CI Insights is watching 👀

mergify[bot] avatar Dec 04 '25 18:12 mergify[bot]

@EdwardArchive

> This PR implements Named Arguments support for scalar functions in StarRocks, using http_request() as the first function to leverage this feature.

This is great, but I would like you to separate it into two PRs. Please implement Named Arguments in a separate PR.

alvin-celerdata avatar Dec 04 '25 18:12 alvin-celerdata

@alvin-celerdata Thanks, I'll split it into two PRs.

EdwardArchive avatar Dec 04 '25 18:12 EdwardArchive

@cursor review

alvin-celerdata avatar Dec 05 '25 17:12 alvin-celerdata

Here are the changes, with more detail on the SSRF feature.

SSRF Protection Implementation Summary

Overview

This PR implements comprehensive Server-Side Request Forgery (SSRF) protection for the http_request() function with a 4-level security system and defense-in-depth architecture.

Security Levels

| Level | Mode | Behavior |
|---|---|---|
| 1 | TRUSTED | Allow all requests (development only) |
| 2 | PUBLIC | Block private IPs, allow public hosts |
| 3 | RESTRICTED | Default - Require allowlist for all hosts |
| 4 | PARANOID | Block all requests unconditionally |

Default: Level 3 (RESTRICTED) - Secure by default, requires explicit allowlist configuration.
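
To show how the four levels could translate into a decision, here is a rough Python sketch. It is an interpretation of the levels as described in this comment, not the PR's C++ code; in particular, how RESTRICTED interacts with http_request_allow_private_in_allowlist is an assumption:

```python
from enum import IntEnum

class SecurityLevel(IntEnum):
    TRUSTED = 1
    PUBLIC = 2
    RESTRICTED = 3
    PARANOID = 4

def is_request_allowed(level, targets_private_ip, in_allowlist,
                       allow_private_in_allowlist=False):
    """Decide whether a request may proceed under the given security level."""
    level = SecurityLevel(level)  # raises ValueError for values outside 1..4
    if level is SecurityLevel.TRUSTED:
        return True
    if level is SecurityLevel.PARANOID:
        return False
    if level is SecurityLevel.PUBLIC:
        return not targets_private_ip
    # RESTRICTED: the host must be allowlisted; private targets additionally
    # need the explicit opt-in (assumed semantics of the config flag).
    if not in_allowlist:
        return False
    return allow_private_in_allowlist or not targets_private_ip

print(is_request_allowed(3, targets_private_ip=False, in_allowlist=False))
print(is_request_allowed(3, targets_private_ip=False, in_allowlist=True))
```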

Configuration Parameters

All parameters are runtime-mutable via ADMIN SET FRONTEND CONFIG:

-- Security level (1=TRUSTED, 2=PUBLIC, 3=RESTRICTED, 4=PARANOID)
ADMIN SET FRONTEND CONFIG ("http_request_security_level" = "3");

-- IP allowlist (exact match on IPv4)
ADMIN SET FRONTEND CONFIG ("http_request_ip_allowlist" = "192.168.1.1,10.0.0.1");

-- Host regex patterns (full string match)
ADMIN SET FRONTEND CONFIG ("http_request_host_allowlist_regexp" = "api\\.slack\\.com,.*\\.example\\.com");

-- Admin-enforced SSL verification (prevents user bypass)
ADMIN SET FRONTEND CONFIG ("http_request_ssl_verification_required" = "true");

-- Allow private IPs if in allowlist (NOT recommended for production)
ADMIN SET FRONTEND CONFIG ("http_request_allow_private_in_allowlist" = "false");

IP Blocking Implementation

IPv4 Private Ranges (6 ranges)

  • 127.0.0.0/8 - Loopback
  • 10.0.0.0/8 - Class A Private
  • 172.16.0.0/12 - Class B Private
  • 192.168.0.0/16 - Class C Private
  • 169.254.0.0/16 - Link-local (Cloud Metadata)
  • 0.0.0.0/8 - Current network

IPv6 Private Ranges (4 ranges)

  • ::1/128 - IPv6 loopback
  • fc00::/7 - Unique local addresses
  • fe80::/10 - Link-local
  • ::ffff:0:0/96 - IPv4-mapped addresses
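A comparable sketch for the four IPv6 ranges, using the standard `IN6_IS_ADDR_*` macros where they exist and prefix-bit checks elsewhere. `is_blocked_ipv6` is a hypothetical name, and as above, failing closed on unparseable input is an assumption rather than confirmed PR behavior.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <cassert>
#include <string>

// Returns true when `ip` (IPv6 text form) falls in any of the four ranges above.
bool is_blocked_ipv6(const std::string& ip) {
    in6_addr addr{};
    if (inet_pton(AF_INET6, ip.c_str(), &addr) != 1) {
        return true;  // assumption: fail closed on input that is not a valid IPv6 literal
    }
    const unsigned char* bytes = addr.s6_addr;
    if (IN6_IS_ADDR_LOOPBACK(&addr)) return true;                     // ::1/128
    if ((bytes[0] & 0xFE) == 0xFC) return true;                       // fc00::/7
    if (bytes[0] == 0xFE && (bytes[1] & 0xC0) == 0x80) return true;   // fe80::/10
    if (IN6_IS_ADDR_V4MAPPED(&addr)) return true;                     // ::ffff:0:0/96
    return false;
}
```

Blocking the whole IPv4-mapped range matters because `::ffff:10.0.0.1` reaches the same host as `10.0.0.1` and would otherwise slip past an IPv4-only check.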

Special: Cloud Metadata Protection

Dedicated detection for 169.254.169.254 (AWS/GCP/Azure metadata endpoints) with enhanced warning:

WARNING: Allowing this IP can expose cloud credentials and sensitive metadata.

Defense-in-Depth Architecture

8 Security Layers:

  1. Protocol Validation - Only http:// and https:// allowed
  2. Security Level Check - TRUSTED/PUBLIC/RESTRICTED/PARANOID
  3. URL Parsing - Extract hostname, handle IPv6 brackets
  4. DNS Resolution - Resolve ALL IPs (IPv4 + IPv6), not just first
  5. Private IP Detection - Check all resolved IPs against private ranges
  6. Allowlist Validation - IP exact match OR regex pattern match
  7. SSL/TLS Enforcement - Admin can enforce SSL, preventing user bypass
  8. Response Size Limiting - Max 1 MB response
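Layers 1 and 3 (protocol validation and hostname extraction, including IPv6 brackets) can be sketched as a single helper. This is a simplified illustration under stated assumptions: `extract_host` is a hypothetical name, the scheme match is case-sensitive, and percent-encoding and other URL edge cases are not handled.

```cpp
#include <cassert>
#include <string>

// Returns true and fills `host` when `url` starts with http:// or https://
// and has a non-empty host. Brackets around IPv6 literals are stripped.
bool extract_host(const std::string& url, std::string& host) {
    std::size_t start = 0;
    if (url.compare(0, 7, "http://") == 0) {
        start = 7;
    } else if (url.compare(0, 8, "https://") == 0) {
        start = 8;
    } else {
        return false;  // any other scheme (ftp://, file://, gopher://, ...) is rejected
    }

    // The authority component ends at the first '/', '?' or '#'.
    const std::size_t end = url.find_first_of("/?#", start);
    std::string authority =
            url.substr(start, end == std::string::npos ? std::string::npos : end - start);

    // Drop "user:pass@" so that "http://public.example@10.0.0.1/" yields 10.0.0.1.
    const std::size_t at = authority.rfind('@');
    if (at != std::string::npos) authority.erase(0, at + 1);
    if (authority.empty()) return false;

    if (authority[0] == '[') {  // bracketed IPv6 literal, e.g. [::1]:8080
        const std::size_t close = authority.find(']');
        if (close == std::string::npos || close == 1) return false;
        host = authority.substr(1, close - 1);
        return true;
    }

    host = authority.substr(0, authority.find(':'));  // strip optional :port
    return !host.empty();
}
```

Stripping userinfo before validation matters because the part before `@` is not the host that gets contacted, yet a naive substring check would treat it as one.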

Key Implementation Details

DNS Resolution Security

// Resolves hostname to ALL IPs (both IPv4 and IPv6)
// Then validates EVERY resolved IP against security rules
Status resolve_hostname_all_ips(const std::string& hostname,
                                std::vector<std::string>& ips);

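Why: A hostname can resolve to several addresses, and an attacker who controls DNS can include a private IP among them. Validating every resolved IP, not just the first, rejects such hosts. Fully preventing DNS rebinding also requires connecting to the same address that was validated, since a second lookup at connect time can return a different IP.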
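For illustration, resolving every address with POSIX `getaddrinfo` might look like the sketch below. It returns `bool` instead of the PR's `Status` so that it compiles standalone, and `resolve_all_ips` is a hypothetical name; the actual `resolve_hostname_all_ips` implementation may differ.

```cpp
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <cassert>
#include <string>
#include <vector>

// Appends the text form of every IPv4 and IPv6 address that `hostname`
// resolves to. Returns false when resolution fails or yields nothing.
bool resolve_all_ips(const std::string& hostname, std::vector<std::string>& ips) {
    addrinfo hints{};
    hints.ai_family = AF_UNSPEC;      // ask for both A and AAAA results
    hints.ai_socktype = SOCK_STREAM;  // one entry per address, not per socket type

    addrinfo* results = nullptr;
    if (getaddrinfo(hostname.c_str(), nullptr, &hints, &results) != 0) {
        return false;
    }

    for (const addrinfo* p = results; p != nullptr; p = p->ai_next) {
        char buf[INET6_ADDRSTRLEN] = {0};
        const void* src = nullptr;
        if (p->ai_family == AF_INET) {
            src = &reinterpret_cast<const sockaddr_in*>(p->ai_addr)->sin_addr;
        } else if (p->ai_family == AF_INET6) {
            src = &reinterpret_cast<const sockaddr_in6*>(p->ai_addr)->sin6_addr;
        } else {
            continue;  // ignore other address families
        }
        if (inet_ntop(p->ai_family, src, buf, sizeof(buf)) != nullptr) {
            ips.emplace_back(buf);
        }
    }
    freeaddrinfo(results);
    return !ips.empty();
}
```

Every entry in `ips` would then be passed through the private-range checks, and the request rejected if any one of them fails.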

SSL Verification Enforcement

// Two-level control:
// 1. User parameter: ssl_verify=>true/false
// 2. Admin enforcement: http_request_ssl_verification_required
if (admin_requires_ssl && user_requests_no_verify) {
    return ERROR("SSL verification is enforced by administrator");
}

Allowlist Logic

// IP allowlist (exact string match) OR host regex match
bool check_allowlist(const std::string& host,
                     const std::vector<std::string>& resolved_ips) {
    for (const auto& ip : resolved_ips) {
        if (ip_allowlist.contains(ip)) return true;  // IP match
    }
    return regex_match(host, host_patterns);  // Regex match
}
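A self-contained version of the same logic, including parsing of the comma-separated config values, could look like this. It is a sketch under assumptions: `Allowlist`, `split_csv`, and `build_allowlist` are hypothetical names, and `std::regex` stands in for whichever regex engine the PR actually uses.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <unordered_set>
#include <vector>

struct Allowlist {
    std::unordered_set<std::string> ips;    // exact-match IP strings
    std::vector<std::regex> host_patterns;  // patterns that must match the full host
};

// Splits a comma-separated config value, skipping empty items.
std::vector<std::string> split_csv(const std::string& value) {
    std::vector<std::string> parts;
    std::size_t begin = 0;
    while (begin <= value.size()) {
        std::size_t end = value.find(',', begin);
        if (end == std::string::npos) end = value.size();
        if (end > begin) parts.push_back(value.substr(begin, end - begin));
        begin = end + 1;
    }
    return parts;
}

Allowlist build_allowlist(const std::string& ip_csv, const std::string& host_regex_csv) {
    Allowlist list;
    for (const std::string& ip : split_csv(ip_csv)) list.ips.insert(ip);
    for (const std::string& pattern : split_csv(host_regex_csv)) {
        list.host_patterns.emplace_back(pattern);
    }
    return list;
}

// True when any resolved IP is allowlisted OR the host fully matches a pattern.
bool check_allowlist(const Allowlist& list, const std::string& host,
                     const std::vector<std::string>& resolved_ips) {
    for (const std::string& ip : resolved_ips) {
        if (list.ips.count(ip) > 0) return true;
    }
    for (const std::regex& pattern : list.host_patterns) {
        if (std::regex_match(host, pattern)) return true;  // full-string match
    }
    return false;
}
```

Two design points follow from this shape. Full-string matching is what stops `example.com.evil.org` from passing a `.*\.example\.com` pattern. Splitting on commas means a pattern that itself contains a comma, such as a `{1,3}` quantifier, cannot be expressed in the config value.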

Production Configuration Example

Recommended Secure Defaults:

-- 1. Use RESTRICTED mode (require allowlist)
ADMIN SET FRONTEND CONFIG ("http_request_security_level" = "3");

-- 2. Enforce SSL verification globally
ADMIN SET FRONTEND CONFIG ("http_request_ssl_verification_required" = "true");

-- 3. Configure allowed public APIs only
ADMIN SET FRONTEND CONFIG ("http_request_host_allowlist_regexp" =
    "api\\.slack\\.com,hooks\\.slack\\.com,api\\.github\\.com");

-- 4. Keep private IP allowlist disabled (default)
-- "http_request_allow_private_in_allowlist" = "false"

Test Coverage

Comprehensive test suite with 40+ test scenarios in test/sql/test_http_request_function/T/test_http_request_security.sql:

Test Categories:

  • All IPv4 private IP ranges (127.x, 10.x, 172.16-31.x, 192.168.x, 169.254.x, 0.x)
  • All IPv6 private IP ranges (::1, fc00::/7, fe80::/10, IPv4-mapped)
  • Cloud metadata detection (169.254.169.254)
  • Security level transitions (1→2→3→4)
  • Allowlist matching (IP exact + regex patterns)
  • SSL verification enforcement
  • DNS resolution security (all IPs checked)
  • SSRF bypass attempts (decimal IPs, octal notation, etc.)
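To show why the bypass-attempt tests matter: the BSD/glibc `inet_aton` accepts legacy IPv4 spellings (a single decimal number, octal or hex octets, shortened forms) that a strict dotted-quad check never sees. One way to handle them, sketched here with a hypothetical `normalize_ipv4_literal` helper, is to canonicalize the host before the private-range check. Whether the PR does this or rejects such forms outright is not stated in this thread.

```cpp
#include <arpa/inet.h>
#include <cassert>
#include <string>

// Returns the canonical dotted-quad form of `host` when it is an IPv4 literal
// in any spelling inet_aton accepts, or an empty string otherwise.
std::string normalize_ipv4_literal(const std::string& host) {
    in_addr addr{};
    if (inet_aton(host.c_str(), &addr) == 0) {
        return "";  // not an IPv4 literal (for example, a DNS name)
    }
    char buf[INET_ADDRSTRLEN] = {0};
    if (inet_ntop(AF_INET, &addr, buf, sizeof(buf)) == nullptr) {
        return "";
    }
    return buf;
}
```

With this helper, `2130706433`, `0177.0.0.1`, `0x7f.0.0.1`, and `127.1` all normalize to `127.0.0.1`, so the loopback rule applies to each of them.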

Industry Comparison

Feature | StarRocks | ClickHouse | Snowflake | Databricks
Default Security | RESTRICTED | RESTRICTED | NETWORK RULES | NETWORK POLICIES
Allowlist Support | IP + Regex | Hosts only | FQDN rules | FQDN + IP
Private IP Blocking | Default ON | Config-based | Always | Default
SSL Enforcement | Admin-enforced | User-controlled | Always | Always
DNS Resolution | All IPs | First IP only | All IPs | All IPs
Cloud Metadata Protection | Special detection | Blocked | Blocked | Blocked

StarRocks Advantages:

  1. 4-level security system (vs. binary ON/OFF)
  2. Regex pattern matching (more flexible than exact domains)
  3. Admin SSL enforcement (prevents user bypass)
  4. Comprehensive IPv6 support with all private ranges
  5. Special cloud metadata detection with enhanced warnings

Code Locations

Component | File | Lines
FE Config Parameters | fe/fe-core/src/main/java/com/starrocks/common/Config.java | 4044-4071
Security Levels Enum | be/src/exprs/http_request_functions.h | 30-35
Main Validation Logic | be/src/exprs/http_request_functions.cpp | 344-457
IPv4 Private IP Check | be/src/util/network_util.cpp | 198-228
IPv6 Private IP Check | be/src/util/network_util.cpp | 230-259
DNS Resolution | be/src/util/network_util.cpp | 291-333
URL Parsing | be/src/util/network_util.cpp | 335-372
SSL Handling | be/src/http/http_client.h | 93-96

Security Considerations

Why Default is RESTRICTED (Level 3)?

Secure by Default Principle:

  • No requests allowed without explicit configuration
  • Forces administrators to allowlist endpoints explicitly
  • Prevents accidental SSRF exposure
  • Follows industry best practices

Why Block Private IPs by Default?

SSRF Attack Vectors:

  1. Internal service enumeration (scan internal APIs)
  2. Cloud metadata access (steal IAM credentials)
  3. Localhost bypass (access local services)
  4. Data exfiltration (send to internal logging)

Why Special Link-Local Detection?

Cloud-Specific Risk:

  • AWS: http://169.254.169.254/latest/meta-data/
  • GCP: http://metadata.google.internal/
  • Azure: http://169.254.169.254/metadata/instance

Exposure: IAM credentials, API keys, instance metadata

EdwardArchive avatar Dec 06 '25 17:12 EdwardArchive

@alvin-celerdata Hi, are there any thoughts on the SSRF feature?

EdwardArchive avatar Dec 07 '25 13:12 EdwardArchive

@cursor review

alvin-celerdata avatar Dec 07 '25 18:12 alvin-celerdata

@cursor review

alvin-celerdata avatar Dec 08 '25 03:12 alvin-celerdata

@cursor review

alvin-celerdata avatar Dec 10 '25 17:12 alvin-celerdata

[FE Incremental Coverage Report]

:x: fail : 61 / 88 (69.32%)

file detail

path | covered_line | new_line | coverage | not_covered_line_detail
:large_blue_circle: com/starrocks/common/ConfigBase.java | 16 | 40 | 40.00% | [256, 257, 258, 260, 261, 263, 264, 265, 267, 268, 272, 277, 278, 279, 281, 282, 284, 285, 286, 288, 289, 294, 306, 307]
:large_blue_circle: com/starrocks/sql/analyzer/ExpressionAnalyzer.java | 41 | 44 | 93.18% | [1107, 1108, 1110]
:large_blue_circle: com/starrocks/catalog/FunctionSet.java | 1 | 1 | 100.00% | []
:large_blue_circle: com/starrocks/sql/analyzer/FunctionAnalyzer.java | 3 | 3 | 100.00% | []

github-actions[bot] avatar Dec 12 '25 03:12 github-actions[bot]

[Java-Extensions Incremental Coverage Report]

:white_check_mark: pass : 0 / 0 (0%)

github-actions[bot] avatar Dec 12 '25 03:12 github-actions[bot]

[BE Incremental Coverage Report]

:x: fail : 9 / 27 (33.33%)

file detail

path | covered_line | new_line | coverage | not_covered_line_detail
:large_blue_circle: src/exprs/function_call_expr.cpp | 0 | 2 | 00.00% | [124, 125]
:large_blue_circle: src/util/network_util.cpp | 0 | 9 | 00.00% | [198, 201, 202, 203, 205, 206, 209, 210, 212]
:large_blue_circle: src/http/http_client.h | 0 | 4 | 00.00% | [99, 100, 101, 102]
:large_blue_circle: src/runtime/runtime_state.h | 2 | 5 | 40.00% | [381, 390, 391]
:large_blue_circle: src/http/http_client.cpp | 7 | 7 | 100.00% | []

github-actions[bot] avatar Dec 12 '25 03:12 github-actions[bot]

@cursor review

alvin-celerdata avatar Dec 12 '25 04:12 alvin-celerdata

@cursor review

alvin-celerdata avatar Dec 12 '25 07:12 alvin-celerdata

CLA assistant check
All committers have signed the CLA.

CLAassistant avatar Dec 12 '25 21:12 CLAassistant

@cursor review

alvin-celerdata avatar Dec 12 '25 21:12 alvin-celerdata